Article

Data Science

A guide for artificial intelligence and machine learning

Published: September 2020

What is data science?

The big data revolution is now at least a decade old, and most companies and public institutions have more or less fully developed data warehouses. But what are they being used for? What value are they creating? We have gained better and more detailed reporting on daily operations, more management reporting and more dashboards. All of this is good and valuable, but we can do so much more. This is what data science is all about.

Data science promises to bring operational solutions and even deeper insights into organisations. As such, the promise of data science is closer to the original promise of mechanical automation. Data science produces solutions that go into production and directly affect an operational part of the organisation by supporting, improving or perhaps automating a work process, and sometimes even by enabling new work processes. This is not the only feature of data science, but it is probably the core promise, and it is why data science is being compared to the fourth industrial revolution (Forum 2016).

Data science is an interdisciplinary field that combines classical statistics, data analysis and machine learning methods in an attempt to understand and analyse real-world phenomena via data.

Take a doctor as an example. A patient comes through the door, and a data science solution presents not only a dashboard based on the patient’s medical record but also a set of predicted points of attention for the doctor, based on the patient’s history, the latest research and the current disease landscape in the world.

The patient’s lungs are causing trouble, so the patient gets a lung scan. As the scanner delivers the image, it highlights a region and suggests that it be examined for potential cancerous growth. The underlying model has been trained on millions of images from around the world, and the doctor is immediately more alert.

After the patient has left, the doctor dictates a note which is automatically saved as text. An algorithm detects that a scan of the lungs is mentioned and automatically adds a procedure code to the patient’s medical history. It also registers that cancer is suspected and flags the patient for follow-up.

In the example above, data science was first used to give the user an overview based on an otherwise incomprehensibly large information base. Subsequently, decision support was provided when the suspicious area was highlighted. Finally, the medical record keeping was fully automated.

We are not there yet, but I have no doubt that we are on the way. This guide will give you an idea of how data science works in organisations today, and how you can get started in your own organisation.

Who is this guide for?

This guide is first and foremost for people who want to strengthen their organisation’s data science skills, either by becoming better data scientists or software developers themselves or by leading a data science department, small or large. It provides deep insight into a complete data science workflow: not just from data to model, but from identifying the right problem through building the model to realising value and ensuring maintenance.

Why this guide?

Many organisations want to get started with artificial intelligence and data science, and some even try, but they tend to run into the same challenges. These are the challenges we would like to address in this guide.

Inspiration

At Implement, we experience a great demand for data science competencies. This is driven by the rapid growth of data in most modern organisations, which naturally raises the question: How can we leverage all this data?

The next thing that usually happens is that an employee is set to work, either to identify a use case or to deliver a concrete one handed down by management.

The challenge is that a skilled data scientist combines three skills, and that combination is extremely rare:

  1. A strong understanding of software development
  2. A strong understanding of statistics and machine learning
  3. A strong understanding of the business or IT domain they work in

Finding an employee (or candidate) with this set of skills can be very tricky, and building a department that covers all three introduces its own challenges.

If one of these elements is missing, we see some common challenges arise:

  • Lacking software development skills, the data scientist risks becoming an expert who cannot deliver solutions that are actually implemented and bring about change. The solutions remain too technical or are simply never finished.
  • Lacking statistics and machine learning skills, the data scientist will often find it difficult to deliver solutions of the required quality. They will probably finish and solve the problem at hand, but they cannot tackle the new and more difficult problems that a greater understanding of machine learning would put within reach.
  • Lacking business knowledge, the data scientist risks developing great solutions that cannot be applied in the business, so the benefits of the development are never realised.

Not surprisingly, finishing projects, achieving the expected quality and realising the desired impact are some of the most widespread problems in new data science projects.

Content of this guide

In this guide, you will be taken on a journey from the very start of a data science project, with use case identification, to advice on the governance of models. The guide consists of seven chapters with the following headings:

  • Chapter 1. Use case identification: How do you ensure that you are working on a valuable idea?
  • Chapter 2. Data exploration: Get an overview of your data from patterns and content to creation and management.
  • Chapter 3. Definition of purpose: What should your model optimise for? What exactly is the problem we are trying to solve? And how do you make sure that your model actually does what you expect?
  • Chapter 4. Machine learning: What techniques do we use to solve the problems? How do you strike the right level of complexity and ensure that your model works in real life?
  • Chapter 5. Ethics: Data science touches on many human aspects from personal data to automation. Chapter 5 gives an introduction to the main considerations as well as some concrete advice on how to proceed in a human-centric way.
  • Chapter 6. Data science governance: When models are designed to create value, they must be managed and maintained. They may need to be updated, and at the very least they must not burden the IT landscape around them. That is why we have governance.
  • Chapter 7. Governance tools: When we implement governance around models, we look for some helpful tools. Chapter 7 gives insight into the five most important tools, from version control to model monitoring.