Data Science in the era of artificial intelligence
24 February 2023
What is data science?
The big data revolution is now at least a decade old, and most companies and public institutions have fully developed data warehouses or lakes where organisational data is collected. But what is it being used for? What value does it create? We have gotten better and more detailed reporting on the daily operations, more management reporting and more dashboards. All of this is good and valuable, but we can do so much more. This is what data science is all about.
Data science promises to bring operational solutions and even deeper insights into organisations. As such, the promise of data science is closer to the original promise of mechanical automation. Data science produces solutions that go into production and directly affect an operational part of the organisation by supporting, improving or perhaps automating a work process, and sometimes even by enabling new work processes. This is not the only feature of data science, but it is probably the core promise and the reason it is being compared to the fourth industrial revolution (Forum, 2016).
Take a doctor as an example. A patient comes through the door, and a data science solution presents not only a dashboard based on the patient’s journal but also a set of predicted points of attention for the doctor based on the patient’s history, the latest research and the current disease landscape in the world. The patient’s lungs are causing trouble, so a scan is taken. As the scanner delivers the image, it instantly suggests that a region of the image be examined for potential cancerous growth and highlights the area. The underlying model has been trained on millions of images from around the world, and its suggestion makes the doctor more alert. After the patient has left, the doctor dictates a note which is automatically saved as text. An algorithm detects that a lung scan is mentioned and automatically adds a procedure code to the patient’s history. It also detects that cancer is suspected and raises a flag of attention on the patient for future reference.
In the example above, data science was first used to give the doctor an overview of an otherwise incomprehensibly large information base. Next, it provided decision support by highlighting the suspicious area of the scan. Finally, it fully automated the logging and archiving of the work. As a result, the doctor was able to work more precisely, spend more time with the patient and see more patients in a day thanks to intelligent use of data.
We are hardly there yet. But I am in no doubt that we are on the way. This guide will give you an idea of how data science works in organisations today, and how you can get started in your own organisation.
Who is the guide for?
This guide is first and foremost for people who want to increase their organisation’s data science skills, either by becoming better data scientists or software developers themselves or by leading a data science department, small or large. It provides deep insight into a complete data science workflow: not just from data to model, but from identifying the right problem and setting up the model to realising value and maintaining the solution.
As such, it is also relevant for managers working with data and digitalisation. Data science is a rapidly developing field, and it can be a full-time job just to keep up with what is happening. We will provide an overview of what can be expected from modern data science solutions, and what it takes to achieve success with them.
Why this guide?
At Implement, we experience a great demand for data science competencies. This is driven by the rapid growth of data in most modern organisations, which naturally raises the question: how can we leverage all this data?
What usually happens next is that either management decides that a business problem needs to be solved with data science, or an enterprising employee proposes a concrete data science solution. In some cases, a team is formed, and solution identification begins.
In any case, the core challenge is that three competencies need to be present in the process: (1) software development and IT, (2) statistics and machine learning and (3) knowledge of the business or domain in which the solution will be used.
If one of these elements is missing, we see some common challenges arise:
- Lacking software development skills, the data science project risks producing a fantastic solution that cannot be implemented in the business and is somewhere between difficult and impossible to maintain. The solution may emerge, and it may even create some value, but it will have a short life if it ever gets into operation.
- Lacking statistics and machine learning skills, the data science project will often find it difficult to deliver solutions of the quality that is in demand. The project will be prevented from tackling the new and difficult problems that it could potentially solve with a greater understanding of machine learning.
- Lacking business knowledge, the project risks developing great solutions that are not applicable in the business and where the benefits of development are never realised.
Unsurprisingly, finishing projects, achieving the expected quality and realising the desired impact are some of the most widespread problems in new data science projects.
Content of this guide
In this guide, you will be taken on a journey from the very start of a data science project, with use case identification, all the way to advice on the governance of models. The guide consists of eight chapters with the following headings:
- Chapter 1: Use case identification: How do you ensure that you are working on a valuable idea?
- Chapter 2: Data exploration: Get an overview of your data from patterns and content to creation and management.
- Chapter 3: Machine learning: What are the techniques we use to solve the problems? How do you hit the right amount of complexity and ensure that your model works in real life?
- Chapter 4: Definition of purpose: What should your model optimise for? What exactly is the problem we are trying to solve? And how do you make sure that your model actually does what you expect?
- Chapter 5: Modern deep learning and fine-tuning: The toolbox of data science is growing rapidly, and the reuse of models across domains is changing how solutions are built.
- Chapter 6: MLOps and testing: We look at how to deploy our models and properly update and maintain them.
- Chapter 7: Ethics: Data science touches on many human aspects from personal data to automation. Chapter 7 gives an introduction to the main considerations and some concrete advice on how to proceed in a human-centric way.
- Chapter 8: Sustainable data science: Investigates the environmental impact of our modelling work, and what we can do to minimise our footprint.
The guide can be read chapter by chapter, but it is also written so that each chapter can be read in isolation. If you just need a brush-up on a particular concept, feel free to skip to the relevant chapter.