Scalable data science development

Published: February 2022

Artificial intelligence has the potential to change both businesses and lives. We are already surrounded by AI systems (the output of data science), whether we are receiving product recommendations while shopping online or our navigation app guesses our destination before we have typed a single character into the search field.

AI, machine learning and their many synonyms have been subject to hype for almost a decade now, and at Implement Consulting Group, we are seeing companies move into the next phase of their AI and machine learning adoption. The first movers and early adopters already have AI systems as an integrated part of their business processes. However, most companies are struggling to take the difficult leap from proof of concept to having reliable AI algorithms as part of their production IT systems. For many companies, this is a true make-or-break situation – particularly considering how the ability to learn from and adapt to data will most definitely be a key competitive parameter across most – if not all – industries in the next decade.

Many companies have run AI PoCs that ended up in the famous PoC graveyard once they realised that running AI in production requires a different skill set and new methods than those used to deliver a successful PoC in the first place.

This is the second article in our series on successfully adopting AI in organisations. Each article covers our take on what it takes to run AI systems successfully in production, namely:

  1. A use case-driven approach to AI adoption
  2. Building scalable AI solutions
  3. Technology selection for AI implementations
  4. Capabilities necessary to leverage the full potential of AI solutions

Demands for making data science and AI projects scalable are increasing

One of the major reasons companies get stuck at the proof-of-concept (PoC) stage when developing data science solutions is the lack of people, processes and technologies to deliver AI models into a production environment. The practice of doing so is often called MLOps (short for Machine Learning Operations).

MLOps is a relatively new phenomenon, and there are still many definitions of what MLOps actually means. At Implement Consulting Group, we think of MLOps as the entire lifecycle from data to model and the people, processes and technologies that support the different stages of development, deployment and operations – not only for the model but also for the data behind it.

The level of professionalism around data science has changed dramatically over the last couple of years. Not long ago, it was acceptable to have models running on local desktop PCs – sometimes under the specific data science developer's credentials. As the value and risk of AI models have grown, so have the requirements to handle AI in a more professional manner – from internal parties such as business owners and executives but also from external parties such as shareholders and authorities. One example of an internal requirement is the demand for 24/7 operations of model execution. One example of an external requirement is the EU draft AI regulation from April 2021, under which corporate fines for non-compliance can reach up to 6% of total worldwide annual turnover for the preceding financial year.

When companies start to professionalise data science workflows through MLOps, the benefits quickly become obvious across system compliance, developer satisfaction and time to market for new AI models. Furthermore, this strengthens the overall quality of the models being pushed to front-line operations, increasing trust in results while minimising financial and legal risks. We believe that mastering MLOps is already part of the "licence to operate" in the same way that DevOps has become within software development over the last couple of years.

Professionalising data science model development requires a focused effort on both data and model management

When raising the maturity level of data science models from PoC to production IT level, one must consider a range of aspects. In the figure below, we have structured the necessary elements into two categories: data management and model management. These are the two fundamental things you need to have in place for a model to run: a stable environment in which to develop and execute the model and a stable feed of trustworthy data into that environment. All of this must be supported by the right people with the right skills, and security must be considered end to end. In the next sections, we dive into each of the two areas.

Data management

Data management is a field of study in itself, and we will therefore only scratch the surface in this article. Summarised in a single phrase, data management is the practice of ensuring that well-described data sets of the right quality are available to the right people in the organisation at the right time.

We see the following four aspects as the key components of data management in a data science context.

Manage data model

Your data model is a way of clearly communicating the relations within your data. It also decouples the data from the specific source systems it lives in and helps you create a single source of truth for key concepts across your organisation.
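
As a minimal sketch of the idea, the example below defines a shared customer concept once and lets other entities refer to it by key. The entity and field names are hypothetical illustrations, not a prescribed model.

```python
from dataclasses import dataclass


# Illustrative data model: "Customer" is defined once, organisation-wide,
# and other entities reference it by key rather than re-defining it per
# source system.
@dataclass
class Customer:
    customer_id: str  # the single, organisation-wide customer key
    name: str


@dataclass
class Order:
    order_id: str
    customer_id: str  # relation to Customer, independent of source system
    amount: float
```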

Manage data ownership and responsibility

Data without an owner will often become stale and unusable. Ownership is particularly important when delivering high-risk/high-value data science projects, where you need to know exactly where to go in case of quality or delivery issues.

Manage data lifecycle

Some data sets will have to be deleted, pseudonymised or anonymised over time. Other data sets lose their value after a certain amount of time and should be deleted so that they do not impact platform performance and cost.
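
A retention rule can be as simple as a scheduled job that pseudonymises ageing records and deletes expired ones. The sketch below assumes a pandas DataFrame with hypothetical `customer_id` and `created_at` columns, and the retention periods are assumptions, not recommendations.

```python
import hashlib
from datetime import datetime, timedelta

import pandas as pd

# Assumed policy: pseudonymise after 1 year, delete after 5 years.
PSEUDONYMISE_AFTER = timedelta(days=365)
DELETE_AFTER = timedelta(days=5 * 365)


def apply_retention(df: pd.DataFrame, now: datetime) -> pd.DataFrame:
    """Drop expired records and pseudonymise ageing ones (hypothetical policy)."""
    # Delete records that are past their retention deadline entirely.
    age = now - df["created_at"]
    df = df[age <= DELETE_AFTER].copy()

    # Replace the direct identifier with a one-way hash once the raw value is
    # no longer needed (a real setup would add salting and key management).
    old = (now - df["created_at"]) > PSEUDONYMISE_AFTER
    df.loc[old, "customer_id"] = df.loc[old, "customer_id"].map(
        lambda cid: hashlib.sha256(str(cid).encode()).hexdigest()[:16]
    )
    return df
```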

Manage data quality

A model is never better than the data it is built on. Therefore, the old data management practice of continuously ensuring that data quality is satisfactory becomes even more important when front-line operations make day-to-day decisions based on model outputs.
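
A minimal sketch of what a continuous quality check might look like, assuming a pandas DataFrame; the column names (`order_amount`, `created_at`) and the thresholds are hypothetical assumptions.

```python
import pandas as pd


def check_quality(df: pd.DataFrame) -> list[str]:
    """Return a list of data quality issues; an empty list means the feed looks healthy."""
    issues = []

    # Completeness: at most 1% missing values in the key feature (assumed threshold).
    null_rate = df["order_amount"].isna().mean()
    if null_rate > 0.01:
        issues.append(f"order_amount null rate {null_rate:.1%} exceeds 1%")

    # Validity: order amounts should be strictly positive.
    if (df["order_amount"].dropna() <= 0).any():
        issues.append("order_amount contains non-positive values")

    # Freshness: newest record should be under a day old (assumed SLA;
    # assumes created_at holds timezone-naive UTC timestamps).
    lag = pd.Timestamp.now(tz="UTC").tz_localize(None) - df["created_at"].max()
    if lag > pd.Timedelta(days=1):
        issues.append(f"data is stale: newest record is {lag} old")

    return issues
```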

Model management

Model management covers the entire lifecycle of developing, deploying and operating machine learning or AI models. As such, it requires a range of technical competencies that go beyond what the majority of data scientists can cover today. Most companies have already discovered that hiring a couple of data scientists will not by default make their business AI-driven – a discovery that reflects just how important model management is.

In our opinion, model management consists of the following four key components:

Manage model workbench

A good development environment is key to increasing developer speed. A model workbench refers to the development environment used by data scientists and developers. Several platforms offering a wide range of model management components are available today, e.g. Azure Machine Learning, Amazon SageMaker and Google Vertex AI. These platforms should be evaluated and selected with regard to individual company needs, ambitions and current infrastructure.

Manage model development

Model development includes experiment tracking and visualisation, model lifecycle and review as well as deployment. Most often, data scientists experiment with data sets and different methodologies before selecting the specific type of model that provides the best trade-off between prediction performance and explainability (and potentially other measures such as fairness). Experiments should be registered with versioning and a direct reference to the training data set, ensuring that results are always reproducible. When moving towards a professional MLOps setup, local experiments without structured tracking belong to the past. This ensures that the lifecycle of machine learning models is documented from data to code and results. Based on the documented experiments, experts and the business can review and approve models against specific quality metrics.
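
As a minimal sketch of structured experiment tracking, the example below uses MLflow (one of several tracking tools). The experiment name, parameters and the data set version tag are hypothetical, and a synthetic data set stands in for a real, versioned training set.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real training set; a production setup would load
# a versioned data set instead.
X, y = make_classification(n_samples=1_000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

mlflow.set_experiment("churn-prediction")  # hypothetical experiment name

with mlflow.start_run():
    # Tag the run with a direct reference to the training data version so
    # the result stays reproducible (the path is an assumption).
    mlflow.set_tag("training_data_version", "s3://data-lake/churn/v42")

    params = {"n_estimators": 200, "max_depth": 8}
    mlflow.log_params(params)

    model = RandomForestClassifier(**params).fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    mlflow.log_metric("accuracy", accuracy)

    # Log the trained model as an artefact for review and later deployment.
    mlflow.sklearn.log_model(model, "model")
```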

Manage model ethics

The realm of AI and machine learning is subject to much debate around important ethical questions, which are being addressed by academia, governments and industry alike. While the field of AI ethics is not an exact science, there are tangible areas in which companies and data scientists can commit to developing responsible and trustworthy AI systems. Concerns such as behavioural manipulation, explainability and bias/fairness can all be addressed while developing AI systems and should be accounted for in commercial as well as non-commercial solutions.
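
As one tangible example, fairness can be made measurable. The sketch below computes the demographic parity difference (one common metric among many) between two groups; the data and the tolerance threshold are assumptions for illustration.

```python
import numpy as np


def demographic_parity_difference(predictions: np.ndarray, group: np.ndarray) -> float:
    """Difference in positive prediction rates between two groups (0 = parity)."""
    rate_a = predictions[group == 0].mean()
    rate_b = predictions[group == 1].mean()
    return abs(rate_a - rate_b)


# Example: flag the model for review if the gap exceeds an assumed tolerance.
preds = np.array([1, 0, 1, 1, 0, 1, 0, 0])
groups = np.array([0, 0, 0, 0, 1, 1, 1, 1])
if demographic_parity_difference(preds, groups) > 0.2:  # assumed threshold
    print("Potential fairness issue: review model before deployment.")
```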

Manage model execution

Using machine learning to create value on a continuous basis requires an integrated environment where models can serve prediction results to a business application. Performance monitoring and scalability become vital to ensure stable workflows, while the ability to compare and reproduce results is essential for future development. Performance monitoring includes the health of the model deployment and data drift detection. Data drift refers to data distributions changing over time, which can render a model outdated. The business application receiving the model results determines how scalable model execution needs to be, and execution must be implemented to meet those requirements so that business value is created.
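
A minimal sketch of data drift detection for a single numeric feature, using a two-sample Kolmogorov–Smirnov test (one common choice among many); the significance threshold and the simulated data are assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp

DRIFT_P_VALUE = 0.01  # assumed significance threshold


def detect_drift(training_feature: np.ndarray, live_feature: np.ndarray) -> bool:
    """Flag drift when live data no longer looks drawn from the training distribution."""
    statistic, p_value = ks_2samp(training_feature, live_feature)
    return p_value < DRIFT_P_VALUE


# Example: simulate a live feed that has shifted upwards relative to training data.
rng = np.random.default_rng(0)
train = rng.normal(loc=0.0, scale=1.0, size=5_000)
live = rng.normal(loc=0.5, scale=1.0, size=5_000)

if detect_drift(train, live):
    print("Data drift detected - consider retraining the model.")
```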

Bringing it all together

By working in a structured way with the elements under the two categories above – data management and model management – you will be much closer to the goal of having a stable environment in which to develop and execute the model and a stable feed of trustworthy data into that environment. All of this must be supported by the right people with the right skills, while remembering to build security into every process.

Conclusion

This article has given a short introduction to how AI systems can be developed and deployed in a scalable way using an MLOps framework. We have described a structured process for ensuring that AI development happens in a way that provides stability and scalability – just as is expected from any other production IT system.