How to Effectively Deliver Machine Learning Projects 

Sharing tips and best practices

As per GlobeNewswire, around 65% of companies using or planning to use ML in their technology solutions believe it will lead to smarter, better-informed business decisions. On top of that, 74% of businesses consider it a game-changer with the potential to transform both their jobs and the industries they work in.

Although 58% of organizations reportedly run models in production, there are challenges at every stage of the ML development life cycle that can significantly slow down time to market and hurt the solution’s quality.

This article explores some of these challenges and shares what it takes to successfully deliver an ML project.

Machine Learning project life cycle

ML projects are highly iterative. As you progress through the ML life cycle, you will repeatedly iterate on a task until the model reaches a satisfactory level of performance, then move on to the next task (which may send you back to an even earlier step). Moreover, the project is not complete once you release the first version: you collect feedback from real-world user interactions and redefine goals for the next deployment iteration.

ML has two distinct phases of operation:

  • Training
  • Inference

 

During the training phase, ML algorithms learn patterns and trends from the data and build a model. The inference phase uses the trained model to make predictions on new, unseen (real-world) data. Essentially, the learning phase happens during “experiments,” while the inference phase happens in the “real world.”
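To make the two phases concrete, here is a minimal Python sketch, assuming a toy one-feature linear model fitted by ordinary least squares (an illustrative choice, not something the life cycle prescribes). The hypothetical `train` function plays the role of the training phase, and the returned object's `predict` method plays the role of inference.

```python
from dataclasses import dataclass


@dataclass
class LinearModel:
    """The trained artifact that is handed over from training to inference."""
    slope: float
    intercept: float

    def predict(self, x: float) -> float:
        # Inference: apply the learned parameters to a new, unseen input.
        return self.slope * x + self.intercept


def train(xs: list[float], ys: list[float]) -> LinearModel:
    """Training: learn parameters from historical data via ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return LinearModel(slope=slope, intercept=mean_y - slope * mean_x)


# "Experiment": fit the model on historical observations (here y = 2x + 1).
model = train([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])

# "Real world": predict for an input the model never saw during training.
print(model.predict(10.0))  # 21.0
```

Only the fitted `LinearModel` crosses the boundary between the two phases, which mirrors how a trained model artifact is packaged and shipped to a serving environment.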

A standardized approach to ML adoption, consisting of scoping, understanding, creation, deployment, management, and trust, supports a collaborative process and helps:

  • Accelerate the process, 
  • Reduce/eliminate overhead, 
  • Minimize risks, etc.

 

But first things first. Before delving into the details of successful ML project delivery, let’s look at a typical AI product development life cycle.

A typical ML project development life cycle consists of four stages, which are usually defined as follows:

Scope

To prevent misunderstandings and find the right solutions to target users’ problems, stakeholders and product owners must reduce ambiguity. This is achieved by scoping use case queries and prioritizing ML product functionality against user pain points. It is crucial to maintain consistent communication and transparency between business stakeholders and the solution development team, and to define success in detail in terms of both business and technology outcomes.

Aligning key performance indicators (KPIs) and expected solution quality criteria is key to project success and performance evaluation. Scoping requires an understanding of the business problem, and it also requires stakeholders and the technical team to understand the solution’s potential impact. This phase is a collaborative process that improves understanding of end-user pain points and helps the team develop empathy for user needs.

With a common language between stakeholders and the ML development team, pain points and solution options are prioritized and turned into an action plan that specifies which ML tasks to address and the implementation environment.

The scoping phase defines the project goals and vision that align with your or your customer’s business goals. At this stage, implicit risks and assumptions should be identified and documented. This step helps develop a risk assessment strategy and define a minimum viable product (MVP).

Build

The development phase covers both “training” and “inference.” It explores data sources, metadata, provenance, relationships to other datasets, and data quality metrics, and establishes appropriate rules and policies to govern your ML initiative.
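As an illustration of data quality metrics, the following Python sketch computes two common checks: the share of missing values per required column and the number of exact duplicate rows. The function name `data_quality_report` and the example records are hypothetical, and the sketch assumes all values are hashable; real projects usually track many more metrics (schema, ranges, freshness).

```python
def data_quality_report(rows: list[dict], required: list[str]) -> dict:
    """Compute basic data quality metrics for a list of records.

    - missing_rate: share of rows where a required column is None or empty
    - duplicate_rows: number of rows that exactly repeat an earlier row
    """
    if not rows:
        raise ValueError("no rows to profile")
    total = len(rows)
    missing_rate = {
        col: sum(1 for row in rows if row.get(col) in (None, "")) / total
        for col in required
    }
    seen = set()
    duplicate_rows = 0
    for row in rows:
        # Sorting by key makes the fingerprint independent of column order
        # (assumes hashable values).
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            duplicate_rows += 1
        else:
            seen.add(fingerprint)
    return {"rows": total, "missing_rate": missing_rate, "duplicate_rows": duplicate_rows}


records = [
    {"id": 1, "age": 34, "country": "RO"},
    {"id": 2, "age": None, "country": "DE"},
    {"id": 1, "age": 34, "country": "RO"},  # exact duplicate of the first row
    {"id": 3, "age": 51, "country": ""},
]
report = data_quality_report(records, required=["age", "country"])
print(report)
```

A governance rule could then, for example, block a dataset from training when any required column exceeds an agreed missing rate.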

ML is not monolithic – the value it creates comes from its different capabilities. Within this phase, a series of iterations focuses on the specific activities needed to build an ML model: data exploration, data labeling, data preparation, feature development, model building, behavior testing, prediction, and insight discovery.

This collaborative process uses various tools, methods, and platforms, which can be open source or proprietary. The build phase is best run as a set of Agile sprints, each with a specific ROI goal and expected outcomes. The project manager determines the number and duration of the sprints. Each sprint should produce a prototype of the final product.

Deploy 

In this phase, the full AI capability is put to work: real-world data is used to solve real problems. This step takes the models developed during the proof of concept (POC) into production. It involves a higher level of complexity, which depends on the company’s structure, the size of the solution development team, internal work processes, ethics, infrastructure requirements, data management, knowledge, budgets, and other conditions.

This step defines strategies for deploying models to specific systems, as ML models can be deployed to multiple systems and interfaces. The target platforms are specific to each deployment and depend on the use case. MLOps (Machine Learning Operations) ensures a structured approach to moving from “build” to “deploy” and establishes a pipeline that moves resources from the development environment to the production environment.

Moving resources from development to production through a series of automated steps follows the continuous integration/continuous deployment (CI/CD) paradigm.

CI/CD provides structure and a streamlined workflow for the entire AI life cycle. As a result, the solution can be improved iteratively, and each change can be tested against success criteria before release.
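A CI/CD pipeline for ML typically includes an automated gate that decides whether a newly trained model may be promoted. The sketch below is a minimal Python version, assuming hypothetical metric names (`accuracy`, `p95_latency_ms`) and thresholds. In a real pipeline, a non-empty list of failures would typically make the CI job exit with a non-zero status and block the release.

```python
# Hypothetical release criteria agreed with stakeholders during scoping.
THRESHOLDS = {"min_accuracy": 0.90, "max_p95_latency_ms": 50.0}


def release_gate(candidate: dict, production: dict, thresholds: dict = THRESHOLDS) -> list[str]:
    """Return the reasons a candidate model must not be promoted (empty list = pass)."""
    failures = []
    if candidate["accuracy"] < thresholds["min_accuracy"]:
        failures.append(f"accuracy {candidate['accuracy']:.3f} below {thresholds['min_accuracy']}")
    if candidate["accuracy"] < production["accuracy"]:
        failures.append("accuracy regressed compared with the production model")
    if candidate["p95_latency_ms"] > thresholds["max_p95_latency_ms"]:
        failures.append(f"p95 latency {candidate['p95_latency_ms']} ms above budget")
    return failures


production_metrics = {"accuracy": 0.91, "p95_latency_ms": 42.0}
good_candidate = {"accuracy": 0.93, "p95_latency_ms": 38.0}
slow_candidate = {"accuracy": 0.94, "p95_latency_ms": 75.0}

for name, metrics in [("good", good_candidate), ("slow", slow_candidate)]:
    reasons = release_gate(metrics, production_metrics)
    print(name, "PASS" if not reasons else f"FAIL: {reasons}")
```

Comparing against the production model, not just a fixed threshold, keeps each automated release from silently degrading quality.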

Manage

ML projects need to be supported through effective resource management and performance optimization.

As Jason Tamara said, “Throwing more resources into solving a problem will not lead to the right solution; it’s more likely to lead you to wrong answers faster.” The performance of a deployed model degrades over time as it is exposed to new patterns and new intents in real-world environments.

This degradation occurs because the training data does not reflect these new real-world changes. This step monitors the model’s performance metrics and evaluates them against predefined thresholds. The evaluation can automatically trigger a request to retrain the model or alert the development team. Aligning performance metrics with business KPIs strengthens the common language between stakeholders and increases the initiative’s credibility.
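The monitoring loop described above can be sketched as follows. This is a minimal Python example, assuming that ground-truth labels eventually arrive for production predictions (in many systems they arrive late or only for a sample) and using hypothetical thresholds. The returned `"retrain"` and `"alert"` statuses stand in for whatever automation your MLOps platform actually triggers.

```python
from collections import deque


class PerformanceMonitor:
    """Track accuracy over a sliding window of labeled production predictions."""

    def __init__(self, window: int, alert_below: float, retrain_below: float):
        self.outcomes = deque(maxlen=window)  # True = correct prediction
        self.alert_below = alert_below
        self.retrain_below = retrain_below

    def record(self, prediction, actual) -> str:
        """Record one outcome and return the action the current window calls for."""
        self.outcomes.append(prediction == actual)
        if len(self.outcomes) < self.outcomes.maxlen:
            return "warming_up"  # not enough evidence yet
        accuracy = sum(self.outcomes) / len(self.outcomes)
        if accuracy < self.retrain_below:
            return "retrain"  # e.g. trigger a (hypothetical) retraining pipeline
        if accuracy < self.alert_below:
            return "alert"  # e.g. notify the development team
        return "ok"


monitor = PerformanceMonitor(window=4, alert_below=0.9, retrain_below=0.6)
events = [(1, 1), (1, 1), (1, 1), (1, 1), (1, 0), (1, 0)]
statuses = [monitor.record(pred, actual) for pred, actual in events]
print(statuses)
```

Two tiers of thresholds let small dips reach a human first, while severe degradation triggers retraining automatically.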

It is essential to evaluate models for explainability, fairness, and bias. Responsible AI is a fast-growing area within the AI ecosystem that explores and provides frameworks that increase trust in AI deployments. Monitoring fairness and removing bias are essential requirements identified in various guidelines.
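Fairness can be measured in many ways. One simple metric is the demographic parity difference: the gap between the highest and lowest rate of positive predictions across groups. The Python sketch below computes it; the group labels, the predictions, and the 0.1 tolerance are hypothetical. Demographic parity is only one of several fairness definitions, and different definitions can conflict with each other.

```python
def positive_rate(predictions: list[int], groups: list[str], group: str) -> float:
    """Share of positive (1) predictions within one group."""
    selected = [p for p, g in zip(predictions, groups) if g == group]
    return sum(selected) / len(selected)


def demographic_parity_difference(predictions: list[int], groups: list[str]) -> float:
    """Largest gap in positive-prediction rate between any two groups (0 = parity)."""
    rates = [positive_rate(predictions, groups, g) for g in set(groups)]
    return max(rates) - min(rates)


# Hypothetical binary predictions (1 = approved) for applicants in groups A and B.
predictions = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

gap = demographic_parity_difference(predictions, groups)
TOLERANCE = 0.1  # hypothetical, to be agreed with stakeholders
print(f"gap={gap:.2f}, within tolerance: {gap <= TOLERANCE}")
```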

Explainable model results promote trust because they show how decisions were made, as opposed to a black-box paradigm. Explainability increases the transparency of the ML process, and quantifying the contribution of model features enables root cause analysis.
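One widely used, model-agnostic way to quantify feature contributions is permutation importance: shuffle one feature at a time and measure how much a metric drops. The pure-Python sketch below illustrates the idea with accuracy as the metric and a hypothetical toy model that uses only the first feature. For real projects, scikit-learn provides a full implementation as `sklearn.inspection.permutation_importance`.

```python
import random


def accuracy(model, X, y):
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled independently."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the labels
            X_permuted = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(model, X_permuted, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy model that only looks at feature 0.
def model(row):
    return int(row[0] > 0.5)


X = [[i / 20, (i * 7 % 20) / 20] for i in range(20)]
y = [model(row) for row in X]
importances = permutation_importance(model, X, y)
print(importances)
```

Feature 1 scores exactly zero because the model ignores it, which is the kind of signal that supports root cause analysis.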

What it takes to build a Machine Learning solution

Assembling a Machine Learning Team

An ML team may differ from a typical software development team setup. As organizations work to build artificial intelligence solutions, they need to understand whom to involve in the process. The skills to look for include leadership, analytics and design, data and data management, visualization, and more.

Modern machine learning teams are highly diverse. In essence, however, they include specialists with strong analytical skills who can understand data from various domains, train and deploy predictive models, and generate business or product insights.

Typical profiles within an ML Team

Product Manager

Skills

  • Subject matter expertise in a relevant domain
  • Product design, marketing 
  • Data analytics 
  • Program management
  • Understanding of project management, product roadmaps, and end-to-end project delivery
  • Understanding of software, architecture, data, and ML best practices
  • Basic knowledge of fundamental ML concepts, processes, metrics, and deployment

 

Responsibilities

  • Create detailed product roadmaps with milestones, deliverables, metrics, and business impact.
  • Conduct customer surveys to optimize UX and reduce friction.
  • Balance multiple stakeholder and customer priorities to define and deliver a product
  • Work with software and machine learning teams to iterate and improve models according to the roadmap.
  • Take ownership of the product and ensure that features and the entire product are delivered on time.

 

Tech stack

  • Excel
  • SQL
  • Work management tools
  • Productivity tools 
  • Scheduling tools

Data Scientist

Skills

  • Programming
  • Statistics
  • Data Analytics
  • Data Visualization
  • Data Science
    • Supervised machine learning
    • Unsupervised machine learning

Responsibilities

  • Identify and validate use cases that can be addressed with ML.
  • Analyze and visualize data throughout the modeling pipeline.
  • Develop custom algorithms and data processing models.
  • Define additional datasets or create synthetic data.
  • Develop data annotation strategies and validate them.
  • Develop proprietary tools or libraries to streamline your entire data modeling workflow.

Tech stack

  • Python
  • Java
  • R
  • Jupyter notebooks
  • SQL
  • Git, GitHub/Bitbucket
  • Spark
  • Visualization: Matplotlib, Seaborn, Plotly, etc.
  • Cloud: AWS/Azure/GCP, SageMaker, Boto, S3
  • ML: Fast.ai, Scikit-learn, OpenCV, AllenNLP
  • Deep learning: PyTorch, TensorFlow, MXNet, JAX, Chainer
  • Experiment tracking and hyperparameter tuning: Neptune, Comet, Weights & Biases

Data Engineer

Skills

  • Database
  • Programming
  • Querying languages
  • Data Pipelines
  • Architecture
  • Analytics
  • Data manipulation, transformation and preprocessing
  • Cloud services
  • Workflow management

Responsibilities

  • Create data pipelines, architectures, and infrastructure
  • Cleanse and process datasets for data modeling
  • Build internal tools to streamline your data workflow
  • Aggregate disparate datasets for specific use cases
  • Support data scientists with data-related requirements

Tech stack

  • Java
  • Python
  • SQL, MySQL
  • C++
  • Scala
  • Hadoop
  • Kafka
  • Spark
  • DB: Postgres, Cassandra, MongoDB, Redis, Hive
  • Stream processing: Storm
  • Cloud: AWS/Azure/GCP, Redshift, EC2, EMR, RDS

Machine Learning Engineer

Skills

  • Data structures and modeling
  • Programming
  • Statistics
  • ML frameworks: TensorFlow, PyTorch, Scikit-learn, etc.

Responsibilities

  • Deploy models to production
  • Run A/B tests on candidate models
  • Optimize models to improve latency and throughput
  • Inference testing on various hardware: Edge, CPU, GPU
  • Model performance monitoring, maintenance, debugging
  • Maintain versions of models, experiments, and metadata
  • Understand use cases and interact with data scientists and other project stakeholders.

Tech stack

  • Linux
  • Python
  • Cloud: AWS/Azure/GCP; S3, SageMaker, Boto, EC2
  • ML: Scikit-learn, Fast.ai, AllenNLP, OpenCV, HuggingFace
  • Deep learning: TensorFlow, PyTorch, MXNet, JAX, Chainer
  • Serving: TensorFlow Serving, TensorRT, TorchServe, MXNet Model Server
  • C++
  • Scala
  • Bash
  • Git, GitHub/Bitbucket

ML use case identification

The first step in identifying an ML use case requires business or domain experts. Many successful AI projects start with a deep understanding of the potential business problems that AI can solve and require a combination of intuition and knowledge from experienced technical and business experts. At this stage, typical team members include:

  • Business leaders (tech decision-makers).
  • Product managers.
  • AI team leaders.
  • Possibly one or more senior data scientists with deep hands-on experience with data collection and processing.

Data

The second phase focuses on collecting data, cleaning and processing it, converting it from a raw to a structured format, and storing it in local databases or cloud repositories. At this stage, the role of a data engineer is crucial, along with that of data scientists. Business and product managers play a valuable role in providing data, metadata, and any preliminary business insights based on basic analytics.

Modeling

The third stage covers core data science and machine learning modeling using the datasets prepared in the previous stage. At this stage, data scientists, applied scientists, or research scientists train the initial models, refine them based on validation-set performance and stakeholder feedback, develop new algorithms, and finally build one or several candidate models that meet the requirements, such as accuracy and latency benchmarks, for running in production.
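Because candidate models are refined repeatedly, it helps to keep three disjoint datasets: a training set for fitting, a validation set for tuning, and a test set that is used only for the final check. A minimal Python sketch of a reproducible split follows. The 70/15/15 proportions and the `split_dataset` name are assumptions, and time-series or grouped data usually needs a different splitting strategy.

```python
import random


def split_dataset(records, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle reproducibly and split into disjoint train/validation/test lists."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> same split every run
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

Fixing the seed means every team member and every pipeline run evaluates against the same held-out data, so metric changes reflect model changes rather than a different split.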

Deployment

The final phase of the machine learning life cycle focuses on deploying trained models to production, where they serve predictions based on end-user input. In this phase, ML engineers take the models developed by data scientists, applied scientists, or research scientists and prepare them for production. If the models meet the predefined accuracy and latency benchmarks, they are ready to run.

Otherwise, ML engineers work to optimize model size, performance, latency, and throughput. Models then undergo systematic A/B testing to decide which model versions are best suited for deployment.
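For a binary success metric such as click-through or conversion, one common way to compare two model versions in an A/B test is a two-proportion z-test. The Python sketch below assumes independent users, a sample size fixed before the test starts, and hypothetical conversion counts. It is illustrative rather than a complete experimentation framework.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Compare conversion rates of variants A and B; returns (z, two-sided p-value)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # common rate under the null hypothesis
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if standard_error == 0:
        raise ValueError("all or no users converted; the test is undefined")
    z = (p_b - p_a) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical results: current model (A) vs. candidate model (B), 2,000 users each.
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

With these counts, variant B converts at 13% versus 10% for A, and the p-value falls below the conventional 0.05 threshold. Checking results repeatedly before the planned sample size is reached inflates false positives, which is why the sample size should be fixed up front.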

Business acceptance

Most of the time, business clients come to your team asking for help in achieving their business goals and are happy to implement machine learning in their field. But few understand the implications of automating current business processes and using complex algorithms to make decisions that people currently make. 

Deploying predictive models to production within your company requires frank discussions with business clients ahead of time to make sure they understand how the model will be deployed, so that it is understood, accepted, and used effectively by their teams.

ML development checklist

  • Define the scope and requirements
  • Validate project feasibility
  • Discuss ML model building trade-offs (accuracy versus speed)
  • Set up the project environment and codebase
  • Create data labelling documentation (ground truth definition)
  • Build data ingestion pipeline
  • Validate data quality
  • Label data
  • Revisit the scope and requirements and make sure you have enough data for the task
  • Create a baseline for model performance
  • Use initial data pipeline to build a simple model
  • Overfit a simple model to training data
  • Identify a SoTA model for your problem area (if available), reproduce the results, and apply them to your dataset as a second baseline
  • Revisit the project scope and make sure it is still feasible
  • Revisit the data pipeline and make sure the data quality is good enough
  • Perform optimizations specific for your model (e.g. hyperparameter tuning)
  • Debug the model iteratively as it becomes more complex
  • Identify common failure modes by performing error analysis
  • Return to data collection for targeted collection and labeling of data covering the observed failure modes
  • Evaluate the model on the test distribution to understand the difference between train and test set distributions
  • Review the model evaluation metric
  • Create tests for input data pipeline, model inference functionality and performance on validation data
  • Explore explicit scenarios expected in the production environment (evaluate the model against a carefully selected set of observations)
  • Expose your ML model via the REST API
  • Deploy the new model to a small group of users to make sure everything goes smoothly, then deploy it to all users.
  • Maintain the ability to roll back to previous model versions
  • Monitor real-time data and model predictions
  • Retrain the model periodically to prevent model staleness
  • If model ownership changes, train the new team
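Several checklist items (deploying to a small group of users first, keeping the ability to roll back) can be illustrated together. The Python sketch below uses a hypothetical in-memory `ModelRegistry`: it routes a stable, hash-based fraction of users to a canary version and keeps the history of promoted versions so rollback is a single call. Production systems would use a persistent model registry and a proper traffic-splitting layer instead.

```python
import hashlib


class ModelRegistry:
    """Hypothetical in-memory registry supporting canary rollout and rollback."""

    def __init__(self):
        self.models = {}    # version -> callable model
        self.history = []   # promoted versions, oldest first; the last one is live
        self.canary = None  # (version, fraction of users) during a staged rollout

    def register(self, version, model):
        self.models[version] = model

    def promote(self, version):
        """Make a registered version live for all users and end any canary."""
        if version not in self.models:
            raise KeyError(f"unknown model version: {version}")
        self.history.append(version)
        self.canary = None

    def start_canary(self, version, fraction):
        """Route a stable subset (e.g. 0.05 = 5%) of users to a candidate version."""
        if version not in self.models:
            raise KeyError(f"unknown model version: {version}")
        self.canary = (version, fraction)

    def rollback(self):
        """Abort a running canary, or else revert to the previously promoted version."""
        if self.canary is not None:
            self.canary = None
            return
        if len(self.history) < 2:
            raise RuntimeError("no previous version to roll back to")
        self.history.pop()

    def version_for(self, user_id: str) -> str:
        if self.canary is not None:
            version, fraction = self.canary
            # Hashing keeps each user in the same bucket across requests.
            bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
            if bucket < fraction * 100:
                return version
        return self.history[-1]

    def predict(self, user_id: str, features):
        return self.models[self.version_for(user_id)](features)


registry = ModelRegistry()
registry.register("v1", lambda features: "prediction from v1")
registry.register("v2", lambda features: "prediction from v2")
registry.promote("v1")
registry.start_canary("v2", fraction=0.1)  # small group of users first

users = [f"user-{i}" for i in range(1000)]
share_on_v2 = sum(registry.version_for(u) == "v2" for u in users) / len(users)
print(f"share of users on v2 during canary: {share_on_v2:.1%}")

registry.promote("v2")  # canary looked healthy: roll out to everyone
registry.rollback()     # a problem appears: everyone goes back to v1
print(registry.predict("user-42", features={}))
```

Hash-based routing, rather than random assignment per request, gives each user a consistent experience during the canary and makes problems reproducible.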

ML project planning challenges

It’s recommended that ML teams adopt some form of agile methodology for ML projects. In addition, it’s recommended they follow the Cross-Industry Standard Process for Data Mining (CRISP-DM) – one of the most widely used analytics models and an open standard process model that describes common approaches used by data mining experts.

The components of the CRISP-DM methodology can serve as the main anchors and functional features of your project.

 

Crisp-ML process

 

Let’s go through them briefly:

Business understanding

This phase focuses on understanding the overall project requirements and goals and on defining business metrics. It also assesses the availability of resources, risks, and contingencies, and includes a cost-benefit analysis. In addition, core technologies and tools are chosen at this stage.

Data understanding 

Data is a core part of any machine learning project. Without data to learn from, models cannot exist. Unfortunately, accessing and using data can take a very long time in many companies due to rules and procedures.

In this phase, the focus is on identifying, exploring, collecting, and analyzing data to achieve the project goal. This step includes identifying data sources, accessing data, creating data storage environments, and preliminary data analysis.

Data preparation

Even after the necessary data has been obtained, it will likely need to be cleaned or transformed as it moves through the enterprise. In this step, the dataset is prepared for modeling: this includes the subtasks of data selection, cleansing, construction, integration, and formatting, as well as building data pipelines for ETL (extract, transform, load). Expect the data to change several times; understanding the processes behind each of these subtasks is necessary for effective model building.
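The ETL subtasks above can be sketched end to end with the Python standard library. This minimal example, assuming a hypothetical customer CSV export, extracts raw rows, drops duplicates and rows with missing or malformed values in the transform step, and loads the result into an in-memory SQLite table. A real pipeline would log or quarantine rejected rows rather than silently discard them.

```python
import csv
import io
import sqlite3
from datetime import date

# Hypothetical raw export: a padded value, a missing value, a duplicate, and a bad date.
RAW_CSV = """customer_id,signup_date,monthly_spend
1,2023-01-15, 120.50
2,2023-02-01,
2,2023-02-01,
3,not-a-date,80
4,2023-03-10,95.25
"""


def extract(text: str) -> list[dict]:
    """Extract: parse raw CSV text into dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict]) -> list[tuple]:
    """Transform: cast types, drop malformed rows, keep the first valid row per customer."""
    clean, seen = [], set()
    for row in rows:
        customer_id = row["customer_id"].strip()
        if customer_id in seen:
            continue  # duplicate of a customer already accepted
        try:
            record = (
                int(customer_id),
                date.fromisoformat(row["signup_date"].strip()).isoformat(),
                float(row["monthly_spend"]),  # float() tolerates surrounding spaces
            )
        except ValueError:
            continue  # missing or malformed value; a real pipeline would log this row
        seen.add(customer_id)
        clean.append(record)
    return clean


def load(records: list[tuple], conn: sqlite3.Connection) -> None:
    """Load: write cleaned records into a table ready for modeling."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(id INTEGER PRIMARY KEY, signup_date TEXT, monthly_spend REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO customers VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT * FROM customers ORDER BY id").fetchall())
```

Keeping extract, transform, and load as separate functions makes each step testable on its own, which matters because the data will change several times during the project.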

Modeling

After preparing the data, it is time to build and evaluate various models based on several different modeling techniques. This step consists of choosing modeling methods, developing features, designing tests, and building and evaluating models. The CRISP-DM manual suggests “repeat building and evaluating the model until you believe you have found the best one(s).”

Evaluation

The evaluation performed during modeling focuses on the technical quality of the model. The Evaluation phase is broader: it assesses which model best fits the business objectives and compares it against the baseline. The subtasks in this phase evaluate the results (model improvements, performance metrics), review the process, and determine the next steps.

Deployment

The complexity of this stage, which includes deployment planning, monitoring and maintenance, final reporting, and validation, depends on your deployment strategy.

Suggested subtasks include:

  • Building an application, 
  • Deploying for quality assurance (QA), 
  • Automating the data pipeline, 
  • Performing Integration Testing, and 
  • Deploying to production.

Going Agile

Effective communication is key to the success of any software project. A project status report is a regular, formal report on the project’s progress against the project plan. As a project manager, you will be responsible for status reporting. Adequate planning and structure are essential when using tools like Jira or TFS.

First, define the role of each project stakeholder. Specify who will lead your ML team and who will act as product owner, scrum master, and release engineer.

Then think about the big picture. Start small and scale. Let your ML engineers and data scientists create several small models and then use them together to solve various business problems. 

Talk to the technical lead to learn how the model will be developed and used. If the model is part of ongoing work, you can add this project to an existing agile release train (ART) if it makes sense and the team agrees. Another thing to consider is creating a portfolio-level dashboard. This is very helpful when communicating with executives, as it gives a quick view of the status of the entire team.

Define your initiatives or features – the major business milestones you want to achieve. You can start the list of features with the subtasks identified above in your project plan. Validate the features with the product owner and tech lead, prioritize them with the same people, and keep the structure consistent.

Setting up a consistent structure in Jira makes it easy to create automated metrics and reports. Also, educate the team on the methodology, processes, and how to use the necessary tools.

Prepare to plan your first program increment (PI) or release. Once you have a plan and some features defined, hold a backlog session with your team to create stories for each feature. Once your backlog is sufficiently populated, set up your first PI planning.

We suggest having a clear agenda that you communicate ahead of time and agree on with the Product Owner and Technical Lead. Planning aims to fill the next sprints with stories and identify enough features to work on. This marks the start of your project and of actual development.

ML project execution challenges

Agile Team maturity 

Your software development team or Squad may not have experience with agile methodology. In this case, agile concepts should gradually be instilled within your team, and adequate training should be provided. In many cases, you can organize training sessions that are interesting and relevant to the team. Once work begins, the team will become familiar with the process, and it will take two to three sprints before you can start implementing tighter agile controls and generating metrics that make sense.

Data exploration

Without data, there is no machine learning. To get data in one place, many companies today require opening access tickets, following strict rules or policies, and even taking a data security course before being granted permission to work with the data. This is why it is difficult to predict the length of the data exploration phase of your ML project.

Take time and be as clear as possible on each data source needed so that you can prepare as best as possible for this stage. You can spend one or two sprints just on getting access to the right data and another two or three sprints to explore it.

Consider switching to Kanban at this project stage if your team can handle it. When you create stories for this project phase, try to divide user stories into vertical slices by data source, environment, and even by team members.

Modeling iterations

This is another challenge for the project manager because there will be unknowns and uncertainties along the way. The technical team cannot estimate how long model training will take until they have tried out different algorithms and changed features a few times. Allow for iterations in your project plan. As you set up and plan your sprints, create user stories that are small enough to be completed in one sprint.

Models interpretability

ML algorithms can be a black box even for seasoned software developers. A model cannot be checked in the usual way because there are no explicit, hand-written rules to inspect. Instead, the result emerges from complex learned calculations that are hard for humans to interpret. There are technical ways, such as explainability libraries (for example, SHAP or LIME), that provide sufficient explainability of the model.

Work status visibility

One of the biggest complaints business owners have about machine learning projects is the lack of visibility into the work, due to its highly technical nature. One way to provide visibility is to use agile metrics that show how many features are in progress, how much work is delayed, and whether your team has delivered what it committed to for each iteration or PI.
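As an illustration, the Python sketch below computes three such metrics from a list of sprint stories: features in progress, delayed stories, and a say/do ratio (committed story points delivered divided by committed story points). The `Story` fields and metric names are hypothetical; tools like Jira can produce similar reports when issues are structured consistently.

```python
from dataclasses import dataclass


@dataclass
class Story:
    feature: str
    points: int
    status: str          # "todo", "in_progress", or "done"
    committed: bool      # committed to at sprint or PI planning
    overdue: bool = False


def sprint_summary(stories: list[Story]) -> dict:
    committed = [s for s in stories if s.committed]
    committed_points = sum(s.points for s in committed)
    delivered_points = sum(s.points for s in committed if s.status == "done")
    return {
        "features_in_progress": len({s.feature for s in stories if s.status == "in_progress"}),
        "delayed_stories": sum(1 for s in stories if s.overdue),
        # Say/do ratio: share of committed points actually delivered.
        "say_do_ratio": delivered_points / committed_points if committed_points else 0.0,
    }


stories = [
    Story("data-access", 5, "done", committed=True),
    Story("data-access", 3, "in_progress", committed=True, overdue=True),
    Story("baseline-model", 8, "done", committed=True),
    Story("dashboard", 2, "todo", committed=False),
]
print(sprint_summary(stories))
```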

Business results

Make sure you have clear milestones with your stakeholders throughout the agile process. Add them to all important meetings and, more importantly, make sure your development team has business results assigned at the end of each major project phase. This is a great way for a project manager to keep a close eye on how the project is progressing, especially if the modeling phase results change your project scope.

ML project closure challenges

Project closure requires a clear definition of model ownership. Who owns the model now: your team or the customer? It is often unreasonable to expect your customer to know how to own and maintain an ML model. Therefore, they should be given an easy-to-use application that acts as a bridge between the model and the user.

So, when the project closes, the maintenance phase begins. At this stage, you, as the project manager, must provide the business client with business and analytical metrics to ensure that the model is working as expected. It is recommended to create a change management process as the model can show signs of deterioration over time.

After deployment, someone from your team should sit on the change control board to regularly review the model’s metrics and performance and decide when to retrain or redeploy it.

Final thoughts

Regardless of the scale of your machine learning project, its implementation is a time-consuming process consisting of the same basic steps with a straightforward set of tasks. A strict division of roles among data scientists and analysts is not mandatory and may depend on the project’s scope, budget, timing, and the specific challenges listed above. For example, professionals in small teams often take on the responsibilities of several roles.

Although the project’s main goal – the development and deployment of a predictive model – has been achieved, the project continues. Data scientists must ensure that the accuracy of the prediction results meets performance requirements and improve the model as needed. Make sure you monitor the performance of the deployed model if you are not running a dynamic model in production.

For fast and effective ML model development and deployment, it’s highly recommended that you engage an expert ML solutions development consultancy with its own R&D Center, own pool of/access to highly qualified, seasoned data science and ML engineering talents, and sufficient bandwidth of tools and resources.

We at rinf.tech specialize in custom machine learning solutions development. Don't hesitate to contact us if you need help with your ML project.

 
