It’s recommended that ML teams adopt some form of agile methodology for ML projects. It’s also recommended that they follow the Cross-Industry Standard Process for Data Mining (CRISP-DM), a widely used analytics process model and open standard that describes the common approaches data mining experts use.
The six phases of CRISP-DM can serve as the main anchors and features of your project plan.
Let’s go through them briefly:
The Business Understanding phase focuses on understanding the overall project requirements and goals and on defining business metrics. It also assesses the availability of resources, risks, and contingencies, and includes a cost-benefit analysis. In addition, core technologies and tools are chosen at this stage.
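A first-pass cost-benefit analysis can be as simple as comparing the expected benefit with build and running costs over a fixed horizon. The sketch below assumes that framing; the `CostBenefit` class, its field names, the hurdle-rate rule, and the example figures are all hypothetical and not part of CRISP-DM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostBenefit:
    """Hypothetical inputs for a first-pass cost-benefit check of an ML project."""

    expected_annual_benefit: float  # revenue gained or cost avoided per year
    build_cost: float  # one-off cost: data work, modeling, deployment
    annual_run_cost: float  # infrastructure, monitoring, maintenance per year
    years: int  # evaluation horizon

    def total_cost(self) -> float:
        return self.build_cost + self.annual_run_cost * self.years

    def net_benefit(self) -> float:
        return self.expected_annual_benefit * self.years - self.total_cost()

    def roi(self) -> float:
        """Net benefit as a fraction of total cost (1.5 means a 150% return)."""
        return self.net_benefit() / self.total_cost()

    def worth_pursuing(self, min_roi: float) -> bool:
        """Go/no-go rule against a hurdle rate agreed with the business (assumed)."""
        return self.roi() >= min_roi


# Hypothetical figures for illustration only.
project = CostBenefit(
    expected_annual_benefit=200_000, build_cost=150_000, annual_run_cost=30_000, years=3
)
print(project.total_cost(), project.net_benefit(), project.roi())  # 240000 360000 1.5
```

Keeping the inputs explicit makes it easy to revisit the numbers later, when the Evaluation phase shows what the model actually delivers.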
Data is a core part of any machine learning project. Without data to learn from, models cannot exist. Unfortunately, in many companies, accessing and using data can take a very long time because of data access rules and approval procedures.
In the Data Understanding phase, the focus is on identifying, exploring, collecting, and analyzing the data needed to achieve the project goal. This includes identifying data sources, gaining access to the data, creating data storage environments, and performing a preliminary data analysis.
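A preliminary data analysis often starts with a per-column profile: how many values are missing, how many are distinct, and the range of numeric columns. The sketch below assumes the raw extract is a list of Python dictionaries; the `profile` function and the sample records are hypothetical.

```python
import statistics
from typing import Any


def profile(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Per-column summary of a list of records (a hypothetical raw extract)."""
    columns = sorted({key for row in rows for key in row})
    report: dict[str, dict[str, Any]] = {}
    for col in columns:
        # A key that is absent from a record counts as missing, like an explicit None.
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        summary: dict[str, Any] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
        # Only summarize ranges for columns whose present values are all numeric.
        numeric = [
            v for v in present if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        if numeric and len(numeric) == len(present):
            summary["min"] = min(numeric)
            summary["max"] = max(numeric)
            summary["mean"] = statistics.mean(numeric)
        report[col] = summary
    return report


# Hypothetical sample records.
rows = [
    {"age": 34, "country": "DE"},
    {"age": None, "country": "FR"},
    {"age": 41, "country": "DE"},
    {"country": "US"},
]
report = profile(rows)
```

A profile like this quickly shows which sources need the most cleaning before the Data Preparation phase starts.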
Even after the necessary data has been obtained, it will likely need to be cleaned or transformed as it moves through the enterprise. In the Data Preparation phase, the dataset is prepared for modeling. The subtasks include data selection, cleansing, formatting, integration, and construction, as well as building ETL (extract, transform, load) pipelines. Because the data is typically transformed several times along the way, understanding what each of these subtasks involves is necessary for effective model building.
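To make the subtasks concrete, here is a minimal ETL sketch that maps each step to selection, cleansing, formatting, integration, and construction. It assumes two small CSV extracts (inlined as hypothetical sample data) and uses an in-memory SQLite database as the load target; the table, column, and function names are illustrative, not a prescribed schema.

```python
import csv
import io
import sqlite3

# Hypothetical raw extracts from two source systems (inlined to stay self-contained).
CUSTOMERS_CSV = """customer_id,country,signup_date
1, de ,2023-01-05
2,FR,2023-02-11
3,,2023-03-20
"""

ORDERS_CSV = """order_id,customer_id,amount
10,1,25.50
11,1,14.50
12,2,99.99
13,4,5.00
"""


def extract(text: str) -> list[dict[str, str]]:
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(customers: list[dict[str, str]], orders: list[dict[str, str]]) -> list[dict]:
    """Transform: selection, cleansing, formatting, integration, and construction."""
    clean: dict[int, dict] = {}
    for row in customers:
        # Cleansing and formatting: normalize country codes, drop rows without one.
        country = row["country"].strip().upper()
        if not country:
            continue
        # Selection: keep only the columns the model needs (signup_date is dropped).
        clean[int(row["customer_id"])] = {"country": country, "order_count": 0, "total_spend": 0.0}

    for row in orders:
        # Integration: join orders to customers; orders for unknown customers are skipped.
        customer = clean.get(int(row["customer_id"]))
        if customer is not None:
            # Construction: derive aggregate features from the joined data.
            customer["order_count"] += 1
            customer["total_spend"] += float(row["amount"])

    return [
        {"customer_id": cid, **values, "total_spend": round(values["total_spend"], 2)}
        for cid, values in clean.items()
    ]


def load(rows: list[dict], conn: sqlite3.Connection) -> None:
    """Load: write the prepared rows into a modeling table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_features ("
        "customer_id INTEGER PRIMARY KEY, country TEXT, order_count INTEGER, total_spend REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO customer_features "
        "VALUES (:customer_id, :country, :order_count, :total_spend)",
        rows,
    )
    conn.commit()


def run_pipeline(conn: sqlite3.Connection) -> None:
    load(transform(extract(CUSTOMERS_CSV), extract(ORDERS_CSV)), conn)
```

Using `INSERT OR REPLACE` keyed on `customer_id` means the pipeline can be rerun each time the source data changes without duplicating rows, which matters because the data will be reshaped several times during the project.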
Once the data is prepared, the Modeling phase begins: you build and evaluate several models using different modeling techniques. This phase consists of selecting modeling techniques, developing features, creating a test design, and building and assessing models. The CRISP-DM manual suggests that you “repeat building and evaluating the model until you believe you have found the best one(s).”
The assessment described above focuses on the model's technical quality. The Evaluation phase is broader: it assesses which model best meets the business objectives and how it compares with the baseline. Its subtasks are evaluating the results (for example, performance metrics and possible model improvements), reviewing the process, and determining the next steps.
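One way to make this business-level check explicit is to compare the selected model against the baseline and against an acceptance threshold agreed during Business Understanding, then map the outcome to a next step. The sketch below assumes an error metric where lower is better; the thresholds, the 10% minimum improvement, and the three next-step labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluationResult:
    model_error: float
    baseline_error: float
    relative_improvement: float  # fraction of the baseline error removed by the model
    meets_business_target: bool


def evaluate_against_business(
    model_error: float,
    baseline_error: float,
    max_acceptable_error: float,
    min_improvement: float = 0.10,
) -> EvaluationResult:
    """Lower error is better; both thresholds are assumed to come from Business Understanding."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    improvement = (baseline_error - model_error) / baseline_error
    ok = model_error <= max_acceptable_error and improvement >= min_improvement
    return EvaluationResult(model_error, baseline_error, improvement, ok)


def next_step(result: EvaluationResult) -> str:
    """Map the evaluation outcome to one of three hypothetical next steps."""
    if result.meets_business_target:
        return "deploy"
    if result.relative_improvement > 0:
        return "iterate on data preparation and modeling"
    return "revisit business understanding"


result = evaluate_against_business(model_error=8.0, baseline_error=10.0, max_acceptable_error=9.0)
```

Requiring both an absolute target and a minimum gain over the baseline avoids deploying a model that is technically better but not better enough to justify its running costs.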
The Deployment phase includes deployment planning, monitoring and maintenance, final reporting, and validation. How complex this phase is depends on your deployment strategy.
Suggested subtasks include:
- Building an application,
- Deploying for quality assurance (QA),
- Automating the data pipeline,
- Performing integration testing, and
- Deploying to production.
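Integration testing for an automated pipeline usually means running preprocessing and the model together on realistic inputs before anything is promoted to production. The sketch below uses Python's standard `unittest`; `preprocess`, `predict`, and `score_batch` are hypothetical stand-ins for your own pipeline code, and the fixed linear score replaces a real trained model.

```python
import unittest


def preprocess(record: dict) -> list[float]:
    """Hypothetical feature step; it must match what the model was trained on."""
    return [float(record["age"]) / 100.0, 1.0 if record["country"] == "DE" else 0.0]


def predict(features: list[float]) -> float:
    """Hypothetical stand-in for the trained model: a fixed linear score clipped to [0, 1]."""
    score = 0.3 + 0.5 * features[0] + 0.2 * features[1]
    return min(max(score, 0.0), 1.0)


def score_batch(records: list[dict]) -> list[float]:
    """The pipeline under test: preprocessing followed by the model."""
    return [predict(preprocess(r)) for r in records]


class PipelineIntegrationTest(unittest.TestCase):
    def test_end_to_end_scores(self):
        records = [{"age": 40, "country": "DE"}, {"age": 25, "country": "FR"}]
        scores = score_batch(records)
        # One score per input record, each inside the valid range.
        self.assertEqual(len(scores), len(records))
        for s in scores:
            self.assertGreaterEqual(s, 0.0)
            self.assertLessEqual(s, 1.0)

    def test_rejects_malformed_record(self):
        # A record missing a required field should fail loudly, not be scored silently.
        with self.assertRaises(KeyError):
            score_batch([{"age": 40}])
```

Saved as, for example, `test_pipeline.py`, this runs with `python -m unittest test_pipeline`, so the same command can gate the QA and production deployments in an automated pipeline.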
Effective communication is key to the success of any software project. A project status report is a regular, formal report on the project’s progress against the project plan, and as a project manager, you will be responsible for status reporting. To get useful reports out of tools like Jira or TFX, you need adequate planning and a consistent structure.
First, define the role of each project stakeholder. Specify who will lead your ML team and who will act as product owner, scrum master, and release engineer.
Then think about the big picture, but start small and scale up. Let your ML engineers and data scientists build several small models and then combine them to solve various business problems.
Talk to the technical lead to learn how the model will be developed and used. If the model is part of ongoing work, you can add this project to an existing Agile Release Train (ART) if it makes sense and the team agrees. Also consider creating a portfolio-level dashboard. It is very helpful when communicating with executives, because it gives a quick view of the status of the entire team.
Define your initiatives or features: the major business milestones you want to achieve. You can seed the list of features with the subtasks identified above in your project plan. Review the features with the product owner and tech lead, prioritize them with the same people, and keep the structure consistent.
A consistent structure in Jira makes it easy to create automated metrics and reports. Also, educate the team on the methodology, the processes, and how to use the necessary tools.
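Here is why consistency pays off: when every story belongs to exactly one feature and uses the same statuses, a status report reduces to simple aggregation. The sketch below assumes the issues have already been exported into plain Python objects; the `Story` and `Feature` classes, the issue keys, and the status names are hypothetical and do not reflect Jira's own API or data model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Story:
    key: str
    points: int
    status: str  # assumed workflow: "To Do", "In Progress", "Done"


@dataclass
class Feature:
    name: str
    stories: list[Story] = field(default_factory=list)

    def points(self, status: Optional[str] = None) -> int:
        """Total story points, optionally only those in one status."""
        return sum(s.points for s in self.stories if status is None or s.status == status)

    def completion(self) -> float:
        """Share of this feature's story points that are done."""
        total = self.points()
        return self.points("Done") / total if total else 0.0


def status_report(features: list[Feature]) -> str:
    """Plain-text status report: per-feature and overall share of points done."""
    lines = [
        f"{f.name}: {f.completion():.1%} done ({f.points('Done')}/{f.points()} pts)"
        for f in features
    ]
    total = sum(f.points() for f in features)
    done = sum(f.points("Done") for f in features)
    overall = done / total if total else 0.0
    lines.append(f"Overall: {overall:.1%} done ({done}/{total} pts)")
    return "\n".join(lines)


# Hypothetical features named after CRISP-DM phases.
features = [
    Feature("Data preparation", [Story("ML-1", 5, "Done"), Story("ML-2", 3, "In Progress")]),
    Feature("Modeling", [Story("ML-3", 8, "To Do")]),
]
report = status_report(features)
```

If stories were scattered across ad hoc labels instead, the same report would need manual cleanup every reporting cycle.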
Prepare to plan your first program increment (PI) or release. Once you have a plan and some features defined, set up a backlog session with your team to create stories for each feature. When your backlog holds enough stories, schedule your first PI planning.
We suggest preparing a clear agenda, checking it with the product owner and technical lead, and communicating it to the team ahead of time. The goal of planning is to fill the upcoming sprints with stories and to confirm that enough features have been identified. PI planning marks the start of your project and of actual development.
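Filling sprints can be approximated as a capacity problem: take stories in priority order and place each in the earliest sprint that still has room. This is a minimal first-fit sketch, assuming a single story-point capacity per sprint and no dependencies between stories; `BacklogStory`, `plan_sprints`, and the sample backlog are hypothetical, and real PI planning also weighs risks, dependencies, and team input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BacklogStory:
    key: str
    feature: str
    points: int
    priority: int  # 1 = highest, as agreed with the product owner and tech lead


def plan_sprints(
    stories: list[BacklogStory], sprint_count: int, capacity_per_sprint: int
) -> tuple[list[list[str]], list[str]]:
    """First-fit assignment of stories, in priority order, to sprints with free capacity."""
    sprints: list[list[str]] = [[] for _ in range(sprint_count)]
    remaining = [capacity_per_sprint] * sprint_count
    unplanned: list[str] = []
    for story in sorted(stories, key=lambda s: s.priority):
        for i in range(sprint_count):
            if story.points <= remaining[i]:
                sprints[i].append(story.key)
                remaining[i] -= story.points
                break
        else:
            # No sprint has room: the story stays in the backlog for the next PI.
            unplanned.append(story.key)
    return sprints, unplanned


# Hypothetical backlog.
backlog = [
    BacklogStory("A", "Data preparation", 5, 1),
    BacklogStory("B", "Data preparation", 8, 2),
    BacklogStory("C", "Modeling", 3, 3),
    BacklogStory("D", "Modeling", 5, 4),
    BacklogStory("E", "Deployment", 13, 5),
]
sprints, unplanned = plan_sprints(backlog, sprint_count=2, capacity_per_sprint=10)
```

Note that first fit can pull a smaller, lower-priority story into an earlier sprint than a larger, higher-priority one (here `C` lands before `B`), which is a useful prompt for discussion during planning rather than a final answer.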