May 19, 2024

An Interview Study by UC Berkeley Researchers Explains the Process of Operationalizing Machine Learning (MLOps) and Exposes the Variables that Govern the Success of Machine Learning Models in Deployment

As machine learning becomes increasingly prevalent in software, a new subfield known as MLOps (short for ML Operations) has evolved to organize the “collection of methods that attempt to deploy and manage ML models in production safely and effectively.” MLOps is commonly acknowledged to be complicated. Anecdotal reports claim that 90% of ML models never make it to production; others suggest that 85% of ML initiatives fail to deliver value. At the same time, it is unclear why MLOps is difficult: researchers’ current understanding of MLOps is confined to a patchwork of white papers, anecdotes, and opinion pieces, even as a cottage industry of businesses has emerged to address MLOps concerns.

MLOps incurs “technical debt,” resulting in “huge continuing maintenance expenses in real-world ML systems.” Most successful ML deployments appear to involve a “team of engineers who spend a large amount of their time on the less glamorous elements of ML, such as maintaining and monitoring ML pipelines.” Prior research examined general data analysis and scientific workflows without considering the MLOps concerns that arise beyond model creation. There is therefore an urgent need to clarify MLOps, especially by describing what it typically entails across enterprises and ML applications. A more comprehensive understanding of MLOps best practices and pain points can reveal gaps in current procedures and better inform the development of next-generation tools.

Standard ML pipeline | Source: https://arxiv.org/pdf/2209.09125.pdf

To address this need, the researchers conducted a semi-structured interview study of ML engineers (MLEs), each of whom has worked on production ML models. They recruited 18 participants across a range of companies and applications, as described in the table below, and asked open-ended questions to better understand their workflows and day-to-day challenges. As the figure above shows, MLEs perform four primary tasks: (i) data collection, (ii) experimentation, (iii) evaluation and deployment, and (iv) monitoring and response (a runnable sketch of this loop follows the table below). Across all four tasks, three variables govern the success of a production ML deployment: velocity, validation, and versioning. The paper organizes the specific MLOps practices it uncovers into broad findings, the first being that the nature of ML engineering is highly exploratory.

Description of interviewees | Source: https://arxiv.org/pdf/2209.09125.pdf
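To make the four-task loop concrete, here is a minimal, runnable Python sketch of it. Everything in it (the synthetic data, the candidate configurations, the 2% regression threshold) is an illustrative assumption standing in for the interviewees’ real tooling:

```python
# A minimal sketch of the four-task production ML workflow: (i) data
# collection, (ii) experimentation, (iii) evaluation and deployment,
# (iv) monitoring and response. All details are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# (i) Data collection -- stand-in for pulling labeled data from a warehouse.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_eval, y_train, y_eval = train_test_split(X, y, random_state=0)

# (ii) Experimentation -- train many candidates; most never reach production.
candidates = [LogisticRegression(C=c, max_iter=1000).fit(X_train, y_train)
              for c in (0.01, 0.1, 1.0, 10.0)]

# (iii) Evaluation and deployment -- promote only the validated winner.
scores = [accuracy_score(y_eval, m.predict(X_eval)) for m in candidates]
deployed = candidates[scores.index(max(scores))]
print(f"deployed candidate accuracy: {max(scores):.3f}")

# (iv) Monitoring and response -- if live accuracy degrades past a threshold,
# respond (here by flagging; in practice, by retraining or rolling back).
live_accuracy = accuracy_score(y_eval, deployed.predict(X_eval))  # proxy signal
if live_accuracy < max(scores) - 0.02:
    print("regression detected: retrain on fresh data or roll back")
```

Note that stage (ii) deliberately produces several throwaway candidates; as the next finding explains, that is by design.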

As noted above, anecdotes hold that 90% of models never make it to production. The researchers argue, however, that this statistic is misleading: because of continuous experimentation, many model versions are created, and only a small fraction of them is ever intended to reach production. It therefore pays to prototype ideas quickly, with small modifications to established workflows, and to demonstrate practical benefit early, so that flawed models never get far. A second finding is that operationalizing model evaluation takes deliberate organizational effort. Popular model evaluation “best practices” do not reflect the seriousness with which enterprises treat deployments: they often rely on a single, typically static held-out dataset and a single ML metric (e.g., accuracy or recall).
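As a toy contrast, the sketch below evaluates mock predictions on several metrics across several data subpopulations; the slice names, metric choices, and simulated predictions are all illustrative assumptions:

```python
# Evaluate one set of predictions across multiple metrics and data slices,
# rather than one static held-out set and one metric. All data is mocked.
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)                          # mock labels
y_pred = np.where(rng.random(500) < 0.85, y_true, 1 - y_true)  # ~85%-accurate mock model
slices = rng.choice(["new_users", "power_users", "intl"], size=500)  # hypothetical subpopulations

metrics = {"accuracy": accuracy_score, "recall": recall_score}
for name in np.unique(slices):
    mask = slices == name
    report = {m: round(fn(y_true[mask], y_pred[mask]), 3) for m, fn in metrics.items()}
    print(name, report)
```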

Indeed, the researchers found that MLEs devote substantial effort to maintaining multiple up-to-date evaluation datasets and metrics over time, ensuring that data subpopulations of interest are adequately covered. A third finding is that non-ML rules and human-in-the-loop practices are what keep production models reliable. MLEs prefer simple approaches, even when that means managing many model versions: rather than applying sophisticated techniques to reduce errors caused by distribution shift, for example, they would simply create new models and retrain them on fresh data. MLEs also ensured the dependability of deployments through tactics such as on-call rotations, model rollbacks, and extensive rule-based guardrails that keep wrong outputs from reaching users, as the sketch below illustrates.
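The following toy Python combines a rule-based guardrail that clamps implausible outputs with a rollback to the previous model version when the guardrail fires too often. The price-prediction setting, the 50% clamp, and the 20% veto threshold are all illustrative assumptions, not practices quoted from the interviewees:

```python
# A toy sketch of two reliability tactics from the study: a rule-based
# guardrail that vetoes implausible model outputs, and a rollback to the
# previous model version when the guardrail fires too often.
# All thresholds and models here are illustrative assumptions.

def guardrail(prediction: float, last_known_price: float) -> float:
    """Rule: never let a predicted price move more than 50% in one step."""
    lo, hi = 0.5 * last_known_price, 1.5 * last_known_price
    return min(max(prediction, lo), hi)

def new_model(x: int) -> float:
    return 100.0 + (200.0 if x % 3 == 0 else 0.0)  # occasionally spikes

def old_model(x: int) -> float:
    return 100.0 + 0.1 * x  # stable previous version kept for rollback

def serve(inputs, last_price=100.0, veto_limit=0.2):
    raw = [new_model(x) for x in inputs]
    clamped = [guardrail(p, last_price) for p in raw]
    veto_rate = sum(r != c for r, c in zip(raw, clamped)) / len(inputs)
    # Rollback: if the guardrail vetoes too many outputs, distrust the new
    # model entirely and fall back to the previous version.
    if veto_rate > veto_limit:
        return [old_model(x) for x in inputs]
    return clamped

print(serve(range(10)))
```

The paper is a must-read for anyone trying to do ML in production.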

This article is a research summary written by Marktechpost staff based on the research paper 'Operationalizing Machine Learning: An Interview Study'. All credit for this research goes to the researchers on this project. Check out the paper: https://arxiv.org/pdf/2209.09125.pdf


