The Continuous Integration system at Mozilla includes 85,000 test files, each containing many test functions. These tests need to run on all supported platforms – iOS, Windows, and Linux. However, it is impractical to run every test everywhere: with some 90 unique configurations in play, running every combination would amount to roughly 2.3 billion test files per day. Instead, Mozilla developed a strategy of selecting which tests to run, choosing them on the principles of importance and relevance and running the selected configurations on the integration branch. This heuristic of ranking tests by their frequency of failure was naive, however, as it did not consider the contents of the patch.
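The frequency-based heuristic can be sketched in a few lines. This is a simplified illustration, not Mozilla's actual implementation; the function and variable names are hypothetical:

```python
from collections import Counter

def rank_by_failure_frequency(failure_log):
    """Rank tests by how often they have failed historically.

    Note that this ignores the contents of the incoming patch entirely,
    which is exactly the weakness described above.
    """
    counts = Counter(failure_log)
    return [test for test, _ in counts.most_common()]

# Each entry is one observed failure of a test.
log = ["test_a", "test_b", "test_a", "test_c", "test_a", "test_b"]
print(rank_by_failure_frequency(log))  # ['test_a', 'test_b', 'test_c']
```

A patch touching only documentation and a patch rewriting the layout engine would get the same ranking, which is why patch-aware selection is needed.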
Additionally, choosing tests manually can be time-consuming and may lead to over-selection. Mozilla's developers chose to apply machine learning, hypothesising that it could select an optimal set of tests more quickly, efficiently, and economically. They then built infrastructure to ensure the smooth execution of the CI pipeline.
To build the training model, the developers first had to address the problem of naive heuristics. They built a set of complex heuristics to predict which patch caused which regression. Some failures are classified/annotated by humans as ‘intermittent’ or ‘fixed by commit’, and these annotations help link regressions back to the patches responsible, even when the failing tests were missing or flaky. Since 100% accuracy is not attainable, the developers also built heuristics to evaluate the classifications themselves. Beyond the regression heuristics, the developers collect data on the patches and correlate it with the test-failure data, which gives the ML models a signal for which tests are most likely to fail for a given patch.
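The idea of turning human annotations into (patch, regressed tests) training labels can be sketched as follows. This is a hedged illustration: the record fields (`classification`, `regressed_by`, `test`) are assumptions, not Mozilla's actual schema:

```python
def regressions_by_patch(failures):
    """Group annotated test failures under the patch believed to cause them.

    Failures marked 'intermittent' are noise and are skipped; failures
    marked 'fixed by commit' are assumed (for this sketch) to carry the
    identifier of the regressing patch.
    """
    mapping = {}
    for f in failures:
        if f["classification"] == "intermittent":
            continue  # not caused by any particular patch
        if f["classification"] == "fixed by commit":
            mapping.setdefault(f["regressed_by"], set()).add(f["test"])
    return mapping

failures = [
    {"test": "test_a.js", "classification": "fixed by commit", "regressed_by": "patch1"},
    {"test": "test_b.js", "classification": "intermittent", "regressed_by": None},
    {"test": "test_c.js", "classification": "fixed by commit", "regressed_by": "patch1"},
]
labels = regressions_by_patch(failures)
# labels maps 'patch1' to the set of tests it regressed
```

Each resulting (patch, test set) pair becomes a labelled example for training.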
With the complex heuristics and the dataset of patches and associated test failures, the developers at Mozilla built a training set and a validation set to teach the ML model how to select optimal tests. 90% of the dataset forms the training set, and the remaining 10% the validation set. The validation set is deliberately chosen to be posterior in time to the training set to avoid information leakage. This precaution reduces the risk of a biased, artificially inflated ML model.
Mozilla's developers then train XGBoost models on features derived from the tests, the patches, and the links between them. The model takes a (test, patch) tuple as input, and its output is a single binary label indicating whether the test is expected to fail for that patch. Because the test itself is part of the input, a single model can cover all tests.
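A (test, patch) tuple might be featurised along these lines. All feature choices and names below are assumptions for illustration, not the features Mozilla actually uses:

```python
def tuple_features(test, patch):
    """Turn one (test, patch) pair into a numeric feature vector.

    The same featurisation serves every test, which is what lets a
    single model handle all tests.
    """
    touched = set(patch["files"])
    return [
        test["past_failure_rate"],                   # test-derived feature
        len(touched),                                # patch-derived feature
        len(touched & set(test["related_files"])),   # link between the two
    ]

test = {"name": "test_a.js", "past_failure_rate": 0.02,
        "related_files": {"dom/a.cpp", "dom/b.cpp"}}
patch = {"files": ["dom/a.cpp", "layout/c.cpp"]}

x = tuple_features(test, patch)  # [0.02, 2, 1]
# Vectors like x, paired with fail/not-fail labels from the regression
# heuristics, would be fed to a classifier such as xgboost.XGBClassifier.
```

The binary prediction for each tuple then doubles as a relevance score: tests predicted to fail are the ones worth scheduling.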
To further optimise the running of tests, the developers choose to test …….