aqa-test-tools
Proposal: create live Deep Learning service for analyzing test output
Thanks to @LongyuZhang, we have an initial Deep Learning (DL) prototype that takes test outputs (from TRSS) as training data to predict possible issues. The prototype uses TensorFlow for test output classification, improved with the TF-IDF method and a weighted model. We have achieved a lot so far, but there is still a lot of work to be done: for example, further refining the model, collecting more types of test output data, and using more detailed information for DL model training and testing. Our goal is to refine the DL model and use it to suggest possible issues/solutions related to a test failure.
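To make the TF-IDF classification idea concrete, here is a minimal illustrative sketch using scikit-learn's `TfidfVectorizer` with a simple classifier. The sample outputs and labels below are made up; the actual prototype trains a TensorFlow model on real TRSS data.

```python
# Illustrative only: classify test-output snippets with TF-IDF features.
# The labels and sample outputs are invented for the sketch.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_outputs = [
    "java.lang.OutOfMemoryError: Java heap space",
    "Exception in thread main java.lang.OutOfMemoryError",
    "Connection timed out after 30000 ms",
    "socket read timed out waiting for response",
]
train_labels = ["oom", "oom", "timeout", "timeout"]

# TF-IDF turns each output into a weighted term vector; the classifier
# then learns which terms indicate which failure category.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_outputs, train_labels)

print(model.predict(["java.lang.OutOfMemoryError during test run"])[0])
```

With real data, the label set would be the known issue categories mined from TRSS rather than the two toy classes shown here.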
Currently, the work has mostly been done locally, which is time-consuming, limited in data, and unreliable. It would be great to create a live DL service on a machine capable of running machine learning so that we can
- have API to get the result from DL model at runtime
- continuously use new TRSS data for model training and refinement
- get feedback and adjustment quickly to shorten the development cycle
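The first goal above (an API that returns DL predictions at runtime) could look roughly like the following Flask sketch. The route name, request/response fields, and the placeholder model are all assumptions for illustration, not an existing API.

```python
# Sketch of a runtime prediction endpoint; everything here is hypothetical.
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict(test_output):
    # Placeholder: a real service would load the saved TensorFlow model
    # once at startup and run inference on the incoming test output here.
    return [{"issue": "example-issue", "score": 0.5}]

@app.route("/predict", methods=["POST"])
def predict_route():
    # Accept a test output and return ranked possible issues.
    test_output = request.get_json()["output"]
    return jsonify({"possibleIssues": predict(test_output)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

TRSS could then call this endpoint whenever it renders a failed test, instead of running the model locally.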
This can be separated into two parts:
- create the API that uses the trained model to predict possible issues
- automate data gathering and DL training process to generate trained model
For part 2, we would like to get a server with a GPU that can run machine learning: https://www.tensorflow.org/install/gpu
We should also investigate existing machine learning pipelines, for example: https://cloud.google.com/blog/products/ai-machine-learning/cloud-ai-helps-you-train-and-serve-tensorflow-tfx-pipelines-seamlessly-and-at-scale
Thanks @LongyuZhang and @llxia !
For other people's reference, Longyu gives a good summary of this prototype in this presentation: https://www.crowdcast.io/e/learning-about-deep
This will be a wonderful starting point under which we will engage other community members and some upcoming student programs!
For fun, and to help track all of the cool initiatives we have at the project, I've codenamed this work... "deep AQAtik" (where tik stands for 'triage in kind').
Plans for ML + TRSS project:
- Develop initial prototype for Possible Issue ML project to obtain the ML model;
- Deploy the ML model with a Python Flask server; the server needs to process data with the same procedure and parameters as the training process
- In TRSS possible issues, find possible issues by testname keyword first, and then rank them with ML scores from the ML deployment server
- Update repo links due to the migration to Eclipse
- Set up ML training and deployment servers on machines separate from TRSS (e.g. a new Fyre machine and an external machine), with URLs to receive and send web requests
- Collect the contents of related GitHub issues automatically, then monitor these repos to collect new issues and retrain the model
- Optimize the training algorithm to improve performance of the ML model.
- Optimize the ML pipeline by referring to TensorFlow Extended, the book “The ultimate guide to MLOps”, and other ML pipeline resources
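Two of the plan items above can be sketched in a few lines: the deployment server must reuse the exact preprocessing from training, and keyword-matched issues are re-ranked by ML score. The function names, normalization steps, and score format below are illustrative assumptions.

```python
# Sketch only: shared preprocessing plus ML-score ranking of possible issues.
import re

def preprocess(test_output):
    """Normalization shared by training and serving.

    The serving side must apply exactly the same steps and parameters
    as training, otherwise the model sees differently shaped input.
    """
    text = test_output.lower()
    text = re.sub(r"\d+", "<num>", text)      # mask numeric values
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
    return text

def rank_possible_issues(issues, ml_scores):
    """Rank issues found by testname keyword search using ML scores.

    `issues` come from the keyword search; `ml_scores` would come from
    the ML deployment server, keyed by issue identifier.
    """
    return sorted(issues, key=lambda i: ml_scores.get(i, 0.0), reverse=True)
```

In practice the preprocessing module would be packaged once and imported by both the training job and the Flask server, so the two can never drift apart.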
Other ideas:
- Automatically open or suggest new GitHub issues when a test fails and the ML model predicts no available possible issues
- Change the icon color of “possible issues” so users can tell whether there are related issues without clicking (works around the GitHub Search API rate limit)
- Add a button for users to give feedback, and use the feedback to further improve ML model performance
- Use ML to predict machine failures, and take the machine offline if needed
- Use GitHub issue labels to predict failed test types, such as vm or jit.
- Add JBS issues in possible issues.
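For the automated issue-collection idea, a minimal sketch against the public GitHub REST API (`GET /repos/{owner}/{repo}/issues`) might look like this. The repo names, field selection, and `since` filter are illustrative; a real collector would also need an auth token and pagination handling.

```python
# Sketch: collect GitHub issue text as DL training data (assumptions noted above).
import json
import urllib.request

def issues_url(owner, repo, since=None):
    """Build the GitHub REST API URL for listing a repo's issues."""
    url = f"https://api.github.com/repos/{owner}/{repo}/issues?state=all"
    if since:
        # ISO 8601 timestamp, e.g. 2021-01-01T00:00:00Z, so a monitor job
        # can fetch only issues updated since the last run and retrain.
        url += f"&since={since}"
    return url

def extract_training_text(issue):
    """Keep only the fields useful for model training."""
    return {
        "title": issue.get("title", ""),
        "body": issue.get("body") or "",
        "labels": [label["name"] for label in issue.get("labels", [])],
    }

def fetch_issues(owner, repo, since=None):
    """Download and flatten a repo's issues (no pagination in this sketch)."""
    with urllib.request.urlopen(issues_url(owner, repo, since)) as resp:
        return [extract_training_text(i) for i in json.loads(resp.read())]
```

The `labels` field is kept because the "predict failed test types" idea above (e.g. vm or jit) could use issue labels directly as training targets.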