# [cm4mlops] development plan
Based on feedback from the MLCommons TF on automation and reproducibility, we plan to extend CM workflows to support the following MLC projects:
- [x] check how to add network and multi-node code to MLPerf inference and CM automation (collaboration with the MLC Network TF); see the sketch after this sub-list
  - [x] extend MLPerf inference with Flask code, glued to our reference client/server code (Python first, later C++) and wrapped in CM
  - [x] address suggestions from Nvidia:
    - [x] --network-server=IP1,IP2...
    - [x] --network-client
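A minimal sketch of how the network mode could be driven from CM's Python API. The `run-mlperf,inference` tags and the `network_server` input mirror the CLI flags above, but the exact input names are assumptions rather than a confirmed interface.

```python
# Hedged sketch: drive the MLPerf inference workflow in network mode through
# CM's Python API. The input names mirror the CLI flags above
# (--network-server, --network-client) and are assumptions, not a confirmed
# interface.
import cmind

r = cmind.access({
    'action': 'run',
    'automation': 'script',
    'tags': 'run-mlperf,inference',   # assumed tags of the CM wrapper script
    'network_server': 'IP1,IP2',      # placeholder IPs, as in the flag above
    'quiet': True,
})
if r['return'] > 0:                   # CM reports errors via the return dict
    print(r['error'])
```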
- [ ] continue improving the unified CM interface to run MLPerf inference implementations from different vendors; see the sketch after this list
  - [ ] Optimized MLPerf inference implementations
    - [ ] Intel submissions (see Intel docs)
      - [x] Support installation of conda packages in CM
    - [x] Qualcomm submission
      - [x] Add CM scripts to preprocess, calibrate and compile QAIC models for ResNet50, RetinaNet and Bert
      - [x] Test in AWS
      - [x] Test on Thundercomm RB6
      - [x] Automatic model installation from a host device
      - [x] Automatic detection and usage of quantization parameters
    - [x] Nvidia submission
    - [ ] Google submission
    - [x] NeuralMagic submission
  - [ ] Add the possibility to run any MLPerf implementation, including the reference one
  - [ ] Add the possibility to change the target device (e.g. GeForce instead of A100)
  - [ ] Expose batch sizes from all existing MLPerf inference reference implementations (when applicable) in the edge category in a unified way for ONNX, PyTorch and TF via the CM interface. Report implementations with a hard-wired batch size.
  - [ ] Request from Miro: improve MLPerf inference docs for various backends
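To make the "unified CM interface" item above concrete, here is a hedged sketch of a single entry point where the vendor implementation, target device and batch size are plain inputs; the input names (`implementation`, `device`, `batch_size`) are illustrative assumptions.

```python
# Hedged sketch of the unified interface: one CM call where the vendor
# implementation, target device and batch size are ordinary inputs.
# The input names below are assumptions for illustration.
import cmind

r = cmind.access({
    'action': 'run',
    'automation': 'script',
    'tags': 'run-mlperf,inference',
    'implementation': 'nvidia',   # or 'reference', 'intel', 'qualcomm', ...
    'model': 'resnet50',
    'device': 'cuda',             # switching A100 -> GeForce stays one input
    'batch_size': 32,             # exposed uniformly across ONNX/PyTorch/TF
    'quiet': True,
})
assert r['return'] == 0, r.get('error')
```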
- [ ] Develop a universal CM-MLPerf docker to run any implementation with a local data set and model (similar to the Nvidia and Intel containers, but with a unified CM interface); see the sketch below
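A possible shape for the universal docker item, assuming the `docker` action that backs the `cm docker script` CLI command; the env keys used to point at a local data set and model are hypothetical placeholders.

```python
# Hedged sketch: run the same workflow inside a container via CM's docker
# action (the Python counterpart of `cm docker script`). The env keys that
# point at the local data set and model are hypothetical placeholders.
import cmind

r = cmind.access({
    'action': 'docker',
    'automation': 'script',
    'tags': 'run-mlperf,inference',
    'env': {
        'CM_DATASET_PATH': '/local/datasets/imagenet',     # hypothetical name
        'CM_ML_MODEL_PATH': '/local/models/resnet50.onnx', # hypothetical name
    },
    'quiet': True,
})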
- [ ] Prototype a new universal CM workflow to run any app on any target (with C++/Android/SSH)
- [ ] Add support for testing any ONNX+loadgen model with tuning (already prototyped); see the sketch below
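A minimal sketch of the "any ONNX model + loadgen" idea: wrap an onnxruntime session in a loadgen SUT and run a short Offline performance test. The model path, input shape and sample count are placeholders; a tuning loop (e.g. sweeping batch sizes or execution providers) would wrap this.

```python
# Hedged sketch: test an arbitrary ONNX model under MLPerf loadgen.
# Model path, input shape and sample count are placeholders.
import array
import numpy as np
import onnxruntime as ort
import mlperf_loadgen as lg

MODEL, N = "model.onnx", 64
sess = ort.InferenceSession(MODEL)
inp = sess.get_inputs()[0].name
samples = [np.random.rand(1, 3, 224, 224).astype(np.float32) for _ in range(N)]

def issue_queries(query_samples):
    responses, keep = [], []
    for qs in query_samples:
        out = sess.run(None, {inp: samples[qs.index]})[0]
        buf = array.array("B", np.ascontiguousarray(out).tobytes())
        keep.append(buf)                  # keep data alive until loadgen copies it
        ptr, size = buf.buffer_info()
        responses.append(lg.QuerySampleResponse(qs.id, ptr, size))
    lg.QuerySamplesComplete(responses)

def flush_queries():
    pass

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.Offline
settings.mode = lg.TestMode.PerformanceOnly

sut = lg.ConstructSUT(issue_queries, flush_queries)
qsl = lg.ConstructQSL(N, N, lambda s: None, lambda s: None)  # no-op load/unload
lg.StartTest(sut, qsl, settings)
lg.DestroyQSL(qsl)
lg.DestroySUT(sut)
```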
- [ ] Improve CM docs (basic CM message and tutorials/notes for "users" and "developers")
- [ ] Update/improve the list of all reusable, portable and tech-agnostic CM-MLOps scripts
- [ ] Start adding FAQ/notes from Discord/GitHub discussions about CM-MLPerf
- [ ] prototype/reuse the above universal CM workflow with ABTF for:
  - [ ] inference
    - [ ] support different targets (host, remote embedded, Android)
    - [ ] get all info about the target
    - [x] add Python and C++ code for loadgen with different backends (PyTorch, ONNX, TF, TFLite, QAIC); see the backend sketch after this list
    - [x] add object detection with COCO and a trained model from Rod (without accuracy checks for now)
    - [ ] connect with the training CM workflow
  - [ ] training (https://github.com/mlcommons/abtf-ssd-pytorch)
    - [x] present CM-MLPerf at the Croissant TF and discuss possible collaboration (doc)
    - [x] add CM script to get Croissant
    - [ ] add datasets via Croissant
    - [x] train and save the model in the CM cache to be loaded for inference
    - [x] test with Rod
    - [x] present prototype progress at the next ABTF meeting (Grigori)
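For the "loadgen with different backends" item above, here is a hedged sketch of the backend-dispatch idea: one `predict(x)` interface, with the concrete framework chosen at run time. The layout is illustrative, not the actual ABTF code; the PyTorch branch assumes a TorchScript file.

```python
# Hedged sketch of backend dispatch for the loadgen item above: a single
# predict(x) callable, with the framework chosen at run time. Illustrative
# only, not the actual ABTF code. The PyTorch branch assumes TorchScript.
def make_backend(name, model_path):
    if name == 'onnx':
        import onnxruntime as ort
        sess = ort.InferenceSession(model_path)
        inp = sess.get_inputs()[0].name
        return lambda x: sess.run(None, {inp: x})[0]
    if name == 'pytorch':
        import torch
        model = torch.jit.load(model_path).eval()
        return lambda x: model(torch.from_numpy(x)).detach().numpy()
    if name == 'tflite':
        import tensorflow as tf
        interp = tf.lite.Interpreter(model_path=model_path)
        interp.allocate_tensors()
        i = interp.get_input_details()[0]['index']
        o = interp.get_output_details()[0]['index']
        def predict(x):
            interp.set_tensor(i, x)
            interp.invoke()
            return interp.get_tensor(o)
        return predict
    raise ValueError(f'unknown backend: {name}')
```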
- [ ] unify experiment and visualization:
  - [ ] prepare high-level meta to run the whole experiment
  - [ ] aggregate and visualize results
  - [ ] if the MLPerf run is very short, calibrate it by multiplying N (e.g. N*10), similar to what I did in CK; see the sketch below
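A small sketch of the calibration idea in the last item: if a measured run is too short for stable numbers, multiply the query count (e.g. by 10) and rerun. The threshold and scaling factor are placeholders.

```python
# Hedged sketch of run calibration: if a benchmark run is too short for
# stable numbers, grow N by a factor (e.g. 10x) and rerun, similar to the
# CK approach mentioned above. Threshold and factor are placeholders.
import time

MIN_DURATION_SEC = 10.0   # assumed minimum useful run length
SCALE = 10                # multiply N by this factor per calibration step

def calibrated_run(run_benchmark, n_queries):
    """run_benchmark(n) executes the workload; returns the final (n, seconds)."""
    while True:
        start = time.time()
        run_benchmark(n_queries)
        elapsed = time.time() - start
        if elapsed >= MIN_DURATION_SEC:
            return n_queries, elapsed
        n_queries *= SCALE   # too short: scale up and measure again
```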