# [cm4mlops] development plan
Based on feedback from the MLCommons TF on automation and reproducibility, we plan to extend CM workflows to support the following MLC projects:
- [x] check how to add network and multi-node code to MLPerf inference and CM automation (collaboration with the MLC Network TF); see the sketch after this sub-list
  - [x] extend MLPerf inference with Flask code, glued to our reference client/server code (Python first, later C++) and wrapped in CM
  - [x] address suggestions from Nvidia:
    - [x] --network-server=IP1,IP2...
    - [x] --network-client
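A minimal sketch of how the network mode could be driven from CM's Python API. The `run-mlperf,inference` tags and the `network_server` input mirror the CLI flags above, but the exact input names are assumptions rather than a confirmed interface.

```python
# Hedged sketch: drive the MLPerf inference workflow in network mode through
# CM's Python API. The input names mirror the CLI flags above
# (--network-server, --network-client) and are assumptions, not a confirmed
# interface.
import cmind

r = cmind.access({
    'action': 'run',
    'automation': 'script',
    'tags': 'run-mlperf,inference',   # assumed tags of the CM wrapper script
    'network_server': 'IP1,IP2',      # placeholder IPs, as in the flag above
    'quiet': True,
})
if r['return'] > 0:                   # CM reports errors via the return dict
    print(r['error'])
```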
- [ ] continue improving the unified CM interface to run MLPerf inference implementations from different vendors; see the sketch after this list
  - [ ] Optimized MLPerf inference implementations
    - [ ] Intel submissions (see Intel docs)
      - [x] Support installation of conda packages in CM
    - [x] Qualcomm submission
      - [x] Add CM scripts to preprocess, calibrate and compile QAIC models for ResNet50, RetinaNet and Bert
      - [x] Test in AWS
      - [x] Test on Thundercomm RB6
      - [x] Automatic model installation from a host device
      - [x] Automatic detection and usage of quantization parameters
    - [x] Nvidia submission
    - [ ] Google submission
    - [x] NeuralMagic submission
  - [ ] Add the possibility to run any MLPerf implementation, including the reference one
  - [ ] Add the possibility to change the target device (e.g. GeForce instead of A100)
  - [ ] Expose batch sizes from all existing MLPerf inference reference implementations (when applicable) in the edge category in a unified way for ONNX, PyTorch and TF via the CM interface. Report implementations with a hard-wired batch size.
  - [ ] Request from Miro: improve MLPerf inference docs for various backends
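To make the "unified CM interface" item above concrete, here is a hedged sketch of a single entry point where the vendor implementation, target device and batch size are plain inputs; the input names (`implementation`, `device`, `batch_size`) are illustrative assumptions.

```python
# Hedged sketch of the unified interface: one CM call where the vendor
# implementation, target device and batch size are ordinary inputs.
# The input names below are assumptions for illustration.
import cmind

r = cmind.access({
    'action': 'run',
    'automation': 'script',
    'tags': 'run-mlperf,inference',
    'implementation': 'nvidia',   # or 'reference', 'intel', 'qualcomm', ...
    'model': 'resnet50',
    'device': 'cuda',             # switching A100 -> GeForce stays one input
    'batch_size': 32,             # exposed uniformly across ONNX/PyTorch/TF
    'quiet': True,
})
assert r['return'] == 0, r.get('error')
```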
- [ ] Develop a universal CM-MLPerf docker to run any implementation with a local data set and model (similar to the Nvidia and Intel containers, but with a unified CM interface); see the sketch below
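A possible shape for the universal docker item, assuming the `docker` action that backs the `cm docker script` CLI command; the env keys used to point at a local data set and model are hypothetical placeholders.

```python
# Hedged sketch: run the same workflow inside a container via CM's docker
# action (the Python counterpart of `cm docker script`). The env keys that
# point at the local data set and model are hypothetical placeholders.
import cmind

r = cmind.access({
    'action': 'docker',
    'automation': 'script',
    'tags': 'run-mlperf,inference',
    'env': {
        'CM_DATASET_PATH': '/local/datasets/imagenet',     # hypothetical name
        'CM_ML_MODEL_PATH': '/local/models/resnet50.onnx', # hypothetical name
    },
    'quiet': True,
})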
- [ ] Prototype a new universal CM workflow to run any app on any target (with C++/Android/SSH)
- [ ] Add support for testing any ONNX+loadgen model with tuning (already prototyped); see the sketch below
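A minimal sketch of the "any ONNX model + loadgen" idea: wrap an onnxruntime session in a loadgen SUT and run a short Offline performance test. The model path, input shape and sample count are placeholders; a tuning loop (e.g. sweeping batch sizes or execution providers) would wrap this.

```python
# Hedged sketch: test an arbitrary ONNX model under MLPerf loadgen.
# Model path, input shape and sample count are placeholders.
import array
import numpy as np
import onnxruntime as ort
import mlperf_loadgen as lg

MODEL, N = "model.onnx", 64
sess = ort.InferenceSession(MODEL)
inp = sess.get_inputs()[0].name
samples = [np.random.rand(1, 3, 224, 224).astype(np.float32) for _ in range(N)]

def issue_queries(query_samples):
    responses, keep = [], []
    for qs in query_samples:
        out = sess.run(None, {inp: samples[qs.index]})[0]
        buf = array.array("B", np.ascontiguousarray(out).tobytes())
        keep.append(buf)                  # keep data alive until loadgen copies it
        ptr, size = buf.buffer_info()
        responses.append(lg.QuerySampleResponse(qs.id, ptr, size))
    lg.QuerySamplesComplete(responses)

def flush_queries():
    pass

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.Offline
settings.mode = lg.TestMode.PerformanceOnly

sut = lg.ConstructSUT(issue_queries, flush_queries)
qsl = lg.ConstructQSL(N, N, lambda s: None, lambda s: None)  # no-op load/unload
lg.StartTest(sut, qsl, settings)
lg.DestroyQSL(qsl)
lg.DestroySUT(sut)
```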
- [ ] Improve CM docs (basic CM message and tutorials/notes for "users" and "developers")
- [ ] Update/improve the list of all reusable, portable and tech-agnostic CM-MLOps scripts
- [ ] Start adding FAQ/notes from Discord/GitHub discussions about CM-MLPerf
- [ ] prototype/reuse the above universal CM workflow with ABTF for:
  - [ ] inference
    - [ ] support different targets (host, remote embedded, Android)
    - [ ] get all info about the target
    - [x] add Python and C++ code for loadgen with different backends (PyTorch, ONNX, TF, TFLite, QAIC); see the backend sketch after this list
    - [x] add object detection with COCO and a trained model from Rod (without accuracy checks for now)
    - [ ] connect with the training CM workflow
  - [ ] training (https://github.com/mlcommons/abtf-ssd-pytorch)
    - [x] present CM-MLPerf at the Croissant TF and discuss possible collaboration (doc)
    - [x] add CM script to get Croissant
    - [ ] add datasets via Croissant
    - [x] train and save the model in the CM cache to be loaded for inference
    - [x] test with Rod
    - [x] present prototype progress at the next ABTF meeting (Grigori)
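For the "loadgen with different backends" item above, here is a hedged sketch of the backend-dispatch idea: one `predict(x)` interface, with the concrete framework chosen at run time. The layout is illustrative, not the actual ABTF code; the PyTorch branch assumes a TorchScript file.

```python
# Hedged sketch of backend dispatch for the loadgen item above: a single
# predict(x) callable, with the framework chosen at run time. Illustrative
# only, not the actual ABTF code. The PyTorch branch assumes TorchScript.
def make_backend(name, model_path):
    if name == 'onnx':
        import onnxruntime as ort
        sess = ort.InferenceSession(model_path)
        inp = sess.get_inputs()[0].name
        return lambda x: sess.run(None, {inp: x})[0]
    if name == 'pytorch':
        import torch
        model = torch.jit.load(model_path).eval()
        return lambda x: model(torch.from_numpy(x)).detach().numpy()
    if name == 'tflite':
        import tensorflow as tf
        interp = tf.lite.Interpreter(model_path=model_path)
        interp.allocate_tensors()
        i = interp.get_input_details()[0]['index']
        o = interp.get_output_details()[0]['index']
        def predict(x):
            interp.set_tensor(i, x)
            interp.invoke()
            return interp.get_tensor(o)
        return predict
    raise ValueError(f'unknown backend: {name}')
```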
- [ ] unify experiment and visualization:
  - [ ] prepare high-level meta to run the whole experiment
  - [ ] aggregate and visualize results
  - [ ] if the MLPerf run is very short, calibrate it by multiplying N (e.g. N*10), similar to what I did in CK; see the sketch below
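A small sketch of the calibration idea in the last item: if a measured run is too short for stable numbers, multiply the query count (e.g. by 10) and rerun. The threshold and scaling factor are placeholders.

```python
# Hedged sketch of run calibration: if a benchmark run is too short for
# stable numbers, grow N by a factor (e.g. 10x) and rerun, similar to the
# CK approach mentioned above. Threshold and factor are placeholders.
import time

MIN_DURATION_SEC = 10.0   # assumed minimum useful run length
SCALE = 10                # multiply N by this factor per calibration step

def calibrated_run(run_benchmark, n_queries):
    """run_benchmark(n) executes the workload; returns the final (n, seconds)."""
    while True:
        start = time.time()
        run_benchmark(n_queries)
        elapsed = time.time() - start
        if elapsed >= MIN_DURATION_SEC:
            return n_queries, elapsed
        n_queries *= SCALE   # too short: scale up and measure again
```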