
[cm4mlops] development plan


Based on feedback from the MLCommons Task Force (TF) on Automation and Reproducibility, we plan to extend CM workflows to support the following MLC projects:

  • [x] check how to add network and multi-node code to MLPerf inference and CM automation (collaboration with MLC Network TF)

    • [x] extend MLPerf inference with Flask code to glue together our reference client/server code (Python first, C++ later) with CM wrapping (see the sketch after this list)
    • [x] address suggestions from Nvidia
      • [x] --network-server=IP1,IP2...
      • [x] --network-client
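
A minimal sketch of the Flask glue described above: the server wraps a backend behind an HTTP endpoint and the client forwards loadgen queries to it. The /predict endpoint, payload format and port are illustrative assumptions; only the --network-server/--network-client flag names come from this plan.

```python
# Illustrative Flask client/server glue for network MLPerf inference.
# Endpoint name, payload format and port are assumptions, not the
# actual MLPerf inference network reference code.
import flask
import requests

app = flask.Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # A real server would run the backend here (ONNX, PyTorch, ...).
    samples = flask.request.get_json()["samples"]
    results = [{"sample_id": s, "prediction": None} for s in samples]
    return flask.jsonify({"results": results})

def client_issue_query(server_ip, samples):
    # Client side: forward one loadgen query to a server picked from
    # the list passed via --network-server=IP1,IP2,...
    r = requests.post(f"http://{server_ip}:5000/predict",
                      json={"samples": samples}, timeout=60)
    return r.json()["results"]

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```
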
  • [ ] continue improving unified CM interface to run MLPerf inference implementations from different vendors

    • [ ] Optimized MLPerf inference implementations
      • [ ] Intel submissions (see Intel docs)
        • [x] Support installation of conda packages in CM
      • [x] Qualcomm submission
        • [x] Add CM scripts to preprocess, calibrate and compile QAIC models for ResNet50, RetinaNet and Bert
        • [x] Test in AWS
        • [x] Test on Thundercomm RB6
          • [x] Automatic model installation from a host device
        • [x] Automatic detection and usage of quantization parameters
      • [x] Nvidia submission
      • [ ] Google submission
      • [x] NeuralMagic submission
    • [ ] Add the possibility to run any MLPerf implementation, including the reference one
    • [ ] Add the possibility to change the target device (e.g. GeForce instead of A100)
    • [ ] Expose batch sizes from all existing MLPerf inference reference implementations (when applicable) in the edge category in a unified way for ONNX, PyTorch and TF via the CM interface; report implementations with a hardwired batch size (see the sketch after this list)
    • [ ] Request from Miro: improve MLPerf inference docs for various backends
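
A sketch of what the unified interface could look like from Python via the cmind API. The script tags and input keys (implementation, device, batch_size) are illustrative assumptions, not the final cm4mlops interface; only cmind.access and its return-code convention are the standard CM API.

```python
# Sketch of selecting vendor implementation, target device and batch
# size through one CM entry point. Tags and keys are assumptions.
import cmind

r = cmind.access({
    "action": "run",
    "automation": "script",
    "tags": "run,mlperf,inference",   # hypothetical script tags
    "implementation": "nvidia",       # or "intel", "qualcomm", "reference", ...
    "device": "cuda",                 # e.g. a GeForce card instead of an A100
    "batch_size": 32,                 # exposed uniformly for ONNX, PyTorch, TF
})
if r["return"] > 0:
    print(r["error"])                 # standard CM error convention
```
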
  • [ ] Develop universal CM-MLPerf docker to run any implementation with local data set and model (similar to Nvidia and Intel but with a unified CM interface)
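
A sketch of the docker idea, assuming a hypothetical cm-mlperf image: local dataset and model directories are mounted read-only and a CM command runs inside the container.

```python
# Sketch of a universal CM-MLPerf docker launch with a local dataset and
# model mounted in. The image name, mount points and inner CM command
# are hypothetical; only the general pattern is shown.
import subprocess

def run_in_docker(dataset_dir, model_dir, cm_command):
    subprocess.run([
        "docker", "run", "--rm",
        "-v", f"{dataset_dir}:/data:ro",   # local dataset
        "-v", f"{model_dir}:/model:ro",    # local model
        "cm-mlperf:latest",                # hypothetical image name
        "bash", "-c", cm_command,
    ], check=True)

run_in_docker("/datasets/imagenet", "/models/resnet50",
              "cm run script --tags=run,mlperf,inference")  # illustrative
```
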

  • [ ] Prototype new universal CM workflow to run any app on any target (with C++/Android/SSH)
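
One way to think about the "any app on any target" workflow is a small target dispatcher; the sketch below covers host, SSH and Android (via adb) targets and is an assumption about the eventual design, not the prototype itself.

```python
# Sketch of a target abstraction: the same command runs on the host,
# over SSH, or on Android via adb. The target naming is an assumption.
import subprocess

def run_on_target(target, command):
    if target == "host":
        return subprocess.run(command, shell=True, check=True)
    if target.startswith("ssh://"):
        host = target[len("ssh://"):]
        return subprocess.run(["ssh", host, command], check=True)
    if target == "android":
        return subprocess.run(["adb", "shell", command], check=True)
    raise ValueError(f"unknown target: {target}")

run_on_target("host", "uname -a")
# run_on_target("ssh://user@rb6", "uname -a")  # e.g. a remote embedded board
```
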

  • [ ] Add support for any ONNX+loadgen model testing with tuning (prototyped already)
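
A sketch of the generic ONNX+loadgen harness this item refers to. The model path, input shape and sample handling are placeholders, and the loadgen Python API can differ slightly between versions, so treat this as an outline rather than a drop-in harness.

```python
# Sketch of "any ONNX model + loadgen" performance testing.
import numpy as np
import onnxruntime as ort
import mlperf_loadgen as lg

sess = ort.InferenceSession("model.onnx")             # any ONNX model
input_name = sess.get_inputs()[0].name
dummy = np.zeros((1, 3, 224, 224), dtype=np.float32)  # placeholder sample

def issue_queries(query_samples):
    for qs in query_samples:
        sess.run(None, {input_name: dummy})  # real code looks up qs.index
        lg.QuerySamplesComplete([lg.QuerySampleResponse(qs.id, 0, 0)])

def flush_queries():
    pass

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.Offline
settings.mode = lg.TestMode.PerformanceOnly

sut = lg.ConstructSUT(issue_queries, flush_queries)
qsl = lg.ConstructQSL(1024, 1024, lambda s: None, lambda s: None)
lg.StartTest(sut, qsl, settings)
lg.DestroyQSL(qsl)
lg.DestroySUT(sut)
```
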

  • [ ] Improve CM docs (basic CM message and tutorials/notes for "users" and "developers")

  • [ ] Update/improve a list of all reusable, portable and tech-agnostic CM-MLOps scripts

  • [x] Improve CM logging (stdout and stderr)

  • [ ] Visualize CM script dependencies
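
One possible approach, assuming the usual per-script _cm.json meta with a "deps" list (simplified here; the real meta has more fields and dynamic deps): scan the scripts and emit Graphviz DOT.

```python
# Sketch: build a Graphviz DOT graph of CM script dependencies from
# each script's _cm.json meta. The meta layout is simplified.
import json
import pathlib

def deps_to_dot(scripts_dir):
    lines = ["digraph cm_deps {"]
    for meta in pathlib.Path(scripts_dir).glob("*/_cm.json"):
        script = meta.parent.name
        for dep in json.loads(meta.read_text()).get("deps", []):
            lines.append(f'  "{script}" -> "{dep.get("tags", "?")}";')
    lines.append("}")
    return "\n".join(lines)

print(deps_to_dot("cm-mlops/script"))  # hypothetical checkout path
```
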

  • [ ] Check other suggestions from student teams from SCC'23

  • [ ] Start adding FAQ/notes from Discord/GitHub discussions about CM-MLPerf

  • [ ] prototype/reuse the above universal CM workflow with ABTF for

    • [ ] inference
      • [ ] support different targets (host, remote embedded, Android)
      • [ ] get all info about target
      • [x] add Python and C++ code for loadgen with different backends (PyTorch, ONNX, TF, TFLite, QAIC)
      • [x] add object detection with COCO and trained model from Rod (without accuracy for now)
      • [ ] connect with training CM workflow
    • [ ] training (https://github.com/mlcommons/abtf-ssd-pytorch)
      • [x] present CM-MLPerf at Croissant TF and discuss possible collaboration (doc)
      • [x] add CM script to get Croissant
      • [ ] add datasets via Croissant (see the sketch after this list)
      • [x] train and save model in CM cache to be loaded to inference
      • [x] test with Rod
    • [x] present prototype progress in next ABTF meeting (Grigori)
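
For the "add datasets via Croissant" item above, a sketch using the mlcroissant package; the dataset URL and record-set name are placeholders, and the API should be checked against the installed mlcroissant version.

```python
# Sketch of pulling a dataset described by Croissant metadata.
import mlcroissant as mlc

ds = mlc.Dataset(jsonld="https://example.org/dataset/croissant.json")  # placeholder URL
for i, record in enumerate(ds.records(record_set="default")):          # placeholder name
    print(record)
    if i == 2:
        break
```
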
  • [ ] unify experiment and visualization

    • [ ] prepare high-level meta to run the whole experiment
    • [ ] aggregate and visualize results
    • [ ] if an MLPerf run is very short, calibrate it by multiplying the query count (e.g. N*10), similar to what I did in CK (see the sketch below)
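
A sketch of the calibration idea from the last item: if a run is too short for stable timing, multiply the repetition count by 10 and retry. The one-second threshold is an illustrative assumption.

```python
# Sketch: grow the repetition count until a run is long enough to time
# reliably, similar to the CK approach mentioned above.
import time

def calibrated_run(run_once, n=1, min_seconds=1.0):
    while True:
        start = time.time()
        for _ in range(n):
            run_once()
        elapsed = time.time() - start
        if elapsed >= min_seconds:
            return elapsed / n, n   # average latency and calibrated count
        n *= 10                     # the N*10 scaling from this plan

avg, n = calibrated_run(lambda: sum(range(10000)))
print(f"avg={avg:.6f}s over n={n} runs")
```
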

gfursin · Nov 23 '23