Exporting and Inference on ONNX models
To improve model performance during CPU inference we can convert the models to ONNX and then use onnxruntime, if it is available, at inference time.
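As a minimal sketch of the export step, the snippet below converts a torchvision ResNet-50 (a stand-in for the actual marie-ai models) to ONNX. The output file name, input size, and opset version are assumptions for illustration.

```python
import torch
import torchvision

# Stand-in model; the real marie-ai models are loaded elsewhere.
model = torchvision.models.resnet50(weights=None).eval()

# One of the input sizes benchmarked below (1 x 3 x 512 x 512).
dummy_input = torch.randn(1, 3, 512, 512)

torch.onnx.export(
    model,
    dummy_input,
    "resnet50.onnx",          # assumed output path
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
    opset_version=13,         # assumed opset version
)
```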
The script check_onnx_runtime.py can be used to benchmark the performance of the models.
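Below is a minimal sketch of the kind of PyTorch-vs-ONNX Runtime timing comparison such a script performs (the actual check_onnx_runtime.py may differ); the model file, input size, and run count are assumptions, and onnxruntime is used only when it is importable.

```python
import time

import numpy as np
import torch
import torchvision

try:
    import onnxruntime as ort
except ImportError:
    ort = None  # fall back to PyTorch-only timing

size = 512   # also try 1200 and 2400 to cover the sizes below
runs = 10    # assumed number of timed iterations

model = torchvision.models.resnet50(weights=None).eval()
x = torch.randn(1, 3, size, size)

# PyTorch CPU inference
with torch.no_grad():
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    print("PyTorch", (time.perf_counter() - start) / runs)

# ONNX Runtime CPU inference on the exported model
if ort is not None:
    session = ort.InferenceSession("resnet50.onnx", providers=["CPUExecutionProvider"])
    inputs = {session.get_inputs()[0].name: x.numpy().astype(np.float32)}
    start = time.perf_counter()
    for _ in range(runs):
        session.run(None, inputs)
    print("ONNX", (time.perf_counter() - start) / runs)
```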
Inference time results (ResNet-50 model):

| Input size | PyTorch | ONNX |
| --- | --- | --- |
| 2400x2400 | 3.6160961884500464 | 2.131322395749976 |
| 1200x1200 | 0.8162189463499999 | 0.35815778665000836 |
| 512x512 | 0.12735954449999554 | 0.08733407934996648 |
Good reference implementations and articles we can base our work on:
- https://github.com/ultralytics/yolov5/blob/master/export.py
- https://pytorch.org/tutorials/beginner/deploy_seq2seq_hybrid_frontend_tutorial.html
- https://facilecode.com/speed-pytorch-vs-onnx/
- https://cloudblogs.microsoft.com/opensource/2022/04/19/scaling-up-pytorch-inference-serving-billions-of-daily-nlp-inferences-with-onnx-runtime/
- https://github.com/microsoft/onnxruntime/blob/master/onnxruntime/python/tools/transformers/notebooks/PyTorch_Bert-Squad_OnnxRuntime_GPU.ipynb