Exporting and Inference on ONNX models
To improve model performance during CPU inference we can convert the models to ONNX and then use onnxruntime, if it is available, at inference time.
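As a minimal sketch of the export step, the snippet below converts a torchvision ResNet-50 (a stand-in for the actual marie-ai models) to ONNX. The output file name, input size, and opset version are assumptions for illustration.

```python
import torch
import torchvision

# Stand-in model; the real marie-ai models are loaded elsewhere.
model = torchvision.models.resnet50(weights=None).eval()

# One of the input sizes benchmarked below (1 x 3 x 512 x 512).
dummy_input = torch.randn(1, 3, 512, 512)

torch.onnx.export(
    model,
    dummy_input,
    "resnet50.onnx",          # assumed output path
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
    opset_version=13,         # assumed opset version
)
```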
The script check_onnx_runtime.py can be used to benchmark the performance of the models.
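Below is a minimal sketch of the kind of PyTorch-vs-ONNX Runtime timing comparison such a script performs (the actual check_onnx_runtime.py may differ); the model file, input size, and run count are assumptions, and onnxruntime is used only when it is importable.

```python
import time

import numpy as np
import torch
import torchvision

try:
    import onnxruntime as ort
except ImportError:
    ort = None  # fall back to PyTorch-only timing

size = 512   # also try 1200 and 2400 to cover the sizes below
runs = 10    # assumed number of timed iterations

model = torchvision.models.resnet50(weights=None).eval()
x = torch.randn(1, 3, size, size)

# PyTorch CPU inference
with torch.no_grad():
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    print("PyTorch", (time.perf_counter() - start) / runs)

# ONNX Runtime CPU inference on the exported model
if ort is not None:
    session = ort.InferenceSession("resnet50.onnx", providers=["CPUExecutionProvider"])
    inputs = {session.get_inputs()[0].name: x.numpy().astype(np.float32)}
    start = time.perf_counter()
    for _ in range(runs):
        session.run(None, inputs)
    print("ONNX", (time.perf_counter() - start) / runs)
```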
Inference time results (ResNet-50 model):

| Input size | PyTorch | ONNX |
| --- | --- | --- |
| 2400x2400 | 3.6160961884500464 | 2.131322395749976 |
| 1200x1200 | 0.8162189463499999 | 0.35815778665000836 |
| 512x512 | 0.12735954449999554 | 0.08733407934996648 |
Good reference implementations and articles we can base our work on:
- https://github.com/ultralytics/yolov5/blob/master/export.py
- https://pytorch.org/tutorials/beginner/deploy_seq2seq_hybrid_frontend_tutorial.html
- https://facilecode.com/speed-pytorch-vs-onnx/
- https://cloudblogs.microsoft.com/opensource/2022/04/19/scaling-up-pytorch-inference-serving-billions-of-daily-nlp-inferences-with-onnx-runtime/
- https://github.com/microsoft/onnxruntime/blob/master/onnxruntime/python/tools/transformers/notebooks/PyTorch_Bert-Squad_OnnxRuntime_GPU.ipynb