Add ONNX and TensorRT support
This PR adds
- a `--serialized-file` that's in `.onnx` format, which will be correctly loaded by the base handler using an `ort.InferenceSession()` https://pytorch.org/tutorials/advanced/super_resolution_with_onnxruntime.html
- TensorRT support via the `.ts` extension, which can be loaded via `torch.jit.load()` https://pytorch.org/TensorRT/getting_started/getting_started_with_python_api.html#getting-started-with-python-api (see the loading sketch after this list)
- A hardcoded `model_config.json`, similar to `index_to_name.json`, which holds model-specific information that you can access via `model_config.get("property")`. For now I'm not using any special configs
- Starting with torch 1.12, nvfuser is the default backend for TorchScript, so that's a free performance enhancement where we don't need to do anything
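To make the loading behavior concrete, here is a minimal sketch of how a handler could dispatch on the serialized file's extension. This is an illustration of the approach, not the actual `base_handler` code, and it assumes `torch` and `onnxruntime` are installed.

```python
import os
import torch


def load_serialized_model(serialized_file):
    """Illustrative dispatch on file extension; not the exact base_handler code."""
    _, ext = os.path.splitext(serialized_file)
    if ext == ".onnx":
        import onnxruntime as ort
        # ONNX models are served through an ONNX Runtime inference session
        return ort.InferenceSession(serialized_file)
    if ext == ".ts":
        # TorchScript / Torch-TensorRT compiled modules load via torch.jit.load()
        return torch.jit.load(serialized_file)
    # Fall back to torch.load for regular .pt/.pth files (illustrative only)
    return torch.load(serialized_file)
```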
TODO
- Tests
- Change docs in many places
Open questions
- Should we do the conversion from torch to ONNX in `initialize()`? My gut is no: we don't ask users to train models in `initialize()`, they should prepare their models first and, once prepared, use torchserve. But I could be convinced otherwise (see the export sketch after this list)
- Should I put a schema on `model_config.json`? I'd like to wait and see what schema gets chosen for #1664 and then decide
- There are other ways of designing the same thing, but I felt like this was the simplest. I thought about the alternatives here https://gist.github.com/msaroufim/c5aa505fdd7053ca184da921536c26f1
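For reference, here is a minimal sketch of the ahead-of-time export a user would run themselves before archiving, rather than torchserve doing it in `initialize()`. The model and input shape are hypothetical.

```python
import torch
import torchvision.models as models

model = models.resnet18().eval()
dummy_input = torch.randn(1, 3, 224, 224)

# The resulting .onnx file is what then gets passed to the archiver as --serialized-file
torch.onnx.export(
    model,
    dummy_input,
    "resnet18.onnx",
    input_names=["input"],
    output_names=["output"],
)
```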
Future work
- I can add dynamo support now as well, so we can support several inference backends in one go https://github.com/pytorch/torchdynamo#existing-backends, but folks will need to install from source, which may not be ideal except for the more advanced users
- nvfuser is actually already supported and, starting with 1.12, has become the default fusing backend for `torch.jit.script()` (see the example after this list)
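As a reminder of what users get for free here, a minimal `torch.jit.script` example (assuming torch >= 1.12 on a CUDA device, where nvfuser is the default fuser):

```python
import torch


@torch.jit.script
def fused_gelu(x: torch.Tensor) -> torch.Tensor:
    # On torch >= 1.12, scripted graphs run through nvfuser by default on CUDA,
    # so pointwise chains like this get fused without any extra work from us.
    return 0.5 * x * (1.0 + torch.erf(x / 1.41421))
```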
Codecov Report
Merging #1857 (0733614) into master (a5fc61f) will increase coverage by 2.99%. The diff coverage is 25.67%.
:exclamation: Current head 0733614 differs from pull request most recent head 6c9c7c5. Consider uploading reports for the commit 6c9c7c5 to get more accurate results.
```diff
@@            Coverage Diff             @@
##           master    #1857      +/-   ##
==========================================
+ Coverage   41.67%   44.66%   +2.99%
==========================================
  Files          55       63       +8
  Lines        2282     2624     +342
  Branches        1       56      +55
==========================================
+ Hits          951     1172     +221
- Misses       1331     1452     +121
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| setup.py | 0.00% <0.00%> (ø) | |
| ts/torch_handler/base_handler.py | 0.00% <0.00%> (ø) | |
| ts/utils/util.py | 37.50% <ø> (ø) | |
| ...l-archiver/model_archiver/model_packaging_utils.py | 54.66% <54.28%> (ø) | |
| ...el-archiver/model_archiver/model_archiver_error.py | 100.00% <0.00%> (ø) | |
| workflow-archiver/workflow_archiver/version.py | 100.00% <0.00%> (ø) | |
| model-archiver/model_archiver/model_packaging.py | 90.00% <0.00%> (ø) | |
| ...w-archiver/workflow_archiver/workflow_packaging.py | 89.65% <0.00%> (ø) | |
| model-archiver/model_archiver/version.py | 100.00% <0.00%> (ø) | |
| ...hiver/workflow_archiver/workflow_archiver_error.py | 100.00% <0.00%> (ø) | |
| ... and 2 more | | |
Besides the inline comments, we would also need to think about preprocessing for ONNX (to avoid custom handlers), as the input needs to be converted to numpy. We might be able to have separate util files for onnx, trt, or future backends/dynamo, or alternatively have one backend utils file that keeps all the related helper functions. This may help keep the base_handler cleaner and more maintainable. cc @lxning on this as well.
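To illustrate the kind of backend-specific helper being discussed, a hedged sketch of ONNX preprocessing (the function name and input key are made up here; onnxruntime simply expects numpy inputs keyed by the graph's input names):

```python
import torch


def to_onnx_inputs(batch: torch.Tensor, input_name: str = "input") -> dict:
    # onnxruntime's InferenceSession.run() expects a dict of numpy arrays,
    # so torch tensors need to move to CPU and convert before inference.
    return {input_name: batch.detach().cpu().numpy()}


# outputs = session.run(None, to_onnx_inputs(batch))
```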
Just a quick update: the main challenge for this PR will be packaging setup.py with all the correct package versions, and I'm seeing all sorts of basic issues even with simple models while setting up an E2E test. Some relevant issues:
- https://github.com/pytorch/TensorRT/issues/1233#issuecomment-1274037109
- https://github.com/pytorch/ort/issues/150
- Other alternatives like AITemplate still don't have an official release and only support Ampere (A100+) https://github.com/facebookincubator/AITemplate/issues/36
Is linking to specific versions enough? Should I use git submodules? When do I update them? How will docker support work?
EDIT:
The installation experience is now `pip install torchserve[tensorrt]` (from PyPI) or `pip install .[tensorrt]` (from a local checkout), which will install the necessary packages. The full set of extras is `onnx`, `tensorrt`, and `ipex`.
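For illustration, the extras could be wired up in setup.py roughly like this; the package names and pins shown are examples, not necessarily what the PR actually specifies:

```python
# Hypothetical extras_require mapping passed to setuptools.setup(...)
extras_require = {
    "onnx": ["onnxruntime"],
    "tensorrt": ["torch-tensorrt"],
    "ipex": ["intel-extension-for-pytorch"],
}
```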
And here's an example of an inference running from my logs. The repro is in `test/pytest/test_onnx.py`:

```
(base) ubuntu@ip-172-31-17-70:~$ curl -X POST http://127.0.0.1:8080/predictions/onnx --data-binary '1'
[
  -0.28003010153770447
](base) ubuntu@ip-172-31-17-70:~$
```
Discussed offline with Li and Hamid
To merge this PR
- [x] Disable test by default in CI
- [x] Remove requirements from developer dependencies
- [x] Add documentation to describe how people should pre and post process with ONNX
- [x] Update test
Future
- Automation of pre and post processing
- Refactor docker image to pull in necessary runtime dependencies optionally
- Refactor base handler
@HamidShojanazeri @lxning I made all the changes we discussed
Logs
```
ubuntu@ip-172-31-17-70:~/serve/test/pytest$ curl -X POST http://127.0.0.1:8080/predictions/onnx --data-binary '1'
Prediction Succeeded
```
Pytest
```
(ort) ubuntu@ip-172-31-17-70:~/serve/test/pytest$ pytest test_onnx.py
==================================== test session starts =====================================
platform linux -- Python 3.8.13, pytest-7.2.0, pluggy-1.0.0
rootdir: /home/ubuntu/serve
plugins: mock-3.10.0, cov-4.0.0
collected 5 items

test_onnx.py .....                                                                     [100%]

===================================== 5 passed in 1.17s ======================================
```
The link check is failing because I'm linking to a test file that doesn't exist on master yet https://github.com/pytorch/serve/actions/runs/3425428486/jobs/5706209933#step:5:867 - this won't be a problem after we merge.