Add ONNX and TensorRT support
This PR adds
- a `--serialized-file` that's in `.onnx` format, which will be correctly loaded by the base handler using an `ort.InferenceSession()` https://pytorch.org/tutorials/advanced/super_resolution_with_onnxruntime.html
- TensorRT support via the `.ts` extension, which can be loaded via `torch.jit.load()` https://pytorch.org/TensorRT/getting_started/getting_started_with_python_api.html#getting-started-with-python-api (see the loading sketch after this list)
- A hardcoded `model_config.json`, similar to `index_to_name.json`, which holds model-specific information that you can access via `model_config.get("property")`. For now I'm not using any special configs
- Starting with torch 1.12, nvfuser is the default backend for TorchScript, so that's a free performance enhancement where we don't need to do anything
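To make the loading behavior concrete, here is a minimal sketch of how a handler could dispatch on the serialized file's extension. This is an illustration of the approach, not the actual `base_handler` code, and it assumes `torch` and `onnxruntime` are installed.

```python
import os
import torch


def load_serialized_model(serialized_file):
    """Illustrative dispatch on file extension; not the exact base_handler code."""
    _, ext = os.path.splitext(serialized_file)
    if ext == ".onnx":
        import onnxruntime as ort
        # ONNX models are served through an ONNX Runtime inference session
        return ort.InferenceSession(serialized_file)
    if ext == ".ts":
        # TorchScript / Torch-TensorRT compiled modules load via torch.jit.load()
        return torch.jit.load(serialized_file)
    # Fall back to torch.load for regular .pt/.pth files (illustrative only)
    return torch.load(serialized_file)
```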
TODO
- Tests
- Change docs in many places
Open questions
- Should we do the conversion from torch to ONNX in `initialize()`? My gut is no: we don't ask users to train models in `initialize()`, they should prepare their models first and, once prepared, use torchserve. But I could be convinced otherwise (see the export sketch after this list)
- Should I put a schema on `model_config.json`? I'd like to wait and see what schema gets chosen for #1664 and then decide
- There are other ways of designing the same thing, but I felt like this was the simplest. I thought about the alternatives here https://gist.github.com/msaroufim/c5aa505fdd7053ca184da921536c26f1
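For reference, here is a minimal sketch of the ahead-of-time export a user would run themselves before archiving, rather than torchserve doing it in `initialize()`. The model and input shape are hypothetical.

```python
import torch
import torchvision.models as models

model = models.resnet18().eval()
dummy_input = torch.randn(1, 3, 224, 224)

# The resulting .onnx file is what then gets passed to the archiver as --serialized-file
torch.onnx.export(
    model,
    dummy_input,
    "resnet18.onnx",
    input_names=["input"],
    output_names=["output"],
)
```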
Future work
- I can add dynamo support now as well, so we can support several inference backends in one go https://github.com/pytorch/torchdynamo#existing-backends, but folks will need to install from source, which may not be ideal except for the more advanced users
- nvfuser is actually already supported and, starting with 1.12, has become the default fusing backend for `torch.jit.script()` (see the example after this list)
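As a reminder of what users get for free here, a minimal `torch.jit.script` example (assuming torch >= 1.12 on a CUDA device, where nvfuser is the default fuser):

```python
import torch


@torch.jit.script
def fused_gelu(x: torch.Tensor) -> torch.Tensor:
    # On torch >= 1.12, scripted graphs run through nvfuser by default on CUDA,
    # so pointwise chains like this get fused without any extra work from us.
    return 0.5 * x * (1.0 + torch.erf(x / 1.41421))
```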
Codecov Report
Merging #1857 (0733614) into master (a5fc61f) will increase coverage by 2.99%. The diff coverage is 25.67%.
:exclamation: Current head 0733614 differs from pull request most recent head 6c9c7c5. Consider uploading reports for the commit 6c9c7c5 to get more accurate results.
```diff
@@            Coverage Diff             @@
##           master    #1857      +/-   ##
==========================================
+ Coverage   41.67%   44.66%   +2.99%
==========================================
  Files          55       63       +8
  Lines        2282     2624     +342
  Branches        1       56      +55
==========================================
+ Hits          951     1172     +221
- Misses       1331     1452     +121
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| setup.py | 0.00% <0.00%> (ø) | |
| ts/torch_handler/base_handler.py | 0.00% <0.00%> (ø) | |
| ts/utils/util.py | 37.50% <ø> (ø) | |
| ...l-archiver/model_archiver/model_packaging_utils.py | 54.66% <54.28%> (ø) | |
| ...el-archiver/model_archiver/model_archiver_error.py | 100.00% <0.00%> (ø) | |
| workflow-archiver/workflow_archiver/version.py | 100.00% <0.00%> (ø) | |
| model-archiver/model_archiver/model_packaging.py | 90.00% <0.00%> (ø) | |
| ...w-archiver/workflow_archiver/workflow_packaging.py | 89.65% <0.00%> (ø) | |
| model-archiver/model_archiver/version.py | 100.00% <0.00%> (ø) | |
| ...hiver/workflow_archiver/workflow_archiver_error.py | 100.00% <0.00%> (ø) | |
| ... and 2 more | | |
Besides the inline comments, we would also need to think about preprocessing for ONNX (to avoid custom handlers), as the input needs to be converted to numpy. We might be able to have separate util files for onnx, trt, or future backends/dynamo, or alternatively have one backend utils file that keeps all the related helper functions. This may help keep the base_handler cleaner and more maintainable. cc @lxning on this as well.
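To illustrate the kind of backend-specific helper being discussed, a hedged sketch of ONNX preprocessing (the function name and input key are made up here; onnxruntime simply expects numpy inputs keyed by the graph's input names):

```python
import torch


def to_onnx_inputs(batch: torch.Tensor, input_name: str = "input") -> dict:
    # onnxruntime's InferenceSession.run() expects a dict of numpy arrays,
    # so torch tensors need to move to CPU and convert before inference.
    return {input_name: batch.detach().cpu().numpy()}


# outputs = session.run(None, to_onnx_inputs(batch))
```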
Just a quick update: the main challenge for this PR will be packaging setup.py with all the correct package versions, and I'm seeing all sorts of basic issues even with simple models while setting up an E2E test. Some relevant issues:
- https://github.com/pytorch/TensorRT/issues/1233#issuecomment-1274037109
- https://github.com/pytorch/ort/issues/150
- Other alternatives like AITemplate still don't have an official release and only support Ampere (A100+) https://github.com/facebookincubator/AITemplate/issues/36
Is linking to specific versions enough? Should I use git submodules? When do I update them? How will docker support work?
EDIT:
The installation experience is now `pip install torchserve[tensorrt]` (from PyPI) or `pip install .[tensorrt]` (from a local checkout), which will install the necessary packages. The full set of extras is `onnx`, `tensorrt`, and `ipex`.
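For illustration, the extras could be wired up in setup.py roughly like this; the package names and pins shown are examples, not necessarily what the PR actually specifies:

```python
# Hypothetical extras_require mapping passed to setuptools.setup(...)
extras_require = {
    "onnx": ["onnxruntime"],
    "tensorrt": ["torch-tensorrt"],
    "ipex": ["intel-extension-for-pytorch"],
}
```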
And here's an example of an inference running from my logs. The repro is in `test/pytest/test_onnx.py`:

```
(base) ubuntu@ip-172-31-17-70:~$ curl -X POST http://127.0.0.1:8080/predictions/onnx --data-binary '1'
[
  -0.28003010153770447
](base) ubuntu@ip-172-31-17-70:~$
```
Discussed offline with Li and Hamid
To merge this PR
- [x] Disable test by default in CI
- [x] Remove requirements from developer dependencies
- [x] Add documentation to describe how people should pre and post process with ONNX
- [x] Update test
Future
- Automation of pre and post processing
- Refactor docker image to pull in necessary runtime dependencies optionally
- Refactor base handler
@HamidShojanazeri @lxning I made all the changes we discussed
Logs
```
ubuntu@ip-172-31-17-70:~/serve/test/pytest$ curl -X POST http://127.0.0.1:8080/predictions/onnx --data-binary '1'
Prediction Succeeded
```
Pytest
```
(ort) ubuntu@ip-172-31-17-70:~/serve/test/pytest$ pytest test_onnx.py
==================================== test session starts =====================================
platform linux -- Python 3.8.13, pytest-7.2.0, pluggy-1.0.0
rootdir: /home/ubuntu/serve
plugins: mock-3.10.0, cov-4.0.0
collected 5 items

test_onnx.py .....                                                                     [100%]

===================================== 5 passed in 1.17s ======================================
```
The link check is failing because I'm linking to a test file that doesn't exist on master yet https://github.com/pytorch/serve/actions/runs/3425428486/jobs/5706209933#step:5:867 - this won't be a problem after we merge.