tritonserver loading onnx model exported by optimum failed
System Info
optimum: 1.14.1
python: 3.11
onnx: 1.15.0
onnxruntime: 1.16.3
tritonserver image: nvcr.io/nvidia/tritonserver:22.12-py3
Who can help?
@michaelbenayoun
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [X] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction (minimal, reproducible, runnable)
1. Export the ONNX model:
optimum-cli export onnx --task text-generation --opset 17 --optimize O4 --fp16 --device cuda --model /home/storage00/D13_llama_7b_shortprompt D13_llama_7b_shortprompt_o1_fp16_opset17_onnx
2. Run the Triton Server Docker container for inference:
docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v /home/storage00/triton/models:/models nvcr.io/nvidia/tritonserver:22.12-py3 tritonserver --model-repository=/models --strict-model-config=false
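For reference, Triton expects each model in the mounted repository to follow its standard layout; a minimal sketch (the model name `llama_onnx` is an assumption, not from the export above):

```text
models/
└── llama_onnx/        # hypothetical model name
    └── 1/             # version directory
        └── model.onnx # the file exported by optimum-cli
```

With `--strict-model-config=false`, the ONNX Runtime backend can auto-complete a missing `config.pbtxt` from the model itself.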
3. Got an error!
4. Printing the opsets of the exported ONNX model shows that the opset of the `com.ms.internal.nhwc` domain is 19!
Expected behavior
The Triton Server container should load and serve the ONNX model exported by optimum successfully.