tritonserver loading onnx model exported by optimum failed
System Info
optimum: 1.14.1
python: 3.11
onnx: 1.15.0
onnxruntime: 1.16.3
tritonserver image: nvcr.io/nvidia/tritonserver:22.12-py3
Who can help?
@michaelbenayoun
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [X] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction (minimal, reproducible, runnable)
1. Export the ONNX model:
optimum-cli export onnx --task text-generation --opset 17 --optimize O4 --fp16 --device cuda --model /home/storage00/D13_llama_7b_shortprompt D13_llama_7b_shortprompt_o1_fp16_opset17_onnx
2. Run the Triton Server Docker container for inference:
docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v /home/storage00/triton/models:/models nvcr.io/nvidia/tritonserver:22.12-py3 tritonserver --model-repository=/models --strict-model-config=false
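For reference, Triton expects each model in the mounted repository to follow its standard layout; a minimal sketch (the model name `llama_onnx` is an assumption, not from the export above):

```text
models/
└── llama_onnx/        # hypothetical model name
    └── 1/             # version directory
        └── model.onnx # the file exported by optimum-cli
```

With `--strict-model-config=false`, the ONNX Runtime backend can auto-complete a missing `config.pbtxt` from the model itself.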
3. Got an error!
4. Printing the opsets of the exported ONNX model shows that the opset of the `com.ms.internal.nhwc` domain is 19!
Expected behavior
The Triton Server container should load and serve the ONNX model exported by optimum successfully.