Startup fails when using `--engine optimum` with `--device tensorrt`
System Info
Command: `infinity_emb v2 --model_id /home/xxxx/peg_onnx --served-model-name embedding --engine optimum --device tensorrt --batch-size 32`
- OS: Linux
- Base model: PEG
- CUDA version (reported by nvidia-smi): 11.8
- TensorRT: 8.6.1
Information
- [ ] Docker
- [X] The CLI directly via pip
Tasks
- [x] An officially supported command
- [ ] My own modifications
Reproduction
1. Start the server with the command above.
Expected behavior
The server should start and serve embeddings. Instead, startup fails with this traceback (abridged):
- `python3.10/dist-packages/optimum/onnxruntime/model_ort.py`, line 1444, in `forward`: `model_outputs = self.__prepare_onnx_outputs(use_torch, **onnx_outputs)`
- `python3.10/dist-packages/optimum/onnxruntime/modeling_ort.py`, line 939, in `__prepare_onnx_outputs`: `model_outputs[output_name]=onnx_outputs[idx]`
- `IndexError: tuple index out of range`
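The last frame shows the failure mode. Each expected output name maps to an index into the tuple the ONNX session returned, and the lookup fails when that tuple has fewer entries than expected. The sketch below is a simplified, hypothetical stand-in for that mapping step, not optimum's actual code. The output name `last_hidden_state` and the empty tuple used to simulate the short result are illustrative assumptions. The report does not say what the session actually returned.

```python
from typing import Any, Dict, Sequence


def prepare_outputs_naive(onnx_outputs: Sequence[Any], output_names: Dict[str, int]) -> Dict[str, Any]:
    """Hypothetical stand-in for the mapping in the traceback: name -> outputs[idx]."""
    model_outputs = {}
    for output_name, idx in output_names.items():
        # Raises IndexError if the session returned fewer outputs than expected.
        model_outputs[output_name] = onnx_outputs[idx]
    return model_outputs


def prepare_outputs(onnx_outputs: Sequence[Any], output_names: Dict[str, int]) -> Dict[str, Any]:
    """Same mapping, but reports which expected outputs are missing."""
    missing = [name for name, idx in output_names.items() if idx >= len(onnx_outputs)]
    if missing:
        raise RuntimeError(
            f"session returned {len(onnx_outputs)} output(s); no value for {missing}"
        )
    return {name: onnx_outputs[idx] for name, idx in output_names.items()}


if __name__ == "__main__":
    names = {"last_hidden_state": 0}  # hypothetical single-output model
    print(prepare_outputs_naive(("tensor",), names))  # first run: one output, OK
    try:
        prepare_outputs_naive((), names)  # simulated short result on a later run
    except IndexError as exc:
        print("naive:", exc)
    try:
        prepare_outputs((), names)
    except RuntimeError as exc:
        print("checked:", exc)
```

Logging `len(onnx_outputs)` next to the expected output names on each call would show whether the TensorRT session returns fewer outputs after the first inference.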
I then logged the model's run inputs and outputs. During model warmup, the first inference succeeds and the second one fails. If I start the server with `--no-model-warmup`, it starts up, but the second inference request also fails.
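To reproduce the second-inference failure with `--no-model-warmup`, send two identical requests back to back. The client below uses only the standard library. It assumes the server listens on infinity's default port 7997 and exposes an OpenAI-style `/embeddings` route. Both assumptions are not stated in the report, so adjust `BASE_URL` and the route if your deployment differs. `MODEL_NAME` matches `--served-model-name embedding` from the command above.

```python
import json
import urllib.request

# Assumptions: default port 7997 and an OpenAI-style /embeddings route.
BASE_URL = "http://localhost:7997"
MODEL_NAME = "embedding"  # from --served-model-name in the reported command


def build_request(texts, url=BASE_URL, model=MODEL_NAME):
    """Build a JSON POST request for the /embeddings route."""
    payload = {"model": model, "input": list(texts)}
    return urllib.request.Request(
        url + "/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Two identical requests: per the report, the first succeeds and the second fails.
    for attempt in (1, 2):
        try:
            status = send(build_request(["hello world"]))
            print(f"request {attempt}: HTTP {status}")
        except Exception as exc:  # HTTPError, URLError, timeouts, ...
            print(f"request {attempt}: failed with {exc!r}")
```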