
Inference ONNX format model

Open bmtuan opened this issue 2 years ago • 5 comments

Hi guys, are you planning to support inference of ONNX format models soon?

bmtuan avatar Oct 13 '23 02:10 bmtuan

We already support ONNX; please refer to: https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/llm/runtime/deprecated/docs/deploy_and_integration.md
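The linked deploy guide describes compiling a model (ONNX included) into the Neural Engine's IR and then running inference on the compiled graph. A rough pseudocode sketch of that flow (module paths, function names, and input layout here are illustrative approximations of the doc, not a verified API):

```
# pseudocode sketch -- names follow the linked deploy doc loosely
graph = compile("model.onnx")      # compile the ONNX model into the engine's IR
graph.save("./ir")                 # optionally persist the IR for later reuse
outputs = graph.inference([input_ids, attention_mask])  # run on prepared inputs
```

Consult the linked document for the exact import path and input format for your model.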

kevinintel avatar Oct 15 '23 12:10 kevinintel

If you don't have other questions, I will close this issue.

kevinintel avatar Nov 16 '23 12:11 kevinintel

Hi bmtuan, were you able to run the example?

kevinintel avatar Nov 24 '23 09:11 kevinintel

C:\Windows\system32>D:\o\1\run_whisper.exe -l zh -m D:\o\1\whisper_gpu_int8_gpu-cuda_model.onnx -f D:\o\1\1.wav -osrt
whisper_init_from_file_no_state: loading model from 'D:\o\1\whisper_gpu_int8_gpu-cuda_model.onnx'
whisper_model_load: loading model
NE_ASSERT: E:\whisper_opt\intel_extension_for_transformers\llm\runtime\graph\core\ne_layers.c:643: wtype != NE_TYPE_COUNT

dyt06 avatar Jan 02 '24 01:01 dyt06


run_whisper.exe uses the cpp model format for inference, and ONNX models are not supported there for the time being. If you want to run inference on an ONNX model, please refer to: https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/llm/runtime/deprecated/docs/deploy_and_integration.md
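For context on why the assert fires: ggml-style runtimes like this one end their weight-type enum with a `*_COUNT` sentinel, and the model loader rejects any type id that does not map to a real type below the sentinel. Feeding an ONNX file to a loader that expects the cpp model format produces an out-of-range type id, tripping `wtype != NE_TYPE_COUNT`. A simplified, hypothetical Python illustration of that check (the enum values and function here are illustrative, not the real `ne_layers.c` code):

```python
from enum import IntEnum

class NEType(IntEnum):
    """Hypothetical, trimmed-down version of the engine's weight-type enum."""
    F32 = 0
    F16 = 1
    Q4_0 = 2
    COUNT = 3  # sentinel: number of valid types, not a real type

def validate_wtype(raw_id: int) -> NEType:
    """Map a type id read from a model file to a weight type.

    Returns NEType.COUNT for anything unrecognized, which is the
    condition the NE_ASSERT in the log above guards against.
    """
    if 0 <= raw_id < NEType.COUNT:
        return NEType(raw_id)
    return NEType.COUNT  # invalid -> `wtype != NE_TYPE_COUNT` would fail

print(validate_wtype(1))       # a valid quantization type id
print(validate_wtype(0x4F4E))  # arbitrary bytes from a non-cpp-format file
```

The fix is therefore not to repair the file but to use a model converted to the cpp format, or to run the ONNX model through the separately documented path above.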

LJ-underdog avatar Jan 02 '24 02:01 LJ-underdog