ipex-llm
How to stream output with qwen2.5 omni
I would like the output to be produced token by token during inference rather than getting the whole output only after inference finishes. Does ipex-llm support this?
Hi @ZhangWei125521
Both the transformers API and vLLM/ollama support streaming output.
For the transformers API, you can refer to https://huggingface.co/blog/aifeifei798/transformers-streaming-output and https://github.com/intel/ipex-llm/tree/main/python/llm/example/GPU/Applications/streaming-llm.
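Here is a minimal sketch of token-by-token streaming with the ipex-llm transformers API and `TextStreamer`. The model id, prompt, and `xpu` device are placeholders, and Qwen2.5-Omni may require its own dedicated model class rather than `AutoModelForCausalLM`:

```python
# Minimal streaming sketch with the ipex-llm transformers API (model id/device are placeholders).
import torch
from transformers import AutoTokenizer, TextStreamer
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "Qwen/Qwen2.5-7B-Instruct"  # placeholder; Qwen2.5-Omni may need a dedicated model class

# Load the model with ipex-llm low-bit optimization and move it to the Intel GPU.
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True)
model = model.to("xpu")
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# TextStreamer prints each newly generated token as soon as it is decoded.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

inputs = tokenizer("What is streaming generation?", return_tensors="pt").to("xpu")
with torch.inference_mode():
    model.generate(**inputs, max_new_tokens=128, streamer=streamer)
```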
For vLLM or ollama, streaming is supported out of the box; see the quickstart at https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quickstart/vLLM_quickstart.md
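Once the model is served with vLLM (or ollama's OpenAI-compatible endpoint), the client can stream deltas through the standard OpenAI API. This sketch assumes a server already running at http://localhost:8000/v1 and a served model name of your choosing:

```python
# Client-side streaming against an OpenAI-compatible server (vLLM/ollama); URL and model name are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="Qwen2.5-Omni",  # placeholder; use the name the server was started with
    messages=[{"role": "user", "content": "Explain streaming output in one paragraph."}],
    stream=True,  # ask the server to send tokens as they are generated
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```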