
How to stream output with Qwen2.5 Omni

Open · ZhangWei125521 opened this issue 6 months ago · 1 comment

I would like the output to be produced token by token during inference, rather than receiving the whole output only after inference has finished. Does ipex-llm support this?

ZhangWei125521 · Jun 27 '25 03:06

Hi @ZhangWei125521

Both the transformers API and vLLM/Ollama support streaming output.

For the transformers API, you can refer to https://huggingface.co/blog/aifeifei798/transformers-streaming-output and https://github.com/intel/ipex-llm/tree/main/python/llm/example/GPU/Applications/streaming-llm.
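For illustration, here is a minimal sketch of token-by-token streaming using the transformers `TextStreamer` together with the ipex-llm model wrapper. The model id, the `load_in_4bit` setting, and the device handling are assumptions based on typical ipex-llm usage, not specifics from this issue; adapt them to your actual Qwen2.5 Omni checkpoint and setup.

```python
from transformers import AutoTokenizer, TextStreamer
from ipex_llm.transformers import AutoModelForCausalLM  # ipex-llm low-bit wrapper

# Placeholder model id; swap in your actual Qwen2.5 Omni checkpoint.
model_id = "Qwen/Qwen2.5-7B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=True,       # ipex-llm low-bit loading
    trust_remote_code=True,
)
# For Intel GPU, move the model and inputs to 'xpu' as in the ipex-llm examples.

# TextStreamer prints each decoded token to stdout as soon as it is generated,
# instead of waiting for generate() to return the full sequence.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

inputs = tokenizer("What is streaming generation?", return_tensors="pt")
model.generate(**inputs, max_new_tokens=128, streamer=streamer)
```

If you need the tokens inside your own loop (e.g. for a web UI) rather than printed to stdout, transformers also provides `TextIteratorStreamer`, which you iterate over while `generate()` runs in a background thread.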

For vLLM or Ollama, streaming is supported natively; see our quickstart at https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quickstart/vLLM_quickstart.md.
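As a sketch of consuming a stream from the OpenAI-compatible server that the vLLM quickstart sets up: the base URL, port, and model name below are assumptions, so match them to your own deployment.

```python
from openai import OpenAI

# Point the client at your local vLLM OpenAI-compatible endpoint
# (URL/port are placeholders; the server requires no real API key).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

stream = client.chat.completions.create(
    model="Qwen2.5-Omni",  # placeholder; use the model name your server registered
    messages=[{"role": "user", "content": "Explain streaming output."}],
    stream=True,  # ask the server to send tokens as they are generated
)

# Each chunk carries the newly generated text in choices[0].delta.content.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```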

qiyuangong · Jul 07 '25 03:07