ipex-llm
How to stream output with qwen2.5 omni
I would like the output to be produced token by token during inference rather than getting the whole output only after inference finishes. Does ipex-llm support this?
Hi @ZhangWei125521
Both the transformers API and vLLM/ollama support streaming output.
For the transformers API, you can refer to https://huggingface.co/blog/aifeifei798/transformers-streaming-output and https://github.com/intel/ipex-llm/tree/main/python/llm/example/GPU/Applications/streaming-llm.
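Here is a minimal sketch of token-by-token streaming with the ipex-llm transformers API and `TextStreamer`. The model id, prompt, and `xpu` device are placeholders, and Qwen2.5-Omni may require its own dedicated model class rather than `AutoModelForCausalLM`:

```python
# Minimal streaming sketch with the ipex-llm transformers API (model id/device are placeholders).
import torch
from transformers import AutoTokenizer, TextStreamer
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "Qwen/Qwen2.5-7B-Instruct"  # placeholder; Qwen2.5-Omni may need a dedicated model class

# Load the model with ipex-llm low-bit optimization and move it to the Intel GPU.
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True)
model = model.to("xpu")
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# TextStreamer prints each newly generated token as soon as it is decoded.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

inputs = tokenizer("What is streaming generation?", return_tensors="pt").to("xpu")
with torch.inference_mode():
    model.generate(**inputs, max_new_tokens=128, streamer=streamer)
```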
For vLLM or ollama, streaming is supported out of the box; see the quickstart at https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quickstart/vLLM_quickstart.md
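Once the model is served with vLLM (or ollama's OpenAI-compatible endpoint), the client can stream deltas through the standard OpenAI API. This sketch assumes a server already running at http://localhost:8000/v1 and a served model name of your choosing:

```python
# Client-side streaming against an OpenAI-compatible server (vLLM/ollama); URL and model name are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="Qwen2.5-Omni",  # placeholder; use the name the server was started with
    messages=[{"role": "user", "content": "Explain streaming output in one paragraph."}],
    stream=True,  # ask the server to send tokens as they are generated
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```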