intel-extension-for-transformers
feature request: support HF's TextIteratorStreamer
Hi, I want to be able to stream the model's output somewhere other than stdout. The current streamer, TextStreamer, only works with stdout as I understand it. I tried using TextIteratorStreamer, but the current code does not support it.
here is a reference:
https://huggingface.co/docs/transformers/v4.36.1/en/internal/generation_utils#transformers.TextIteratorStreamer
https://github.com/huggingface/transformers/blob/fc5b7419d4c8121d8f1fa915504bcc353422559e/src/transformers/generation/streamers.py#L125
I think supporting it is important for use in web applications. I am trying to demonstrate the performance of these Intel models with Streamlit, but I can't stream.
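For context, the pattern TextIteratorStreamer enables is: run `model.generate(..., streamer=streamer)` in a background thread while the main thread (e.g. a Streamlit render loop) iterates over the streamer to receive text chunks as they are produced. The sketch below mimics that interface with a minimal queue-backed class so the threading pattern is clear without depending on transformers; `MiniIteratorStreamer` and `fake_generate` are illustrative stand-ins, not part of any library.

```python
# Minimal sketch of the TextIteratorStreamer pattern: the generation
# thread pushes decoded chunks into a queue via put()/end(), and the
# consumer thread iterates over the streamer to drain them.
import queue
import threading


class MiniIteratorStreamer:
    """Queue-backed streamer modeled on transformers.TextIteratorStreamer."""

    _sentinel = object()  # marks end of generation

    def __init__(self):
        self._queue = queue.Queue()

    def put(self, text):
        # Called by the generation thread for each decoded chunk.
        self._queue.put(text)

    def end(self):
        # Called once when generation is finished.
        self._queue.put(self._sentinel)

    def __iter__(self):
        while True:
            item = self._queue.get()
            if item is self._sentinel:
                return
            yield item


def fake_generate(streamer, tokens):
    # Stand-in for model.generate(..., streamer=streamer).
    for tok in tokens:
        streamer.put(tok)
    streamer.end()


streamer = MiniIteratorStreamer()
thread = threading.Thread(
    target=fake_generate, args=(streamer, ["Hello", " ", "world"])
)
thread.start()

# Main thread: this loop is where a web app (e.g. Streamlit's
# st.write_stream) would render each chunk as it arrives.
chunks = [chunk for chunk in streamer]
thread.join()
print("".join(chunks))
```

With the real library, the only differences are that `TextIteratorStreamer` is constructed with a tokenizer and the background thread runs `model.generate`; the consuming loop is identical.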
Thanks for your feedback; we will support it.
@RachelShalom Hi Rachel. I am also trying to demonstrate the inference speed of the LLM on Intel. Were you able to find any workaround or other method to stream the tokens?
@AdityaKulshrestha I assume there are serving options. @kevinintel did you decide to work on it?
Yes, we will support it soon. Once the feature is enabled, I will post an update in this issue.