
feature request: support for HF's TextIteratorStreamer

Open RachelShalom opened this issue 1 year ago • 4 comments

Hi, I want to be able to stream the model output somewhere other than stdout. The current streamer, TextStreamer, only works with stdout as I understand it. I tried using TextIteratorStreamer, but the current code does not support it.

here is a reference:

https://huggingface.co/docs/transformers/v4.36.1/en/internal/generation_utils#transformers.TextIteratorStreamer

https://github.com/huggingface/transformers/blob/fc5b7419d4c8121d8f1fa915504bcc353422559e/src/transformers/generation/streamers.py#L125

I think supporting it is important for use in web applications. I am trying to demonstrate the performance of these Intel models with Streamlit, but I can't stream.
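For context, the pattern TextIteratorStreamer implements can be sketched with a queue fed by a background generation thread, which the caller then iterates over. This is a minimal, self-contained illustration of that mechanism, not the HF implementation; the names `ToyIteratorStreamer` and `fake_generate` are invented for this sketch.

```python
# Minimal sketch of the iterator-streamer pattern behind
# transformers.TextIteratorStreamer: generation runs in a background
# thread and pushes decoded chunks into a queue, while the caller
# iterates over the streamer object as chunks arrive.
import queue
import threading


class ToyIteratorStreamer:
    """Thread-safe iterator fed by a producer thread."""

    _SENTINEL = object()  # marks the end of generation

    def __init__(self):
        self._queue = queue.Queue()

    def put(self, text):
        # Called from the generation thread for each new chunk.
        self._queue.put(text)

    def end(self):
        # Called once generation is finished.
        self._queue.put(self._SENTINEL)

    def __iter__(self):
        return self

    def __next__(self):
        item = self._queue.get()  # blocks until the producer pushes
        if item is self._SENTINEL:
            raise StopIteration
        return item


def fake_generate(streamer, tokens):
    # Stand-in for model.generate(..., streamer=streamer).
    for tok in tokens:
        streamer.put(tok)
    streamer.end()


streamer = ToyIteratorStreamer()
thread = threading.Thread(
    target=fake_generate, args=(streamer, ["Hello", ", ", "world", "!"])
)
thread.start()

# The consumer (e.g. a Streamlit app) can render chunks as they arrive.
generated = "".join(chunk for chunk in streamer)
thread.join()
print(generated)  # -> Hello, world!
```

In a real app the iteration would feed something like Streamlit's rendering loop instead of a `join`; the key point is that the consumer never touches stdout, which is exactly what TextStreamer cannot offer.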

RachelShalom avatar Jan 01 '24 12:01 RachelShalom

Thanks for your feedback; we will support it.

kevinintel avatar Jan 03 '24 05:01 kevinintel

@RachelShalom Hi Rachel. I am also trying to demonstrate LLM inference speed on Intel. Were you able to find any workaround or another method to stream the tokens?

AdityaKulshrestha avatar Feb 27 '24 18:02 AdityaKulshrestha

@AdityaKulshrestha I assume there are serving options. @kevinintel did you decide to work on it?

RachelShalom avatar Mar 03 '24 08:03 RachelShalom

Yes, we will support it soon. After the feature is enabled, I will update this issue.

kevinintel avatar Mar 04 '24 03:03 kevinintel