Alpaca-LoRA-Serve

Streaming response

Open alexanderfrey opened this issue 1 year ago • 7 comments

Hi,

thank you very much for this work. Do you plan to support streaming responses any time soon, like text-generation-webui does?

Best, Alexander

alexanderfrey avatar Mar 20 '23 20:03 alexanderfrey

I am planning to add the streaming feature soon! :) In that case, the batch request handling feature will be disabled (well, it will remain as an option).

deep-diver avatar Mar 21 '23 04:03 deep-diver

Can you elaborate on how you would implement it? Maybe I can help. From what I understand, it reduces latency in the sense that the user won't have to wait until the full response is returned.

alexanderfrey avatar Mar 21 '23 06:03 alexanderfrey

Thanks @alexanderfrey

The Hugging Face library does not support streaming out of the box, so I thought I would need a sort of monkey patch. Luckily, I found one in a different open-source project: https://github.com/hyperonym/basaran/issues/57

If it turns out to be hard for me to implement the streaming feature in the current version, I will let you know!
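To clarify the idea being discussed: streaming means yielding tokens to the client as they are generated instead of returning only the finished sequence. A minimal, framework-agnostic sketch of that loop (the `step` callback, standing in for one forward pass of the model, is a hypothetical placeholder for illustration, not code from this repository):

```python
from typing import Callable, Iterator, List


def stream_generate(
    step: Callable[[List[int]], int],  # hypothetical: returns the next token id given the context
    prompt_ids: List[int],
    eos_id: int,
    max_new_tokens: int = 32,
) -> Iterator[int]:
    """Yield generated token ids one at a time instead of waiting for the full response."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        next_id = step(ids)
        if next_id == eos_id:
            break
        ids.append(next_id)
        # The caller can decode and forward this token to the client immediately.
        yield next_id
```

A web server would iterate over this generator and flush each decoded token to the response, which is where the latency reduction comes from.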

deep-diver avatar Mar 21 '23 07:03 deep-diver

check this out : https://twitter.com/algo_diver/status/1638079375085305856?s=20

deep-diver avatar Mar 21 '23 08:03 deep-diver

just updated the repository and experimentally running here : https://notebookse.jarvislabs.ai/BuOu_VbEuUHb09VEVHhfnFq4-PMhBRVCcfHBRCOrq7c4O9GI4dIGoidvNf76UsRL

deep-diver avatar Mar 22 '23 02:03 deep-diver

@alexanderfrey

In streaming mode, most of the parameters in GenerationConfig are not supported. Do you think you can make that happen?
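For context on what supporting those parameters would involve: sampling controls such as temperature and top-p can be applied per token inside the streaming loop, since each step produces raw logits before a token is chosen. A standalone, hypothetical sketch of that per-token sampling (not the repository's code; the function and parameter names are assumptions for illustration):

```python
import math
import random
from typing import List, Optional


def sample_next(
    logits: List[float],
    temperature: float = 0.7,
    top_p: float = 0.9,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick one token id from raw logits using temperature and nucleus (top-p) filtering."""
    rng = rng or random.Random(0)
    # Temperature scaling, then a numerically stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Nucleus filtering: keep the smallest set of tokens whose cumulative mass >= top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Sample proportionally from the kept tokens.
    r = rng.random() * sum(probs[i] for i in kept)
    acc = 0.0
    for i in kept:
        acc += probs[i]
        if r <= acc:
            return i
    return kept[-1]
```

Running a function like this once per step inside the streaming loop is one way the GenerationConfig sampling parameters could be honored without waiting for the full sequence.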

deep-diver avatar Mar 22 '23 03:03 deep-diver

Let me have a look tonight. I can't promise anything, but I'm definitely interested in contributing.

alexanderfrey avatar Mar 22 '23 16:03 alexanderfrey