Alpaca-LoRA-Serve
Streaming response
Hi,
thank you very much for this work. Do you plan to support streaming responses any time soon, like text-generation-webui does?
Best, Alexander
I am planning to add the streaming feature soon! :) In that case, the batch request handling feature will be disabled (well, it will remain as an option).
Can you elaborate on how you would implement it? Maybe I can help. From what I understand, it reduces perceived latency, since the user won't have to wait until the full response is returned.
Thanks @alexanderfrey
The Hugging Face library does not support streaming out of the box, so I thought I would need a sort of monkey patch. Luckily, I found one in a different open source project: https://github.com/hyperonym/basaran/issues/57
If it turns out to be hard for me to implement the streaming feature in the current version, I will let you know!
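For reference, the common workaround at the time looked roughly like this: hijack the stopping-criteria hook, which `transformers` invokes after every decoding step, to push each freshly sampled token onto a queue while `generate` runs in a background thread. This is a minimal sketch of that pattern, not the code from basaran or this repo; `TokenStreamer` and `stream_generate` are names I made up for illustration:

```python
import threading
from queue import Queue

from transformers import StoppingCriteria, StoppingCriteriaList


class TokenStreamer(StoppingCriteria):
    """A 'stopping criterion' that never stops; transformers calls it after
    every decoding step, so we use it to observe each new token."""

    def __init__(self, queue: Queue):
        self.queue = queue

    def __call__(self, input_ids, scores, **kwargs) -> bool:
        # input_ids grows by one token per step; the last one is the new token.
        self.queue.put(input_ids[0, -1].item())
        return False  # never request a stop ourselves


def stream_generate(model, tokenizer, prompt, **gen_kwargs):
    """Run model.generate in a background thread and yield decoded tokens."""
    inputs = tokenizer(prompt, return_tensors="pt")
    queue: Queue = Queue()

    def worker():
        model.generate(
            **inputs,
            stopping_criteria=StoppingCriteriaList([TokenStreamer(queue)]),
            **gen_kwargs,
        )
        queue.put(None)  # sentinel: generation is done

    threading.Thread(target=worker, daemon=True).start()
    while (token_id := queue.get()) is not None:
        # Naive per-token decode; subword boundaries may render oddly.
        yield tokenizer.decode([token_id])
```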
Check this out: https://twitter.com/algo_diver/status/1638079375085305856?s=20
Just updated the repository; it is experimentally running here: https://notebookse.jarvislabs.ai/BuOu_VbEuUHb09VEVHhfnFq4-PMhBRVCcfHBRCOrq7c4O9GI4dIGoidvNf76UsRL
@alexanderfrey
In streaming mode, most of the parameters in GenerationConfig are not supported. Do you think you can make that happen?
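One common reason a streaming path loses GenerationConfig support is that hand-rolled decoding loops reimplement sampling and only honor a few knobs. If streaming is instead routed through `generate` itself, as in the callback sketch above, the full config can be passed straight through. Illustrative only, reusing the hypothetical `stream_generate` from that sketch and assuming a `transformers` version whose `generate` accepts a `generation_config` argument:

```python
from transformers import GenerationConfig

gen_config = GenerationConfig(
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
    repetition_penalty=1.2,
    max_new_tokens=256,
)

# gen_kwargs are forwarded to model.generate, so the whole config is honored
# even though tokens are still streamed one by one.
for piece in stream_generate(
    model, tokenizer, "Tell me about alpacas.", generation_config=gen_config
):
    print(piece, end="", flush=True)
```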
Let me have a look tonight. I can't promise anything, but I am definitely interested in contributing.