localGPT
How could we add the streaming support to enhance the output effect?
I completely agree. The user experience is not satisfactory when the response is returned all at once. Would it be possible to support a streaming approach so we can see the tokens as they are generated? It would greatly enhance the experience. It seems the native Llama 2 can do this, but I'm not sure how localGPT, which runs over user data, would implement it.
It's easy: pass callback_manager as a parameter to LlamaCpp (not to RetrievalQA) and set the LLM to verbose.
See: https://github.com/Rufus31415/local-documents-gpt/blob/1f48f6884ae1d1fe474d6aa8e44b83480dfca30b/chat.py#L74C7-L74C7
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.llms import LlamaCpp

kwargs = {
    # ... other LlamaCpp kwargs ...
    "verbose": True,
}

# Print each generated token to stdout as it is produced
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
llm = LlamaCpp(callback_manager=callback_manager, **kwargs)
llm.streaming = True
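For completeness, here is a rough sketch of how that streaming llm would then be handed to the RetrievalQA chain, which itself needs no callback wiring. This is not localGPT's actual code; the retriever is a placeholder for whatever vector-store retriever the app builds:

from langchain.chains import RetrievalQA

# `llm` is the streaming LlamaCpp instance above; `retriever` stands in for
# whatever vector-store retriever the app builds (placeholder here).
qa = RetrievalQA.from_chain_type(
    llm=llm,            # streaming happens inside the LLM via its callback manager
    chain_type="stuff",
    retriever=retriever,
)
# Tokens appear on stdout as they are generated; the final answer is still returned whole.
result = qa("Your question about the documents")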
Would it be possible with Mistral ? @Rufus31415
Yes, try this : https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF
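For instance, something like this should work (a sketch only; the filename is one of the quantized GGUF files from that repo and depends on which variant you download):

# Reuse the same streaming setup, just point model_path at the Mistral GGUF file
# (example filename from TheBloke/Mistral-7B-v0.1-GGUF; adjust to your download).
llm = LlamaCpp(
    model_path="models/mistral-7b-v0.1.Q4_K_M.gguf",
    callback_manager=callback_manager,
    streaming=True,
    verbose=True,
)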
@Rufus31415 , will it hold for streamlit?
Hello, when I run chat.py I run into a problem like this:
What can I do?
@Rufus31415, sorry to revive this old thread, but I was wondering the exact same thing: how to implement streaming. I looked at the code snippet you provided and the current source code, but I don't see where this would go. Can you provide any further details?
Sorry, I don't develop this repo. Please contact the localGPT developer directly. In my previous comment I just said that I had played with LlamaCpp and explained how to simply add streaming to the standard output.