localGPT
How could we add the streaming support to enhance the output effect?
I completely agree. The user experience is not satisfactory when the response is returned all at once. Would it be possible to support a streaming approach so we can see the tokens as they are generated? It would greatly enhance the experience. It seems the native Llama 2 can do this, but I'm not sure how localGPT, which runs over user data, would implement it.
It's easy: pass callback_manager as a parameter to LlamaCpp (not to RetrievalQA) and set the LLM to verbose.
See: https://github.com/Rufus31415/local-documents-gpt/blob/1f48f6884ae1d1fe474d6aa8e44b83480dfca30b/chat.py#L74C7-L74C7
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.llms import LlamaCpp

kwargs = {
    # ... other LlamaCpp kwargs ...
    "verbose": True,
}

# Print each generated token to stdout as it is produced
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
llm = LlamaCpp(callback_manager=callback_manager, **kwargs)
llm.streaming = True
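For completeness, here is a rough sketch of how that streaming llm would then be handed to the RetrievalQA chain, which itself needs no callback wiring. This is not localGPT's actual code; the retriever is a placeholder for whatever vector-store retriever the app builds:

from langchain.chains import RetrievalQA

# `llm` is the streaming LlamaCpp instance above; `retriever` stands in for
# whatever vector-store retriever the app builds (placeholder here).
qa = RetrievalQA.from_chain_type(
    llm=llm,            # streaming happens inside the LLM via its callback manager
    chain_type="stuff",
    retriever=retriever,
)
# Tokens appear on stdout as they are generated; the final answer is still returned whole.
result = qa("Your question about the documents")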
Would it be possible with Mistral ? @Rufus31415
Yes, try this : https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF
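For instance, something like this should work (a sketch only; the filename is one of the quantized GGUF files from that repo and depends on which variant you download):

# Reuse the same streaming setup, just point model_path at the Mistral GGUF file
# (example filename from TheBloke/Mistral-7B-v0.1-GGUF; adjust to your download).
llm = LlamaCpp(
    model_path="models/mistral-7b-v0.1.Q4_K_M.gguf",
    callback_manager=callback_manager,
    streaming=True,
    verbose=True,
)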
@Rufus31415 , will it hold for streamlit?
Hello, when I run chat.py I run into a problem like this:
What can I do?
@Rufus31415, sorry to revive this old thread, but I was wondering the exact same thing: how to implement streaming. I looked at the code snippet you provided and the current source code, but I don't see where this would go. Can you provide any further details?
Sorry, I don't develop this repo. Please contact the localGPT developer directly. In my previous comment I just said that I had played with LlamaCpp and explained how to simply add streaming to the standard output.