whm233
How can I use this streaming with custom pre-trained models?
> Yes, we can do this; here is simple code that implements it:
>
> ```python
> llm = ChatOpenAI(temperature=0.2,
>                  callbacks=[callback_handler],
>                  streaming=True)
> chain = ConversationalRetrievalChain.from_llm(
>     llm=llm,
>     retriever=retriever,
>     memory=memory,
>     chain_type='stuff',
>     combine_docs_chain_kwargs...
> ```
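For anyone landing here, a fuller runnable version of that sketch. It assumes the pre-0.1 `langchain` package layout, an `OPENAI_API_KEY` in the environment, and `faiss-cpu` installed; the FAISS store and its sample text are placeholders standing in for a real retriever:

```python
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.memory import ConversationBufferMemory
from langchain.vectorstores import FAISS

# Placeholder vector store; swap in your real documents / retriever.
vectorstore = FAISS.from_texts(
    ["LangChain can stream tokens through callback handlers."],
    OpenAIEmbeddings(),
)
retriever = vectorstore.as_retriever()

# streaming=True pushes tokens to the callbacks as they arrive,
# instead of only returning the final string.
llm = ChatOpenAI(
    temperature=0.2,
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],  # prints tokens to stdout
)

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory,
    chain_type="stuff",
)

# Tokens stream to stdout while the chain runs.
result = chain({"question": "Does this chain stream its answer?"})
```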
> If you are calling the Qwen model directly in code:
>
> ```python
> from transformers import AutoModelForCausalLM, AutoTokenizer
>
> tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
> model = AutoModelForCausalLM.from_pretrained(
>     "Qwen/Qwen-7B-Chat",
>     device_map="auto",
>     trust_remote_code=True
> ).eval()
> ...
> ```
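A hedged sketch of streaming tokens from that locally loaded model with transformers' generic `TextIteratorStreamer` (nothing Qwen-specific; it feeds a raw prompt rather than Qwen's chat template, and the prompt and generation kwargs are placeholders, so expect to adjust for Qwen's custom tokenizer):

```python
from threading import Thread

from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat",
    device_map="auto",
    trust_remote_code=True,
).eval()

# Tokenize a raw prompt (this skips Qwen's chat template; fine for a demo).
inputs = tokenizer("你好", return_tensors="pt").to(model.device)

# The streamer yields decoded text pieces as generate() produces tokens.
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# generate() blocks until finished, so run it in a background thread
# and consume the stream on the main thread.
thread = Thread(
    target=model.generate,
    kwargs=dict(**inputs, streamer=streamer, max_new_tokens=256),
)
thread.start()
for piece in streamer:
    print(piece, end="", flush=True)
thread.join()
```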
> For LangChain, you can deploy an API with FastChat + vLLM (see the deployment section of the README); the solutions you find online will then work.

OK, thanks for the pointers. I also saw there is an openai_api; would that work? Deploying FastChat + vLLM seems a bit problematic for me.
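On the openai_api question: as long as the server (whether Qwen's `openai_api.py` or FastChat + vLLM) exposes an OpenAI-compatible endpoint, LangChain should be able to stream from it by pointing `ChatOpenAI` at the local base URL. A sketch under that assumption; the port and model name are placeholders that must match how the server was launched:

```python
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(
    openai_api_base="http://localhost:8000/v1",  # assumed local server address
    openai_api_key="EMPTY",                      # most local servers ignore the key
    model_name="Qwen-7B-Chat",                   # must match the served model name
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
)

# Tokens print to stdout as the local server streams them back.
print(llm.predict("你好"))
```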