whm233
How can I use this streaming with custom pre-trained models?
> Yes, we can do this; here is simple code that implements it:
>
> ```python
> llm = ChatOpenAI(temperature=0.2,
>                  callbacks=[callback_handler],
>                  streaming=True)
> chain = ConversationalRetrievalChain.from_llm(
>     llm=llm,
>     retriever=retriever,
>     memory=memory,
>     chain_type='stuff',
>     combine_docs_chain_kwargs...
> ```
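For anyone landing here, a fuller runnable version of that sketch. It assumes the pre-0.1 `langchain` package layout, an `OPENAI_API_KEY` in the environment, and `faiss-cpu` installed; the FAISS store and its sample text are placeholders standing in for a real retriever:

```python
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.memory import ConversationBufferMemory
from langchain.vectorstores import FAISS

# Placeholder vector store; swap in your real documents / retriever.
vectorstore = FAISS.from_texts(
    ["LangChain can stream tokens through callback handlers."],
    OpenAIEmbeddings(),
)
retriever = vectorstore.as_retriever()

# streaming=True pushes tokens to the callbacks as they arrive,
# instead of only returning the final string.
llm = ChatOpenAI(
    temperature=0.2,
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],  # prints tokens to stdout
)

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory,
    chain_type="stuff",
)

# Tokens stream to stdout while the chain runs.
result = chain({"question": "Does this chain stream its answer?"})
```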
> If you are calling the Qwen model directly in code:
>
> ```python
> from transformers import AutoModelForCausalLM, AutoTokenizer
>
> tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
> model = AutoModelForCausalLM.from_pretrained(
>     "Qwen/Qwen-7B-Chat",
>     device_map="auto",
>     trust_remote_code=True
> ).eval()
> ...
> ```
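A hedged sketch of streaming tokens from that locally loaded model with transformers' generic `TextIteratorStreamer` (nothing Qwen-specific; it feeds a raw prompt rather than Qwen's chat template, and the prompt and generation kwargs are placeholders, so expect to adjust for Qwen's custom tokenizer):

```python
from threading import Thread

from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat",
    device_map="auto",
    trust_remote_code=True,
).eval()

# Tokenize a raw prompt (this skips Qwen's chat template; fine for a demo).
inputs = tokenizer("你好", return_tensors="pt").to(model.device)

# The streamer yields decoded text pieces as generate() produces tokens.
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# generate() blocks until finished, so run it in a background thread
# and consume the stream on the main thread.
thread = Thread(
    target=model.generate,
    kwargs=dict(**inputs, streamer=streamer, max_new_tokens=256),
)
thread.start()
for piece in streamer:
    print(piece, end="", flush=True)
thread.join()
```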
> For LangChain, you can deploy an API with FastChat + vLLM (see the deployment section of the README); the solutions you find online will then work.

OK, thanks for the pointers. I also saw there is an openai_api; would that work? Deploying FastChat + vLLM seems a bit problematic for me.
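On the openai_api question: as long as the server (whether Qwen's `openai_api.py` or FastChat + vLLM) exposes an OpenAI-compatible endpoint, LangChain should be able to stream from it by pointing `ChatOpenAI` at the local base URL. A sketch under that assumption; the port and model name are placeholders that must match how the server was launched:

```python
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(
    openai_api_base="http://localhost:8000/v1",  # assumed local server address
    openai_api_key="EMPTY",                      # most local servers ignore the key
    model_name="Qwen-7B-Chat",                   # must match the served model name
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
)

# Tokens print to stdout as the local server streams them back.
print(llm.predict("你好"))
```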