When using embedding, the Chinese reply will be incomplete
System Info
Mac, VS Code, Python 3.10.11
Who can help?
No response
Information
- [ ] The official example notebooks/scripts
- [X] My own modified scripts
Related Components
- [ ] LLMs/Chat Models
- [X] Embedding Models
- [ ] Prompts / Prompt Templates / Prompt Selectors
- [X] Output Parsers
- [ ] Document Loaders
- [ ] Vector Stores / Retrievers
- [ ] Memory
- [ ] Agents / Agent Executors
- [ ] Tools / Toolkits
- [ ] Chains
- [ ] Callbacks/Tracing
- [ ] Async
Reproduction
- embedding some text into Chroma
- query and run load_qa_chain with OpenAI
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

docs = docsearch.similarity_search(query="some txt", k=2)
llm = OpenAI(
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
    temperature=0.1,
)
chain = load_qa_chain(llm=llm, chain_type="stuff", verbose=True)
result = chain.run(input_documents=docs, question=query, return_only_outputs=True)
- The Chinese result is always cut off at around 127–131 characters, while English answers finish the whole sentence.
example:
我们***是一家专注于*****机构,近些年来,我们的学员人数突破****,遍布全国***个城市,海外**个国家,这自然是我们家长对于****最好的认可。我们深知宝贝一开始有兴趣,后来因为各种的枯燥变得不愿意学了,因此,我们采用三方合作配合的模式,即家长
我们***是一家专注于*****机构,近些年来,我们的学员人数突破****,遍布全国***个城市,海外**个国家,这自然是我们家长对于***最好的认可。我们深知宝贝一开始有兴趣,后来因为各种的枯燥变得不愿意学了的顾虑,因此我们采用了一种科学的学习模式
Expected behavior
I think this is related to how Chinese characters are counted as tokens; looking forward to a fix.
Same issue here
Found the solution: you can add a max_tokens parameter when initializing OpenAI, like this:
llm = OpenAI(temperature=0, max_tokens=2048)
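For context, the cutoff at roughly 127–131 Chinese characters lines up with a small default max_tokens on the OpenAI wrapper, because a Chinese character typically costs about two BPE tokens while an English word often costs one. A back-of-envelope sketch (the 256-token default and the 2-tokens-per-character average are assumptions, not exact values):

```python
# Back-of-envelope: why a Chinese completion stops at ~128 characters
# while an English one of similar length finishes.
DEFAULT_MAX_TOKENS = 256   # assumed wrapper default
TOKENS_PER_CN_CHAR = 2.0   # rough average for Chinese under a BPE tokenizer
TOKENS_PER_EN_WORD = 1.3   # rough average for English

max_cn_chars = DEFAULT_MAX_TOKENS / TOKENS_PER_CN_CHAR
max_en_words = DEFAULT_MAX_TOKENS / TOKENS_PER_EN_WORD
print(round(max_cn_chars))  # ~128 characters, matching the observed 127–131 cutoff
print(round(max_en_words))  # ~197 words, enough for most short answers
```

Raising max_tokens (as in the snippet above) gives the model enough budget to finish the Chinese sentence.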
Thanks, I upgraded langchain and the issue is gone.
You should still print the logs and check whether the two retrieved chunks actually contain the answer you want. Document splitting also matters a lot: adjust your document's paragraphs, re-vectorize, and check again.