When using embedding, the Chinese reply will be incomplete
System Info
Mac, VS Code, Python 3.10.11
Who can help?
No response
Information
- [ ] The official example notebooks/scripts
- [X] My own modified scripts
Related Components
- [ ] LLMs/Chat Models
- [X] Embedding Models
- [ ] Prompts / Prompt Templates / Prompt Selectors
- [X] Output Parsers
- [ ] Document Loaders
- [ ] Vector Stores / Retrievers
- [ ] Memory
- [ ] Agents / Agent Executors
- [ ] Tools / Toolkits
- [ ] Chains
- [ ] Callbacks/Tracing
- [ ] Async
Reproduction
- embedding some text into Chroma
- query and run load_qa_chain with OpenAI
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

docs = docsearch.similarity_search(query="some txt", k=2)
llm = OpenAI(
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
    temperature=0.1,
)
chain = load_qa_chain(llm=llm, chain_type="stuff", verbose=True)
result = chain.run(input_documents=docs, question=query, return_only_outputs=True)
- The Chinese result is always cut off at around 127–131 characters, while English answers finish the whole sentence.
example:
我们***是一家专注于*****机构,近些年来,我们的学员人数突破****,遍布全国***个城市,海外**个国家,这自然是我们家长对于****最好的认可。我们深知宝贝一开始有兴趣,后来因为各种的枯燥变得不愿意学了,因此,我们采用三方合作配合的模式,即家长
我们***是一家专注于*****机构,近些年来,我们的学员人数突破****,遍布全国***个城市,海外**个国家,这自然是我们家长对于***最好的认可。我们深知宝贝一开始有兴趣,后来因为各种的枯燥变得不愿意学了的顾虑,因此我们采用了一种科学的学习模式
Expected behavior
I think this is related to how Chinese characters are counted as tokens; looking forward to a fix.
Same issue here
Found the solution: you can add a max_tokens parameter when initializing OpenAI, like this:
llm = OpenAI(temperature=0, max_tokens=2048)
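For context, the cutoff at roughly 127–131 Chinese characters lines up with a small default max_tokens on the OpenAI wrapper, because a Chinese character typically costs about two BPE tokens while an English word often costs one. A back-of-envelope sketch (the 256-token default and the 2-tokens-per-character average are assumptions, not exact values):

```python
# Back-of-envelope: why a Chinese completion stops at ~128 characters
# while an English one of similar length finishes.
DEFAULT_MAX_TOKENS = 256   # assumed wrapper default
TOKENS_PER_CN_CHAR = 2.0   # rough average for Chinese under a BPE tokenizer
TOKENS_PER_EN_WORD = 1.3   # rough average for English

max_cn_chars = DEFAULT_MAX_TOKENS / TOKENS_PER_CN_CHAR
max_en_words = DEFAULT_MAX_TOKENS / TOKENS_PER_EN_WORD
print(round(max_cn_chars))  # ~128 characters, matching the observed 127–131 cutoff
print(round(max_en_words))  # ~197 words, enough for most short answers
```

Raising max_tokens (as in the snippet above) gives the model enough budget to finish the Chinese sentence.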
Thanks, I upgraded langchain and the issue is gone.
You should still print the logs and check whether the two retrieved chunks actually contain the answer you want. Document splitting also matters a lot: adjust your document's paragraphs, re-vectorize, and check again.