langchain
About map_reduce.py
System Info
When I'm using the "gpt-3.5-turbo-16k" model, which supports 16k tokens, the map-reduce algorithm still fails: if the answer obtained at any one step exceeds 4000 tokens, the error "A single document was longer than the context length, we cannot handle this." is raised.
I don't think the token_max parameter changes for different models.
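For reference, the check that fires here lives in _split_list_of_docs (visible in the traceback below): mapped results are greedily grouped so that each group stays under token_max, and the function gives up as soon as a single mapped result alone blows that budget. A simplified, self-contained paraphrase of that logic (not the library's exact code):

```python
from typing import Callable, List


def split_list_of_docs(
    docs: List[str], length_func: Callable[..., int], token_max: int
) -> List[List[str]]:
    """Simplified paraphrase of langchain's _split_list_of_docs.

    The real function operates on Document objects and forwards prompt kwargs
    to length_func; this sketch only illustrates the grouping and the two
    failure modes.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    for doc in docs:
        current.append(doc)
        if length_func(current) > token_max:
            if len(current) == 1:
                # One mapped result already exceeds token_max on its own.
                raise ValueError(
                    "A single document was longer than the context length,"
                    " we cannot handle this."
                )
            if len(current) == 2:
                # The carried-over result plus one more cannot fit together.
                raise ValueError(
                    "A single document was so long it could not be combined"
                    " with another document, we cannot handle this."
                )
            # Close the group without the doc that overflowed it and carry
            # that doc into the next group.
            groups.append(current[:-1])
            current = current[-1:]
    groups.append(current)
    return groups
```

Note that token_max is a fixed budget inside the chain; it is not derived from the model's context window, which is why a 16k model still trips the error.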
Who can help?
@hwchase17 @agola11 I hope to get some help; this is causing me a lot of trouble.
Information
- [ ] The official example notebooks/scripts
- [X] My own modified scripts
Related Components
- [ ] LLMs/Chat Models
- [ ] Embedding Models
- [ ] Prompts / Prompt Templates / Prompt Selectors
- [ ] Output Parsers
- [ ] Document Loaders
- [ ] Vector Stores / Retrievers
- [ ] Memory
- [ ] Agents / Agent Executors
- [ ] Tools / Toolkits
- [X] Chains
- [ ] Callbacks/Tracing
- [ ] Async
Reproduction
My code:

```python
chain_one = load_summarize_chain(
    chat,
    chain_type="map_reduce",
    return_intermediate_steps=True,
    verbose=True,
    map_prompt=PROMPT,
    combine_prompt=combine_prompt,
)
x = chain_one({"input_documents": documents}, return_only_outputs=True)
```

The documents are chunked at roughly 4000 tokens each (setup shown below).
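For completeness, the undefined names above (chat, PROMPT, combine_prompt, documents) would come from something like the following; the prompt texts, file path, and chunk settings are placeholders of mine, not the original script's values:

```python
# Placeholder setup for the snippet above; prompts, file path and chunk size
# are illustrative only.
from langchain.chains.summarize import load_summarize_chain
from langchain.chat_models import ChatOpenAI
from langchain.docstore.document import Document
from langchain.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter

chat = ChatOpenAI(model_name="gpt-3.5-turbo-16k", temperature=0)

PROMPT = PromptTemplate(
    template="Summarize the following text:\n\n{text}",
    input_variables=["text"],
)
combine_prompt = PromptTemplate(
    template="Combine these partial summaries into a single summary:\n\n{text}",
    input_variables=["text"],
)

# chunk_size here is measured in characters; ~16000 characters is roughly
# the ~4000-token chunks described above.
splitter = RecursiveCharacterTextSplitter(chunk_size=16000, chunk_overlap=200)
documents = [
    Document(page_content=chunk)
    for chunk in splitter.split_text(
        open("long_report.txt", encoding="utf-8").read()
    )
]
```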
ERROR:

```
File D:\conda\envs\ai\Lib\site-packages\langchain\chains\combine_documents\map_reduce.py:37, in _split_list_of_docs(docs, length_func, token_max, **kwargs)
     32         raise ValueError(
     33             "A single document was longer than the context length,"
     34             " we cannot handle this."
     35         )
     36     if len(_sub_result_docs) == 2:
---> 37         raise ValueError(
     38             "A single document was so long it could not be combined "
     39             "with another document, we cannot handle this."
     40         )
     41     new_result_doc_list.append(_sub_result_docs[:-1])
     42     _sub_result_docs = _sub_result_docs[-1:]

ValueError: A single document was so long it could not be combined with another document, we cannot handle this.
```
Expected behavior
I hope that when I use a large-context model, these errors will not occur.
I tried adding a new model to the OpenAI integration under the LangChain framework.
For now I worked around the problem by directly modifying the value in the source, but I'm hoping for a better solution.
@hwchase17 @agola11 does anyone on the LangChain team have a better solution than changing the source code of the LangChain library? If you deploy your LangChain app, you cannot change the source code. Thank you.
@SinaArdehali I have contacted the LangChain team; they are working on some fixes around this issue.
Good news! Thank you.
@Ooho1997 @SinaArdehali My discussion with them is more about the map-reduce implementation itself. If you want to be able to set token_max, here is how you can do that :)

```python
res = await chain(inputs={'input_documents': texts, 'token_max': 12000}, return_only_outputs=True)
```
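In a synchronous script, the same idea (reusing the chain and documents from the reproduction above) would look roughly like this; as I understand that version, extra input keys are forwarded as keyword arguments to combine_docs, which is how token_max reaches the splitting logic:

```python
# Synchronous equivalent of the call above; assumes chain_one and documents
# from the reproduction earlier in this thread.
res = chain_one(
    {"input_documents": documents, "token_max": 12000},
    return_only_outputs=True,
)
print(res["output_text"])
```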
Thanks, now I understand my problem.
@Ooho1997 Awesome! If you don't mind, please close your related issues solved by my comment; it would help keep things clean for those of us tracking related changes around the map-reduce / token_max implementation.
Edit: Harrison added a PR where you can now set token_max during chain initialization, both for a reduce chain and for load_summarize_chain.
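With that change, setting token_max when building the chain should look roughly like the following; the exact keyword and its availability depend on your installed langchain version (the PR number and version are not stated in this thread):

```python
# Assumes a langchain release that includes the PR mentioned above.
from langchain.chains.summarize import load_summarize_chain
from langchain.chat_models import ChatOpenAI

chat = ChatOpenAI(model_name="gpt-3.5-turbo-16k", temperature=0)

# token_max set at initialization: grouped summaries may grow to ~12000 tokens
# before another reduce pass is forced, instead of the small default.
chain = load_summarize_chain(
    chat,
    chain_type="map_reduce",
    token_max=12000,
)
```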