
About map_reduce.py

Open llmadd opened this issue 1 year ago • 2 comments

System Info

I'm using the "gpt-3.5-turbo-16k" model, which supports 16k tokens. However, with the map_reduce chain, if an intermediate answer exceeds 4000 tokens, I get the error "A single document was longer than the context length, we cannot handle this." (see the traceback below). It looks like the token_max parameter does not change for different models.

Who can help?

@hwchase17 @agola11 I hope to get some help; this is causing real problems for my use case.

Information

  • [ ] The official example notebooks/scripts
  • [X] My own modified scripts

Related Components

  • [ ] LLMs/Chat Models
  • [ ] Embedding Models
  • [ ] Prompts / Prompt Templates / Prompt Selectors
  • [ ] Output Parsers
  • [ ] Document Loaders
  • [ ] Vector Stores / Retrievers
  • [ ] Memory
  • [ ] Agents / Agent Executors
  • [ ] Tools / Toolkits
  • [X] Chains
  • [ ] Callbacks/Tracing
  • [ ] Async

Reproduction

My code:

    chain_one = load_summarize_chain(chat, chain_type="map_reduce", return_intermediate_steps=True, verbose=True, map_prompt=PROMPT, combine_prompt=combine_prompt)
    x = chain_one({"input_documents": documents}, return_only_outputs=True)

The documents are split with chunk_size = 4000 tokens.

ERROR:

    File D:\conda\envs\ai\Lib\site-packages\langchain\chains\combine_documents\map_reduce.py:37, in _split_list_of_docs(docs, length_func, token_max, **kwargs)
         32     raise ValueError(
         33         "A single document was longer than the context length,"
         34         " we cannot handle this."
         35     )
         36 if len(_sub_result_docs) == 2:
    ---> 37     raise ValueError(
         38         "A single document was so long it could not be combined "
         39         "with another document, we cannot handle this."
         40     )
         41 new_result_doc_list.append(_sub_result_docs[:-1])
         42 _sub_result_docs = _sub_result_docs[-1:]

ValueError: A single document was so long it could not be combined with another document, we cannot handle this.
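
For context, the failing function packs the mapped results into batches whose combined length stays under token_max before the reduce step, and in this version of langchain the default token_max (3000) does not depend on the model. Below is a simplified sketch of the logic, reconstructed from the traceback above; exact details may differ from the installed source.

```python
# Simplified sketch of _split_list_of_docs, reconstructed from the traceback
# above (not copied from the library): it greedily groups documents into
# batches whose total length stays under token_max, and raises if one or two
# documents already exceed that limit.
def _split_list_of_docs(docs, length_func, token_max, **kwargs):
    new_result_doc_list = []
    _sub_result_docs = []
    for doc in docs:
        _sub_result_docs.append(doc)
        _num_tokens = length_func(_sub_result_docs, **kwargs)
        if _num_tokens > token_max:
            if len(_sub_result_docs) == 1:
                raise ValueError(
                    "A single document was longer than the context length,"
                    " we cannot handle this."
                )
            if len(_sub_result_docs) == 2:
                raise ValueError(
                    "A single document was so long it could not be combined "
                    "with another document, we cannot handle this."
                )
            # Close the current batch without the overflowing document and
            # start a new batch containing only that document.
            new_result_doc_list.append(_sub_result_docs[:-1])
            _sub_result_docs = _sub_result_docs[-1:]
    new_result_doc_list.append(_sub_result_docs)
    return new_result_doc_list
```

With 4000-token chunks and a fixed token_max of 3000, these checks trip even though the model itself could handle far more context.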

Expected behavior

I would expect that when using a large-context model, these errors do not occur.

llmadd avatar Jun 15 '23 01:06 llmadd

[screenshot] I tried adding the new model to the OpenAI integration in the LangChain framework.

llmadd avatar Jun 15 '23 02:06 llmadd

[screenshot] I temporarily worked around the problem by modifying the value directly; I'm hoping for a better solution.

llmadd avatar Jun 15 '23 05:06 llmadd

@hwchase17 @agola11 Does anyone on the LangChain team have a better solution than changing the source code of the LangChain library? If you deploy your LangChain app, you cannot change the source code. Thank you.

SinaArdehali avatar Jul 02 '23 01:07 SinaArdehali

@SinaArdehali I have contacted the LangChain team; they are working on fixes for this issue.

ShantanuNair avatar Jul 03 '23 11:07 ShantanuNair

> @SinaArdehali I have contacted the LangChain team; they are working on fixes for this issue.

Good news! Thank you.

llmadd avatar Jul 03 '23 11:07 llmadd

@Ooho1997 @SinaArdehali My discussion with them is more about the mapreduce implementation itself. If you want to be able to set token_max, here is how you can do that :)

res = await chain(inputs={'input_documents': texts, 'token_max': 12000}, return_only_outputs=True)
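
A slightly fuller sketch of the same idea, assuming the chain_one and documents from the reproduction above and using the synchronous call form; in this version, extra chain inputs such as token_max are forwarded to the combine step:

```python
# Sketch: raising token_max at call time instead of editing the library.
# Assumes `chain_one` and `documents` from the reproduction above; the
# behaviour of passing token_max through the inputs dict may vary by version.
res = chain_one(
    {"input_documents": documents, "token_max": 12000},
    return_only_outputs=True,
)
print(res["output_text"])
```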

ShantanuNair avatar Jul 05 '23 07:07 ShantanuNair

> @Ooho1997 @SinaArdehali My discussion with them is more about the mapreduce implementation itself. If you want to be able to set token_max, here is how you can do that :)
>
> res = await chain(inputs={'input_documents': texts, 'token_max': 12000}, return_only_outputs=True)

Thanks, now I understand my problem.

llmadd avatar Jul 05 '23 07:07 llmadd

@Ooho1997 Awesome! If you don't mind, please close the related issues solved by my comment; it would help keep things clean for those of us tracking changes around the mapreduce / token_max implementation.

Edit: Harrison added a PR so that you can now set token_max during chain initialization, both for a reduce chain and for load_summarize_chain.
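
For reference, a minimal sketch of that init-time option; the token_max keyword is assumed to exist only in langchain versions that include the PR mentioned above, so check your installed version:

```python
# Sketch: setting token_max when the chain is built rather than per call.
# Assumes a langchain release with init-time token_max support; `documents`
# is the chunked input from the reproduction above.
from langchain.chat_models import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain

llm = ChatOpenAI(model_name="gpt-3.5-turbo-16k", temperature=0)

chain = load_summarize_chain(
    llm,
    chain_type="map_reduce",
    token_max=12000,  # raise the limit to match the 16k-context model
)
res = chain({"input_documents": documents}, return_only_outputs=True)
print(res["output_text"])
```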

ShantanuNair avatar Jul 05 '23 07:07 ShantanuNair