RetrievalQAWithSourcesChain causing openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens
I want to migrate from VectorDBQAWithSourcesChain to RetrievalQAWithSourcesChain. The sample code uses a Qdrant vector store and works fine with VectorDBQAWithSourcesChain.
When I run the code with the RetrievalQAWithSourcesChain changes, it raises the following error:
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 4411 tokens (4155 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.
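For reference, the prompt alone (4155 tokens) already exceeds the 4097-token limit before the 256-token completion is added. One way to see how many tokens the retrieved documents contribute is a quick tiktoken check like the sketch below (the encoding name is an assumption, not part of my script):
import tiktoken
enc = tiktoken.get_encoding("cl100k_base")  # approximate; pick the encoding for your model
docs = qdrant.similarity_search(args.question)
print(sum(len(enc.encode(d.page_content)) for d in docs))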
The following is the git diff of the code:
diff --git a/ask_question.py b/ask_question.py
index eac37ce..e76e7c5 100644
--- a/ask_question.py
+++ b/ask_question.py
@@ -2,7 +2,7 @@ import argparse
import os
from langchain import OpenAI
-from langchain.chains import VectorDBQAWithSourcesChain
+from langchain.chains import RetrievalQAWithSourcesChain
from langchain.vectorstores import Qdrant
from langchain.embeddings import OpenAIEmbeddings
from qdrant_client import QdrantClient
@@ -14,8 +14,7 @@ args = parser.parse_args()
url = os.environ.get("QDRANT_URL")
api_key = os.environ.get("QDRANT_API_KEY")
qdrant = Qdrant(QdrantClient(url=url, api_key=api_key), "docs_flutter_dev", embedding_function=OpenAIEmbeddings().embed_query)
-chain = VectorDBQAWithSourcesChain.from_llm(
- llm=OpenAI(temperature=0, verbose=True), vectorstore=qdrant, verbose=True)
+chain = RetrievalQAWithSourcesChain.from_chain_type(OpenAI(temperature=0), chain_type="stuff", retriever=qdrant.as_retriever())
result = chain({"question": args.question})
print(f"Answer: {result['answer']}")
If you need the data-ingestion code (creating the embeddings), please check it out: https://github.com/limcheekin/flutter-gpt/blob/openai-qdrant/create_embeddings.py
Any idea how to fix it?
Thank you.
same issue here
I am also looking for a way to limit the amount of data being sent to an external source. Ideally, I would prefer not to make any changes to the existing code.
However, I believe the best solution would be to implement a limiter in the classes that interact with OpenAI's API, since we know the limits and can truncate the text before sending it. We could also display a warning message in the console.
https://github.com/hwchase17/langchain/issues/2140
I ran into the same problem when querying some Chinese documents that I put into Chroma:
import os
from langchain import OpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA

embeddings = OpenAIEmbeddings(openai_api_key=os.environ['OPENAI_API_KEY'])
docsearch = Chroma.from_documents(texts, embeddings)  # `texts` comes from the loading step (not shown)
retriever = docsearch.as_retriever()
qa = RetrievalQA.from_llm(llm=OpenAI(), retriever=retriever)
query = "what"
qa.run(query)
You can see my prompt is very short, but it says: InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 6990 tokens (6734 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.
You can try setting reduce_k_below_max_tokens=True; it is supposed to limit the number of results returned from the store based on the token limit. But for some reason, it seems to only work on chat models (GPT-3.5/GPT-4), at least from my testing.
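For example, with the Qdrant setup from the original post it can be passed straight through from_chain_type (a minimal sketch; the max_tokens_limit value is an assumption you can tune):
chain = RetrievalQAWithSourcesChain.from_chain_type(
    OpenAI(temperature=0),
    chain_type="stuff",
    retriever=qdrant.as_retriever(),
    reduce_k_below_max_tokens=True,  # drop retrieved documents until the prompt fits
    max_tokens_limit=3375)           # assumed token budget for the stuffed documents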
That works! Please review the latest code at https://github.com/limcheekin/flutter-gpt/blob/openai/ask_question.py; I would appreciate it if you spot any improvements.
Thanks.
That's a good suggestion, but I don't know how to set reduce_k_below_max_tokens=True. Can you give me some examples?
Please see my code example at https://github.com/limcheekin/flutter-gpt/blob/openai/ask_question.py#L42
Let's see if it works for you.
Thanks, @limcheekin, I saw your code, but in your code you use RetrievalQAWithSourcesChain, which has the reduce_k_below_max_tokens field, as the following reference says: https://python.langchain.com/en/latest/reference/modules/chains.html#langchain.chains.RetrievalQAWithSourcesChain
But I use RetrievalQA, which does not have the reduce_k_below_max_tokens field; please check this reference: https://python.langchain.com/en/latest/reference/modules/chains.html#langchain.chains.RetrievalQA
Should I change to RetrievalQAWithSourcesChain? What's the difference between them?
Thanks!
There is a more fundamental fix for this issue: construct the retriever with search_kwargs, e.g. retriever=VectorStoreRetriever(vectorstore=vectorstore, search_kwargs={"filter": {"type": "filter"}, "k": 3}).
I guess it will work for Qdrant as well.
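For the Qdrant code from the original post, the same idea looks roughly like this (k=3 is an arbitrary choice; fewer documents means fewer prompt tokens):
retriever = qdrant.as_retriever(search_kwargs={"k": 3})
chain = RetrievalQAWithSourcesChain.from_chain_type(
    OpenAI(temperature=0), chain_type="stuff", retriever=retriever)
result = chain({"question": args.question})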
I have this problem too; I'm using openai.ChatCompletion.
@wen020
There seems to be a max_tokens_limit parameter, but it is not reflected in the documentation.
This fixed my issue:
ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectorstore.as_retriever(search_kwargs={"k": 1}), max_tokens_limit=4097)
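A usage sketch of how that chain gets called (the question string is made up; vectorstore is whatever store you built earlier, and ConversationalRetrievalChain also expects a chat_history entry):
from langchain.chains import ConversationalRetrievalChain
from langchain.llms import OpenAI

qa = ConversationalRetrievalChain.from_llm(
    OpenAI(temperature=0),
    vectorstore.as_retriever(search_kwargs={"k": 1}),  # fewer documents per query
    max_tokens_limit=4097)
result = qa({"question": "How do I add a dependency?", "chat_history": []})
print(result["answer"])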
same issue here
This does not work with RetrievalQA; it only works with RetrievalQAWithSourcesChain.
@ZiyadMoraished The solution works only for one type of chain, ConversationalRetrievalChain; for the others it does not work.
First of all, on data preparation, how you split the text into chunks is important. Please look into the following video and notebook to understand more: https://www.youtube.com/watch?v=eqOfr4AGLk8 https://github.com/pinecone-io/examples/blob/master/generation/langchain/handbook/xx-langchain-chunking.ipynb
Also see the recent post from the official LangChain blog: https://blog.langchain.dev/improving-document-retrieval-with-contextual-compression/
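As an illustration, a minimal chunking sketch (the chunk_size and chunk_overlap values are assumptions you should tune for your data):
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(documents)  # `documents` is whatever your loader produced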
These two approaches should solve the issue; I will close the issue if it is quiet for a week.
Thanks.
Hi, trying to implement the suggested solutions myself right now; where would one add these parameters? Thanks!
Still have that same issue with RetrievalQA.from_chain_type
Please read the issue from start to finish; there are a few solutions mentioned. Try them out one by one and one of them should fix the issue. Otherwise, please open a new issue for the problem you have and share the GitHub repo so that it is easy for others to reproduce the problem and help you.
The issue is dated. I think most people here have found a solution or workaround, so I will close the issue for now.
I am not sure :)
I switched to this which seemed to work:
chain = VectorDBQAWithSourcesChain.from_llm(
    llm=OpenAI(temperature=0, verbose=True), vectorstore=store, verbose=True)
result = chain({"question": args.question})
Which got me:
UserWarning: VectorDBQAWithSourcesChain is deprecated - please use from langchain.chains import RetrievalQAWithSourcesChain
warnings.warn(
And I modified it to this:
qa = RetrievalQAWithSourcesChain.from_chain_type(
    llm=OpenAI(), chain_type="stuff", retriever=retriever,
    return_source_documents=True)
And I am back at the same error ...
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 5541 tokens (5285 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.
I'm new to this; where should I add the reduce_k_below_max_tokens=True setting?
Thanks
same question
The following notebook should be helpful: https://github.com/openai/openai-cookbook/blob/main/examples/Embedding_long_inputs.ipynb
If none of the solutions above fixed your problem, most likely the issue is caused by the data, so make sure you understand your data preparation process.
If you are using an agent without chains but have a vector store (Pinecone in my case) as a retrieval tool, where would you limit either max_tokens or top_k?
The following answer comes from above:
vectorstore.as_retriever(search_kwargs={"k": 1})
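For the agent case, a rough sketch of where that limit ends up (the tool name and description are made up for illustration, and vectorstore is assumed to exist):
from langchain.agents import AgentType, Tool, initialize_agent
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

qa = RetrievalQA.from_chain_type(
    llm=OpenAI(temperature=0),
    retriever=vectorstore.as_retriever(search_kwargs={"k": 1}))  # top_k is limited here
tools = [Tool(name="docs", func=qa.run, description="Answers questions about the indexed docs.")]
agent = initialize_agent(tools, OpenAI(temperature=0), agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION)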
Same error; any updates here?
Same error for me. Since it is a limit of OpenAI: if the output length is too large, we can set max_tokens to limit the output; but if the input context is too large, I don't know which parameters to use. What I can think of is: reduce the context input, or use compression (I haven't tried it yet). I will try to reduce my input context size.
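On the compression idea mentioned above, a sketch of LangChain's contextual compression retriever (the llm and retriever names assume objects you already have):
from langchain.chains import RetrievalQA
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

compressor = LLMChainExtractor.from_llm(llm)  # keeps only the parts of each document relevant to the query
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever)
qa = RetrievalQA.from_llm(llm=llm, retriever=compression_retriever)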
How did you solve it??
@GeneralLHW, I think the error is saying your context is too long. The context comes from Chroma, so I think the documents stored in Chroma are too long. You can set verbose=True to see the complete question and response sent to OpenAI.
@saaspeter ,
llm = ChatOpenAI(max_tokens=4096, model_name='gpt-3.5-turbo', temperature=0, verbose=True)
qa = RetrievalQA.from_chain_type(llm, chain_type="map_rerank", retriever=docsearch.as_retriever(), verbose=True)
result = qa.run('1+1')
I added the parameters, but I still don't see the complete question and reply. What should I do?
@GeneralLHW, try adding langchain.debug = True in your Python file. For me, by doing this, I can see the full call parameters and responses.
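A sketch of where that line goes (set it before invoking the chain):
import langchain

langchain.debug = True  # prints the full prompts and responses for every chain/LLM call
result = qa.run('1+1')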