DeepLake Retrieval with gpt-3.5-turbo: maximum context length is 4097 tokens exceeded
I want to analyze my codebase with DeepLake.
Unfortunately I still have to use gpt-3.5-turbo. The prompt ends up exceeding the model's token limit, and I have tried setting
max_tokens_limit and reduce_k_below_max_tokens
to reduce the token count, but without success.
I always get:
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 21601 tokens. Please reduce the length of the messages.
This is the code I use:
from langchain.vectorstores import DeepLake
from langchain.embeddings.openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()  # assumption: the dataset was built with OpenAI embeddings

db = DeepLake(dataset_path="hub://COMPANY/xyz", read_only=True, embedding_function=embeddings)

retriever = db.as_retriever()
retriever.search_kwargs['distance_metric'] = 'cos'
retriever.search_kwargs['fetch_k'] = 100
retriever.search_kwargs['maximal_marginal_relevance'] = True
retriever.search_kwargs['k'] = 20
retriever.search_kwargs['reduce_k_below_max_tokens'] = True
retriever.search_kwargs['max_tokens_limit'] = 3000
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain

model = ChatOpenAI(model='gpt-3.5-turbo')  # or 'gpt-4'
qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)
questions = [
    "What 5 key improvements to that codebase would you suggest?",
    "How can we improve hot code reload?"
]

chat_history = []
for question in questions:
    result = qa({"question": question, "chat_history": chat_history})
    chat_history.append((question, result['answer']))
    print(f"-> **Question**: {question} \n")
    print(f"**Answer**: {result['answer']} \n")
You're inputting your entire codebase, which is 20k+ tokens. That kind of defeats the point of the retriever. Try chunking the codebase. The retriever will then only input the parts of your codebase that are most relevant to your questions, which significantly reduces the token count and keeps it within the confines of the model.
Have a look at text splitters. There are examples of how to split your documents in the Deep Lake notebook.
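For example, roughly following the Deep Lake notebook (a rough sketch: the repo path, file filter, and chunk size below are placeholders, not your actual setup):

import os
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import DeepLake

# Walk the repo and load every Python file (adjust the path/extension to your codebase)
docs = []
for dirpath, dirnames, filenames in os.walk("./my_repo"):
    for filename in filenames:
        if filename.endswith(".py"):
            loader = TextLoader(os.path.join(dirpath, filename), encoding="utf-8")
            docs.extend(loader.load())

# Split into ~1000-character chunks so each retrieved piece stays small
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(docs)

embeddings = OpenAIEmbeddings()
db = DeepLake.from_documents(texts, embeddings, dataset_path="hub://COMPANY/xyz")

With small chunks in the dataset, the k=20 documents the retriever returns are far more likely to fit under the 4097-token limit.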
You may also want to use a chain for QA like the RetrievalQA chain. There's also an example in that notebook.
from langchain.chains import RetrievalQA
from langchain.llms import OpenAIChat
qa = RetrievalQA.from_chain_type(llm=OpenAIChat(model='gpt-3.5-turbo'), chain_type='stuff', retriever=retriever)
Play around with the 'stuff' chain type and also try 'map_reduce' or 'refine'. The 'stuff' chain stuffs all the documents found by the retriever into a single prompt, which can increase the tokens sent to the model. Learn more about chain types.
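A rough sketch of the same chain with map_reduce instead (reusing the retriever defined above):

from langchain.chains import RetrievalQA
from langchain.llms import OpenAIChat

# map_reduce runs the model over each retrieved chunk separately and then combines
# the partial answers, so no single call has to fit all retrieved documents into the context window
qa = RetrievalQA.from_chain_type(
    llm=OpenAIChat(model_name='gpt-3.5-turbo'),
    chain_type='map_reduce',
    retriever=retriever,
)
print(qa.run("What 5 key improvements to that codebase would you suggest?"))

The trade-off is more API calls per question, but each individual call stays well under the model's limit.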
@account00001 Thanks for the helpful advice!
This issue occurs when running the example in the docs here: https://python.langchain.com/en/latest/use_cases/question_answering.html
Hi, @hugo4711! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.
From what I understand, the issue is that the maximum context length of the gpt-3.5-turbo model is exceeded when inputting a codebase. One user suggested chunking the codebase and using a chain for QA like the RetrievalQA chain, and you thanked them for the advice. Another user mentioned that the issue occurs when running the example in the documentation.
Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Thank you for your understanding and contribution to the LangChain project!