
DeepLake Retrieval with gpt-3.5-turbo: maximum context length is 4097 tokens exceeded

hugo4711 opened this issue on Apr 21, 2023 • 3 comments

I want to analyze my codebase with DeepLake.

Unfortunately I still have to use gpt-3.5-turbo. The retrieved context is too long, so I tried setting

max_tokens_limit and reduce_k_below_max_tokens

to reduce the token count, but without success.

I always get:

openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 21601 tokens. Please reduce the length of the messages.

This is the code I use:

from langchain.vectorstores import DeepLake
from langchain.embeddings.openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()  # embeddings were defined elsewhere in my setup
db = DeepLake(dataset_path="hub://COMPANY/xyz", read_only=True, embedding_function=embeddings)

retriever = db.as_retriever()
retriever.search_kwargs['distance_metric'] = 'cos'
retriever.search_kwargs['fetch_k'] = 100
retriever.search_kwargs['maximal_marginal_relevance'] = True
retriever.search_kwargs['k'] = 20

retriever.search_kwargs['reduce_k_below_max_tokens'] = True
retriever.search_kwargs['max_tokens_limit'] = 3000

from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain

model = ChatOpenAI(model='gpt-3.5-turbo') # 'gpt-4',
qa = ConversationalRetrievalChain.from_llm(model,retriever=retriever)

questions = [
    "What 5 key improvements to that codebase would you suggest?",
    "How can we improve hot code relaod?"
] 
chat_history = []

for question in questions:  
    result = qa({"question": question, "chat_history": chat_history})
    chat_history.append((question, result['answer']))
    print(f"-> **Question**: {question} \n")
    print(f"**Answer**: {result['answer']} \n")`

hugo4711 avatar Apr 21 '23 22:04 hugo4711

You're inputting your entire codebase, which is 20k+ tokens; that rather defeats the point of the retriever. Try chunking the codebase first. The retriever will then only pass the parts of your codebase that are most relevant to your question, which keeps the token count within the model's context limit.

Have a look at text splitters. There are examples of how to split your documents in the Deep Lake notebook.
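
As a rough sketch (the local path, the .py filter, and the chunk size are just assumptions for illustration), indexing a chunked codebase into Deep Lake could look like this:

import os
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import DeepLake

# collect the source files (path and extension filter are placeholders)
docs = []
for root, _, files in os.walk('./my_codebase'):
    for name in files:
        if name.endswith('.py'):
            docs.extend(TextLoader(os.path.join(root, name), encoding='utf-8').load())

# split into ~1000-character chunks so each retrieved piece stays small
# relative to the model's 4097-token context window
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
chunks = text_splitter.split_documents(docs)

embeddings = OpenAIEmbeddings()
db = DeepLake.from_documents(chunks, embeddings, dataset_path='hub://COMPANY/xyz')

Each chunk is embedded separately, so a query pulls back a handful of small snippets instead of whole files.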

You may also want to use a chain for QA like the RetrievalQA chain. There's also an example in that notebook.

from langchain.chains import RetrievalQA
from langchain.llms import OpenAIChat

qa = RetrievalQA.from_chain_type(llm=OpenAIChat(model='gpt-3.5-turbo'), chain_type='stuff', retriever=retriever)

Play around with the chain type: try 'map_reduce' or 'refine' instead of 'stuff'. The 'stuff' chain type stuffs every document the retriever returns into a single prompt, which can push you back over the context limit; 'map_reduce' and 'refine' process the documents in smaller calls. Learn more about chain types.
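
A minimal sketch of switching the chain type (untested against your dataset, question text reused from your example):

from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

# 'map_reduce' handles each retrieved chunk in its own call and then combines
# the partial answers, so no single prompt has to hold every document
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name='gpt-3.5-turbo'),
    chain_type='map_reduce',  # or 'refine'
    retriever=retriever,
)
print(qa.run("What 5 key improvements to that codebase would you suggest?"))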

account00001 avatar Apr 23 '23 01:04 account00001

@account00001 Thanks for the helpful advice!

hugo4711 avatar Apr 23 '23 17:04 hugo4711

This issue occurs when running the example in the docs here: https://python.langchain.com/en/latest/use_cases/question_answering.html

jaded0 avatar Apr 25 '23 23:04 jaded0

Hi, @hugo4711! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, the issue is that the maximum context length of the gpt-3.5-turbo model is exceeded when inputting a codebase. One user suggested chunking the codebase and using a chain for QA like the RetrievalQA chain, and you thanked them for the advice. Another user mentioned that the issue occurs when running the example in the documentation.

Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and contribution to the LangChain project!

dosubot[bot] avatar Sep 17 '23 17:09 dosubot[bot]