
OpenAI API "maximum context length" errors

Open mariusnita opened this issue 2 years ago • 9 comments

Seeing a bunch of errors coming back from the OpenAI API:

This model's maximum context length is 4097 tokens, however you requested 4303 tokens (4047 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.

I'm just doing

index = gpt_index.GPTTreeIndex(docs)

with a list of docs I created manually, which are just the contents of files.

I construct the docs like this:

gpt_index.Document(
  contents, 
  extra_info=dict(filename=f, path=path)
)
In [2]: gpt_index.__version__
Out[2]: '0.2.7'
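For context, a quick way to see whether a request will blow past the context window is to estimate the prompt's token count up front. This is a minimal sketch, not part of gpt_index, using a rough ~4-characters-per-token heuristic (the exact count depends on the model's tokenizer):

```python
# Rough budget check for the davinci-family limits from the error message.
# Assumption: ~4 characters per token on average for English text; the real
# count depends on the model's tokenizer and can differ significantly.
MAX_CONTEXT = 4097   # model's maximum context length
NUM_OUTPUT = 256     # tokens reserved for the completion

def rough_token_count(text: str) -> int:
    return max(1, len(text) // 4)

def fits_in_context(prompt: str) -> bool:
    return rough_token_count(prompt) + NUM_OUTPUT <= MAX_CONTEXT

print(fits_in_context("short prompt"))  # True
print(fits_in_context("x" * 20000))     # False: ~5000 prompt tokens + 256 > 4097
```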

mariusnita avatar Jan 18 '23 23:01 mariusnita

Thanks @mariusnita, sorry about the issue. I'll take a look soon. Is this only with the tree index? Does it work with the simple vector index?

jerryjliu avatar Jan 18 '23 23:01 jerryjliu

I just tried, and I don't get any errors with the vector or list indexes.

BTW, I just noticed the vector index is 20x cheaper and faster to create, and seems to have much better question-answering performance than the tree index. (Although I was only able to create a partial tree index by discarding the failing chunks, so that may explain the poor performance.)

mariusnita avatar Jan 19 '23 02:01 mariusnita

> BTW, I just noticed the vector index is 20x cheaper and faster to create, and seems to have much better question-answering performance than the tree index. (Although I was only able to create a partial tree index by discarding the failing chunks, so that may explain the poor performance.)

Yeah it's a fair point, that's why the SimpleVectorIndex is the default mode in the quickstart :)

I've found the tree index to be more effective at (1) summarization (through construction of the tree itself), and decently OK at (2) routing, though of course embeddings can be used for (2) as well.

jerryjliu avatar Jan 19 '23 02:01 jerryjliu

Seeing the same error when using the code-davinci-002 model with the vector index:

llm_predictor = gpt_index.LLMPredictor(
    llm=langchain.OpenAI(
        temperature=0,
        model_name="code-davinci-002"
    )
)
index = gpt_index.GPTSimpleVectorIndex(
    docs,
    llm_predictor=llm_predictor
)

openai.error.InvalidRequestError: This model's maximum context length is 8191 tokens, however you requested 9549 tokens (9549 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.

mariusnita avatar Jan 19 '23 03:01 mariusnita

@mariusnita do you have sample data to help me repro by any chance? Feel free to DM me in the Discord

jerryjliu avatar Jan 19 '23 03:01 jerryjliu

I had the same error with GPTSimpleVectorIndex, and I was able to work around it by setting prompt_helper. For your information:

openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 4181 tokens (3925 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.

max_input_size = 4096
num_output = 2000 
max_chunk_overlap = 20
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)

index = GPTSimpleVectorIndex.load_from_disk(
    'index.json', prompt_helper=prompt_helper
)
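The arithmetic behind why this works: PromptHelper reserves num_output tokens for the completion, so the text packed into each prompt is capped at roughly max_input_size minus num_output tokens, keeping every call under the 4097-token window. A sketch of that budget (my reading of the settings above, not PromptHelper's actual internals):

```python
# Budget implied by the PromptHelper settings above. Assumption: prompt
# template overhead is ignored here, so the real space for document text
# is slightly smaller than this estimate.
max_input_size = 4096   # model context window
num_output = 2000       # tokens reserved for the completion

prompt_budget = max_input_size - num_output  # tokens left for chunk text
total = prompt_budget + num_output           # worst-case request size

print(prompt_budget)  # 2096
print(total)          # 4096, within the model's 4097-token limit
```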

stanakaj avatar Jan 19 '23 06:01 stanakaj

Thanks @stanakaj. Yeah, given the max input size, this should be something gpt index handles under the hood. I'd be curious to see what the data is.

jerryjliu avatar Jan 19 '23 06:01 jerryjliu

@jerryjliu This is likely a bad example because it's probably not useful to feed SVGs into gpt-index; nonetheless this causes gpt-index to crash:

https://www.roojs.org/roojs1/fonts/nunito/nunito-v16-latin-italic.svg

Example program:

filename = "nunito-v16-latin-italic.svg"

with open(filename) as f:
    contents = f.read()

docs = [gpt_index.Document(contents)]

llm_predictor = gpt_index.LLMPredictor(
    llm=langchain.OpenAI(temperature=0, model_name="code-davinci-002")
)
index = gpt_index.GPTSimpleVectorIndex(docs, llm_predictor=llm_predictor)

mariusnita avatar Jan 19 '23 07:01 mariusnita

The same file causes GPTTreeIndex to fail:

filename = "nunito-v16-latin-italic.svg"

with open(filename) as f:
    contents = f.read()

index = gpt_index.GPTTreeIndex(
    [gpt_index.Document(contents)],
)

mariusnita avatar Jan 19 '23 07:01 mariusnita

Thanks @mariusnita, taking a look now.

jerryjliu avatar Jan 20 '23 01:01 jerryjliu

Hi @mariusnita, just a quick note. The PR I linked partially fixes the issue, but it does not completely fix it for your use case. This is because, at the moment, there's no way to appropriately pre-compute the number of tokens used for text-embedding-ada-002 (the tokenizer I use is not aligned with the tokenizer OpenAI uses): https://help.openai.com/en/articles/6824809-embeddings-frequently-asked-questions.

In the meantime, for your specific use case, can you manually set chunk_size_limit=4096 (or a smaller number)? e.g.

index = GPTSimpleVectorIndex(docs, llm_predictor=llm_predictor, chunk_size_limit=4096)
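If chunk_size_limit isn't available in your version, a similar effect can be had by pre-splitting large files yourself before constructing Documents. A hedged sketch (split_text and the ~4-chars-per-token sizing are my own, not part of the gpt_index API):

```python
# Hypothetical helper: split a long string into overlapping chunks so no
# single Document can exceed the embedding model's context window.
# chunk_chars assumes ~4 chars/token (4096 tokens ~ 16k chars), so 8000
# chars leaves a generous safety margin.
def split_text(text: str, chunk_chars: int = 8000, overlap: int = 200) -> list:
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # overlap preserves context across boundaries
    return chunks

# Then build one Document per chunk instead of one per file, e.g.:
# docs = [gpt_index.Document(c) for c in split_text(contents)]
```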

jerryjliu avatar Jan 20 '23 06:01 jerryjliu

Thank you. #266 also fixes my reported error.

stanakaj avatar Jan 23 '23 23:01 stanakaj