Chunk size sometimes exceeds max model size
InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 4229 tokens (3973 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.
Looks like chunks sometimes exceed the model's max size. I've noticed this tends to happen when using non-English alphabets. Full error message:
---------------------------------------------------------------------------
InvalidRequestError Traceback (most recent call last)
Cell In[17], line 2
1 index = GPTListIndex([Document(content)])
----> 2 response = index.query(summary_simple)
3 print(response)
File ~/.pyenv/versions/3.9.1/lib/python3.9/site-packages/gpt_index/indices/base.py:342, in BaseGPTIndex.query(self, query_str, verbose, mode, **query_kwargs)
328 query_config = QueryConfig(
329 index_struct_type=IndexStructType.from_index_struct(self._index_struct),
330 query_mode=mode_enum,
331 query_kwargs=query_kwargs,
332 )
333 query_runner = QueryRunner(
334 self._llm_predictor,
335 self._prompt_helper,
(...)
340 recursive=False,
341 )
--> 342 return query_runner.query(query_str, self._index_struct)
File ~/.pyenv/versions/3.9.1/lib/python3.9/site-packages/gpt_index/indices/query/query_runner.py:111, in QueryRunner.query(self, query_str, index_struct)
103 query_kwargs = self._get_query_kwargs(config)
104 query_obj = query_cls(
105 index_struct,
...
681 rbody, rcode, resp.data, rheaders, stream_error=stream_error
682 )
683 return resp
InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 4229 tokens (3973 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.
My code is pretty simple:
index = GPTListIndex([Document(content)])
response = index.query(summary_simple)
print(response)
and the text is around 9k tokens.
@not-poma which version of gpt_index are you on? I was hoping to have fixed this issue
My version is 0.2.9 (it also reproduces on 0.2.16).
Steps to reproduce:
# pip install newspaper3k gpt_index
from newspaper import Article
from gpt_index import GPTListIndex, Document

summary_detailed = (
    "Write a summary of the following. Try to use only the information "
    "provided. Try to include as many key details as possible."
)

# Russian Wikipedia article on GPT-3: non-Latin alphabet, roughly 9k tokens
article = Article('https://ru.wikipedia.org/wiki/GPT-3')
article.download()
article.parse()
content = f'{article.title}\n{article.text}'

index = GPTListIndex([Document(content)])
response = index.query(summary_detailed)
print(response)
Thanks for posting the steps @not-poma - I'll take a look soon.
@not-poma: could you upgrade to 0.3.3 and let me know if this is still occurring?
This still reproduces on exactly the same example, but the overage is much smaller: InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 4105 tokens (3849 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.
Getting this as well... super thin margin:
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens,
however you requested 4101 tokens (3845 in your prompt; 256 for the completion).
Please reduce your prompt; or completion length.
I have also faced this issue in a few non-English languages.
InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 4103 tokens (3079 in your prompt; 1024 for the completion). Please reduce your prompt; or completion length.
I'm getting this as well (non-English alphabet).
Going to investigate today! Note: it seems to be an issue where there aren't a lot of newlines (I thought we had a fallback for this but apparently it's not working).
Me too. For example, a document with Ä/ä (a with dots), Ö/ö (o with dots), or Å/å (Swedish o) seems to cause the issue.
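For illustration, a rough check with tiktoken (the encoding name here is just an assumption; use whichever encoding your model actually uses) shows why such text trips up character-based chunk estimates: byte-level BPE spends several tokens on each non-ASCII character, so the real token count is much higher than the character count suggests.
import tiktoken

# "gpt2" is assumed for illustration; davinci-era models use a similar byte-level BPE
enc = tiktoken.get_encoding("gpt2")

english = "machine learning model"
swedish = "maskininlärningsmodell med Ä, Ö och Å"
for text in (english, swedish):
    # non-ASCII characters cost several tokens each
    print(f"{len(text)} chars -> {len(enc.encode(text))} tokens: {text!r}")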
Have a similar issue. I'd like very long outputs from ChatGPT, but I don't really know how many tokens I can work with. I tried implementing my own token counting with tiktoken, but it's still problematic because I don't know how many tokens LlamaIndex is generating under the hood. Should I be using something besides GPTSimpleVectorIndex?
EDIT: Ah, I'm referring to something different... I'm referring to setting max_tokens manually.
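For what it's worth, a minimal sketch of counting tokens yourself with tiktoken (the model name is an assumption; pick whichever model your index is configured for). It at least tells you how large the raw document is before LlamaIndex adds its own prompt scaffolding:
import tiktoken

# text-davinci-003 is assumed here; swap in the model your index actually calls
enc = tiktoken.encoding_for_model("text-davinci-003")

# "content" is the document string from the repro above
doc_tokens = len(enc.encode(content))
print(f"document is {doc_tokens} tokens before any prompt template is added")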
Any chance this issue could be prioritized and fixed soon? Or could you provide a way to skip the token estimation? This is really a showstopper for me :(
Hey folks, as a workaround, I think you can explicitly set a lower chunk size for now. In the 0.5.0 API, this would look like:
service_context = ServiceContext.from_defaults(chunk_size_limit=3000)
index = GPTListIndex.from_documents(documents, service_context=service_context)
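Applied to the repro above, that workaround would look roughly like this (a sketch, assuming the 0.5.x import paths; the chunk_size_limit keyword may differ in later releases):
from llama_index import Document, GPTListIndex, ServiceContext

# cap chunks well below the 4097-token context window so prompt + completion fit
service_context = ServiceContext.from_defaults(chunk_size_limit=3000)
index = GPTListIndex.from_documents(
    [Document(content)], service_context=service_context
)
response = index.query(summary_detailed)
print(response)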
@not-poma is this still an issue in the latest versions of llama-index? If so, can you re-open this issue? 👍🏻