Chunk size sometimes exceeds max model size
InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 4229 tokens (3973 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.
Looks like chunks sometimes exceed the model's max size. I've noticed this tends to happen when using non-English alphabets. Full error message:
---------------------------------------------------------------------------
InvalidRequestError Traceback (most recent call last)
Cell In[17], line 2
1 index = GPTListIndex([Document(content)])
----> 2 response = index.query(summary_simple)
3 print(response)
File ~/.pyenv/versions/3.9.1/lib/python3.9/site-packages/gpt_index/indices/base.py:342, in BaseGPTIndex.query(self, query_str, verbose, mode, **query_kwargs)
328 query_config = QueryConfig(
329 index_struct_type=IndexStructType.from_index_struct(self._index_struct),
330 query_mode=mode_enum,
331 query_kwargs=query_kwargs,
332 )
333 query_runner = QueryRunner(
334 self._llm_predictor,
335 self._prompt_helper,
(...)
340 recursive=False,
341 )
--> 342 return query_runner.query(query_str, self._index_struct)
File ~/.pyenv/versions/3.9.1/lib/python3.9/site-packages/gpt_index/indices/query/query_runner.py:111, in QueryRunner.query(self, query_str, index_struct)
103 query_kwargs = self._get_query_kwargs(config)
104 query_obj = query_cls(
105 index_struct,
...
681 rbody, rcode, resp.data, rheaders, stream_error=stream_error
682 )
683 return resp
InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 4229 tokens (3973 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.
My code is pretty simple:
index = GPTListIndex([Document(content)])
response = index.query(summary_simple)
print(response)
and the text is around 9k tokens.
@not-poma which version of gpt_index are you on? I was hoping to have fixed this issue
My version is 0.2.9 (it also reproduces on 0.2.16).
Steps to reproduce:
# pip install newspaper3k gpt_index
from newspaper import Article
from gpt_index import GPTListIndex, Document

summary_detailed = (
    "Write a summary of the following. Try to use only the information "
    "provided. Try to include as many key details as possible."
)

# Russian Wikipedia article on GPT-3: non-Latin alphabet, roughly 9k tokens
article = Article('https://ru.wikipedia.org/wiki/GPT-3')
article.download()
article.parse()
content = f'{article.title}\n{article.text}'

index = GPTListIndex([Document(content)])
response = index.query(summary_detailed)
print(response)
Thanks for posting the steps @not-poma - I'll take a look soon.
@not-poma: could you upgrade to 0.3.3 and let me know if this is still occurring?
This still reproduces on exactly the same example, but the overage is much smaller: InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 4105 tokens (3849 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.
Getting this as well... super thin margin:
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens,
however you requested 4101 tokens (3845 in your prompt; 256 for the completion).
Please reduce your prompt; or completion length.
I have also faced this issue in a few non-English languages.
InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 4103 tokens (3079 in your prompt; 1024 for the completion). Please reduce your prompt; or completion length.
I'm getting this as well (non-English alphabet).
Going to investigate today! Note: it seems to be an issue where there aren't a lot of newlines (I thought we had a fallback for this but apparently it's not working).
Me too. For example, a document with Ä/ä (a with dots), Ö/ö (o with dots), or Å/å (Swedish o) seems to cause the issue.
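For illustration, a rough check with tiktoken (the encoding name here is just an assumption; use whichever encoding your model actually uses) shows why such text trips up character-based chunk estimates: byte-level BPE spends several tokens on each non-ASCII character, so the real token count is much higher than the character count suggests.
import tiktoken

# "gpt2" is assumed for illustration; davinci-era models use a similar byte-level BPE
enc = tiktoken.get_encoding("gpt2")

english = "machine learning model"
swedish = "maskininlärningsmodell med Ä, Ö och Å"
for text in (english, swedish):
    # non-ASCII characters cost several tokens each
    print(f"{len(text)} chars -> {len(enc.encode(text))} tokens: {text!r}")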
Have a similar issue. I'd like very long outputs from ChatGPT, but I don't really know how many tokens I can work with. I tried implementing my own token counting with tiktoken, but it's still problematic because I don't know how many tokens LlamaIndex is generating under the hood. Should I be using something besides GPTSimpleVectorIndex?
EDIT: Ah, I'm referring to something different... I'm referring to setting max_tokens manually.
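For what it's worth, a minimal sketch of counting tokens yourself with tiktoken (the model name is an assumption; pick whichever model your index is configured for). It at least tells you how large the raw document is before LlamaIndex adds its own prompt scaffolding:
import tiktoken

# text-davinci-003 is assumed here; swap in the model your index actually calls
enc = tiktoken.encoding_for_model("text-davinci-003")

# "content" is the document string from the repro above
doc_tokens = len(enc.encode(content))
print(f"document is {doc_tokens} tokens before any prompt template is added")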
Any chance this issue could be prioritized and fixed soon? Or could you provide a way to skip the token estimation? This is really a showstopper for me :(
Hey folks, as a workaround, I think you can explicitly set a lower chunk size for now. In the 0.5.0 API, this would look like:
service_context = ServiceContext.from_defaults(chunk_size_limit=3000)
index = GPTListIndex.from_documents(documents, service_context=service_context)
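Applied to the repro above, that workaround would look roughly like this (a sketch, assuming the 0.5.x import paths; the chunk_size_limit keyword may differ in later releases):
from llama_index import Document, GPTListIndex, ServiceContext

# cap chunks well below the 4097-token context window so prompt + completion fit
service_context = ServiceContext.from_defaults(chunk_size_limit=3000)
index = GPTListIndex.from_documents(
    [Document(content)], service_context=service_context
)
response = index.query(summary_detailed)
print(response)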
@not-poma is this still an issue in the latest versions of llama-index? If so, can you re-open this issue? 👍🏻