RAG with GPT-4o: `Calculated available context size -271 was not non-negative` LlamaIndex exception.
Bug description
Hi, I have been struggling to run RAG with GPT-4o in MetaGPT v0.8.1. When I run the first code example, the following error occurs:
```
{
"name": "ValueError",
"message": "Calculated available context size -271 was not non-negative.",
"stack": "---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[5], line 1
----> 1 response = engine.query(\"What does Bob like?\")
2 response
File /mnt/c/Users/alant/Downloads/agents_tests/MetaGPT/.venv/lib/python3.10/site-packages/llama_index/core/base/base_query_engine.py:40, in BaseQueryEngine.query(self, str_or_query_bundle)
38 if isinstance(str_or_query_bundle, str):
39 str_or_query_bundle = QueryBundle(str_or_query_bundle)
---> 40 return self._query(str_or_query_bundle)
File /mnt/c/Users/alant/Downloads/agents_tests/MetaGPT/.venv/lib/python3.10/site-packages/llama_index/core/query_engine/retriever_query_engine.py:187, in RetrieverQueryEngine._query(self, query_bundle)
183 with self.callback_manager.event(
184 CBEventType.QUERY, payload={EventPayload.QUERY_STR: query_bundle.query_str}
185 ) as query_event:
186 nodes = self.retrieve(query_bundle)
--> 187 response = self._response_synthesizer.synthesize(
188 query=query_bundle,
189 nodes=nodes,
190 )
192 query_event.on_end(payload={EventPayload.RESPONSE: response})
194 return response
File /mnt/c/Users/alant/Downloads/agents_tests/MetaGPT/.venv/lib/python3.10/site-packages/llama_index/core/response_synthesizers/base.py:188, in BaseSynthesizer.synthesize(self, query, nodes, additional_source_nodes, **response_kwargs)
183 query = QueryBundle(query_str=query)
185 with self._callback_manager.event(
186 CBEventType.SYNTHESIZE, payload={EventPayload.QUERY_STR: query.query_str}
187 ) as event:
--> 188 response_str = self.get_response(
189 query_str=query.query_str,
190 text_chunks=[
191 n.node.get_content(metadata_mode=MetadataMode.LLM) for n in nodes
192 ],
193 **response_kwargs,
194 )
196 additional_source_nodes = additional_source_nodes or []
197 source_nodes = list(nodes) + list(additional_source_nodes)
File /mnt/c/Users/alant/Downloads/agents_tests/MetaGPT/.venv/lib/python3.10/site-packages/llama_index/core/response_synthesizers/compact_and_refine.py:37, in CompactAndRefine.get_response(self, query_str, text_chunks, prev_response, **response_kwargs)
33 \"\"\"Get compact response.\"\"\"
34 # use prompt helper to fix compact text_chunks under the prompt limitation
35 # TODO: This is a temporary fix - reason it's temporary is that
36 # the refine template does not account for size of previous answer.
---> 37 new_texts = self._make_compact_text_chunks(query_str, text_chunks)
38 return super().get_response(
39 query_str=query_str,
40 text_chunks=new_texts,
41 prev_response=prev_response,
42 **response_kwargs,
43 )
File /mnt/c/Users/alant/Downloads/agents_tests/MetaGPT/.venv/lib/python3.10/site-packages/llama_index/core/response_synthesizers/compact_and_refine.py:52, in CompactAndRefine._make_compact_text_chunks(self, query_str, text_chunks)
49 refine_template = self._refine_template.partial_format(query_str=query_str)
51 max_prompt = get_biggest_prompt([text_qa_template, refine_template])
---> 52 return self._prompt_helper.repack(max_prompt, text_chunks)
File /mnt/c/Users/alant/Downloads/agents_tests/MetaGPT/.venv/lib/python3.10/site-packages/llama_index/core/indices/prompt_helper.py:276, in PromptHelper.repack(self, prompt, text_chunks, padding, llm)
263 def repack(
264 self,
265 prompt: BasePromptTemplate,
(...)
268 llm: Optional[LLM] = None,
269 ) -> List[str]:
270 \"\"\"Repack text chunks to fit available context window.
271
272 This will combine text chunks into consolidated chunks
273 that more fully \"pack\" the prompt template given the context_window.
274
275 \"\"\"
--> 276 text_splitter = self.get_text_splitter_given_prompt(
277 prompt, padding=padding, llm=llm
278 )
    279 combined_str = \"\\n\\n\".join([c.strip() for c in text_chunks if c.strip()])
280 return text_splitter.split_text(combined_str)
File /mnt/c/Users/alant/Downloads/agents_tests/MetaGPT/.venv/lib/python3.10/site-packages/llama_index/core/indices/prompt_helper.py:234, in PromptHelper.get_text_splitter_given_prompt(self, prompt, num_chunks, padding, llm)
224 def get_text_splitter_given_prompt(
225 self,
226 prompt: BasePromptTemplate,
(...)
229 llm: Optional[LLM] = None,
230 ) -> TokenTextSplitter:
231 \"\"\"Get text splitter configured to maximally pack available context window,
232 taking into account of given prompt, and desired number of chunks.
233 \"\"\"
--> 234 chunk_size = self._get_available_chunk_size(
235 prompt, num_chunks, padding=padding, llm=llm
236 )
237 if chunk_size <= 0:
238 raise ValueError(f\"Chunk size {chunk_size} is not positive.\")
File /mnt/c/Users/alant/Downloads/agents_tests/MetaGPT/.venv/lib/python3.10/site-packages/llama_index/core/indices/prompt_helper.py:218, in PromptHelper._get_available_chunk_size(self, prompt, num_chunks, padding, llm)
215 prompt_str = get_empty_prompt_txt(prompt)
216 num_prompt_tokens = self._token_counter.get_string_tokens(prompt_str)
--> 218 available_context_size = self._get_available_context_size(num_prompt_tokens)
219 result = available_context_size // num_chunks - padding
220 if self.chunk_size_limit is not None:
File /mnt/c/Users/alant/Downloads/agents_tests/MetaGPT/.venv/lib/python3.10/site-packages/llama_index/core/indices/prompt_helper.py:150, in PromptHelper._get_available_context_size(self, num_prompt_tokens)
148 context_size_tokens = self.context_window - num_prompt_tokens - self.num_output
149 if context_size_tokens < 0:
--> 150 raise ValueError(
151 f\"Calculated available context size {context_size_tokens} was\"
152 \" not non-negative.\"
153 )
154 return context_size_tokens
ValueError: Calculated available context size -271 was not non-negative."
}
```
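For reference, this is a minimal sketch of what I am running, based on MetaGPT's RAG example (the import path and document path are approximate, not the exact example code):

```python
# Minimal sketch of the setup that triggers the error, based on MetaGPT's
# RAG example; the document path is illustrative.
from metagpt.rag.engines import SimpleEngine

engine = SimpleEngine.from_docs(input_files=["data/rag/travel.txt"])
response = engine.query("What does Bob like?")
print(response)
```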
This is my configuration file:
```yaml
llm:
  api_type: "openai"  # or azure / ollama / open_llm etc. Check LLMType for more options
  model: "gpt-4o"  # or gpt-3.5-turbo-1106 / gpt-4-1106-preview
  base_url: "https://api.openai.com/v1"  # or forward url / other llm url
  api_key: "..."
embedding:
  api_type: "openai"  # openai / azure / gemini / ollama etc. Check EmbeddingType for more options.
  base_url: "https://api.openai.com/v1"  # or forward url / other llm url
  api_key: "..."
  model: "text-embedding-3-small"
  dimensions: 1536  # output dimension of embedding model
```
Bug solved method
I have checked the code and found that this happens because the context size of the gpt-4o model is not defined in the metagpt/utils/token_counter.py file (the same is true for gpt-4-turbo, which is not that recent). The default context size of 3900 tokens is therefore used; since the prompt plus the reserved output does not fit in 3900 tokens, the computed available context becomes negative and the error above is raised.
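A rough sketch of the failing arithmetic (the dictionary mirrors the lookup in metagpt/utils/token_counter.py, but its entries, the prompt size, and the output reservation below are illustrative assumptions chosen to reproduce the -271 from the traceback, not measured values):

```python
# Illustrative reconstruction of the failing arithmetic; the numbers are
# assumptions, not values measured from my run.
TOKEN_MAX = {
    "gpt-3.5-turbo-1106": 16385,
    "gpt-4-1106-preview": 128000,
    # "gpt-4o" is missing, so the lookup falls back to the default.
}

context_window = TOKEN_MAX.get("gpt-4o", 3900)  # 3900 instead of 128000
num_prompt_tokens = 75    # tokens in the empty QA/refine template (illustrative)
num_output = 4096         # tokens reserved for the answer (assumed max_token default)

available = context_window - num_prompt_tokens - num_output
print(available)  # -271 -> LlamaIndex raises the ValueError above
```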
The exception is thrown by LlamaIndex and is not informative enough to understand what is going on. This problem should be handled internally by MetaGPT. Adding a context_size field to the configuration file could also be useful: it would let users work with models that are not yet supported, and it would allow limiting the length of requests sent to the LLM provider (if there were a reason to do so).
Try adding `max_token: 2048` to the config2.yaml file as follows. Note: 2048 is an int, not a string.

```yaml
llm:
  api_type: xxx
  model: xxx
  ...
  max_token: 2048
```
Useful answer!!
Since no further responses are needed, we will close this issue. Please reopen it if necessary.
This issue can also occur when the max_token used while creating an index with the embedding model differs from the max_token used while reloading the indexes from the persist directory with the LLM model.
Make sure to set the max_token parameter to the same value when creating the persistent index and when loading it back to use as the query_agent.
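In plain LlamaIndex terms, a sketch of what I mean, assuming max_token corresponds to the LLM's maximum output tokens (the model, paths, and values here are illustrative):

```python
from llama_index.core import (
    Settings,
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)
from llama_index.llms.openai import OpenAI

# Keep the same model and output-token budget for both phases.
Settings.llm = OpenAI(model="gpt-4o", max_tokens=2048)

# Phase 1: build and persist the index.
docs = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(docs)
index.storage_context.persist(persist_dir="storage")

# Phase 2: reload it later with the *same* settings before querying.
storage_context = StorageContext.from_defaults(persist_dir="storage")
index = load_index_from_storage(storage_context)
query_engine = index.as_query_engine()
```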
For me the problem was in PromptHelper; I fixed it using `Settings._prompt_helper = PromptHelper(context_window=6000)` (the default context window is 3900 tokens).
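In full, the workaround looks like this (6000 is just the value that worked for me; for gpt-4o the real window is much larger, and `_prompt_helper` is a private attribute, so treat this as a stopgap and adjust to your model):

```python
from llama_index.core import Settings
from llama_index.core.indices.prompt_helper import PromptHelper

# Override the default 3900-token context window before building the
# query engine; raise the value to match your model's actual window.
Settings._prompt_helper = PromptHelper(context_window=6000)
```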