RAG with GPT-4o: `Calculated available context size -271 was not non-negative` LlamaIndex exception.
Bug description
Hi, I have been struggling to run RAG with GPT-4o in MetaGPT v0.8.1. When I run the first code example, the following error occurs:
```
{
"name": "ValueError",
"message": "Calculated available context size -271 was not non-negative.",
"stack": "---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[5], line 1
----> 1 response = engine.query(\"What does Bob like?\")
2 response
File /mnt/c/Users/alant/Downloads/agents_tests/MetaGPT/.venv/lib/python3.10/site-packages/llama_index/core/base/base_query_engine.py:40, in BaseQueryEngine.query(self, str_or_query_bundle)
38 if isinstance(str_or_query_bundle, str):
39 str_or_query_bundle = QueryBundle(str_or_query_bundle)
---> 40 return self._query(str_or_query_bundle)
File /mnt/c/Users/alant/Downloads/agents_tests/MetaGPT/.venv/lib/python3.10/site-packages/llama_index/core/query_engine/retriever_query_engine.py:187, in RetrieverQueryEngine._query(self, query_bundle)
183 with self.callback_manager.event(
184 CBEventType.QUERY, payload={EventPayload.QUERY_STR: query_bundle.query_str}
185 ) as query_event:
186 nodes = self.retrieve(query_bundle)
--> 187 response = self._response_synthesizer.synthesize(
188 query=query_bundle,
189 nodes=nodes,
190 )
192 query_event.on_end(payload={EventPayload.RESPONSE: response})
194 return response
File /mnt/c/Users/alant/Downloads/agents_tests/MetaGPT/.venv/lib/python3.10/site-packages/llama_index/core/response_synthesizers/base.py:188, in BaseSynthesizer.synthesize(self, query, nodes, additional_source_nodes, **response_kwargs)
183 query = QueryBundle(query_str=query)
185 with self._callback_manager.event(
186 CBEventType.SYNTHESIZE, payload={EventPayload.QUERY_STR: query.query_str}
187 ) as event:
--> 188 response_str = self.get_response(
189 query_str=query.query_str,
190 text_chunks=[
191 n.node.get_content(metadata_mode=MetadataMode.LLM) for n in nodes
192 ],
193 **response_kwargs,
194 )
196 additional_source_nodes = additional_source_nodes or []
197 source_nodes = list(nodes) + list(additional_source_nodes)
File /mnt/c/Users/alant/Downloads/agents_tests/MetaGPT/.venv/lib/python3.10/site-packages/llama_index/core/response_synthesizers/compact_and_refine.py:37, in CompactAndRefine.get_response(self, query_str, text_chunks, prev_response, **response_kwargs)
33 \"\"\"Get compact response.\"\"\"
34 # use prompt helper to fix compact text_chunks under the prompt limitation
35 # TODO: This is a temporary fix - reason it's temporary is that
36 # the refine template does not account for size of previous answer.
---> 37 new_texts = self._make_compact_text_chunks(query_str, text_chunks)
38 return super().get_response(
39 query_str=query_str,
40 text_chunks=new_texts,
41 prev_response=prev_response,
42 **response_kwargs,
43 )
File /mnt/c/Users/alant/Downloads/agents_tests/MetaGPT/.venv/lib/python3.10/site-packages/llama_index/core/response_synthesizers/compact_and_refine.py:52, in CompactAndRefine._make_compact_text_chunks(self, query_str, text_chunks)
49 refine_template = self._refine_template.partial_format(query_str=query_str)
51 max_prompt = get_biggest_prompt([text_qa_template, refine_template])
---> 52 return self._prompt_helper.repack(max_prompt, text_chunks)
File /mnt/c/Users/alant/Downloads/agents_tests/MetaGPT/.venv/lib/python3.10/site-packages/llama_index/core/indices/prompt_helper.py:276, in PromptHelper.repack(self, prompt, text_chunks, padding, llm)
263 def repack(
264 self,
265 prompt: BasePromptTemplate,
(...)
268 llm: Optional[LLM] = None,
269 ) -> List[str]:
270 \"\"\"Repack text chunks to fit available context window.
271
272 This will combine text chunks into consolidated chunks
273 that more fully \"pack\" the prompt template given the context_window.
274
275 \"\"\"
--> 276 text_splitter = self.get_text_splitter_given_prompt(
277 prompt, padding=padding, llm=llm
278 )
    279 combined_str = \"\\n\\n\".join([c.strip() for c in text_chunks if c.strip()])
280 return text_splitter.split_text(combined_str)
File /mnt/c/Users/alant/Downloads/agents_tests/MetaGPT/.venv/lib/python3.10/site-packages/llama_index/core/indices/prompt_helper.py:234, in PromptHelper.get_text_splitter_given_prompt(self, prompt, num_chunks, padding, llm)
224 def get_text_splitter_given_prompt(
225 self,
226 prompt: BasePromptTemplate,
(...)
229 llm: Optional[LLM] = None,
230 ) -> TokenTextSplitter:
231 \"\"\"Get text splitter configured to maximally pack available context window,
232 taking into account of given prompt, and desired number of chunks.
233 \"\"\"
--> 234 chunk_size = self._get_available_chunk_size(
235 prompt, num_chunks, padding=padding, llm=llm
236 )
237 if chunk_size <= 0:
238 raise ValueError(f\"Chunk size {chunk_size} is not positive.\")
File /mnt/c/Users/alant/Downloads/agents_tests/MetaGPT/.venv/lib/python3.10/site-packages/llama_index/core/indices/prompt_helper.py:218, in PromptHelper._get_available_chunk_size(self, prompt, num_chunks, padding, llm)
215 prompt_str = get_empty_prompt_txt(prompt)
216 num_prompt_tokens = self._token_counter.get_string_tokens(prompt_str)
--> 218 available_context_size = self._get_available_context_size(num_prompt_tokens)
219 result = available_context_size // num_chunks - padding
220 if self.chunk_size_limit is not None:
File /mnt/c/Users/alant/Downloads/agents_tests/MetaGPT/.venv/lib/python3.10/site-packages/llama_index/core/indices/prompt_helper.py:150, in PromptHelper._get_available_context_size(self, num_prompt_tokens)
148 context_size_tokens = self.context_window - num_prompt_tokens - self.num_output
149 if context_size_tokens < 0:
--> 150 raise ValueError(
151 f\"Calculated available context size {context_size_tokens} was\"
152 \" not non-negative.\"
153 )
154 return context_size_tokens
ValueError: Calculated available context size -271 was not non-negative."
}
```
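For reference, this is a minimal sketch of what I am running, based on MetaGPT's RAG example (the import path and document path are approximate, not the exact example code):

```python
# Minimal sketch of the setup that triggers the error, based on MetaGPT's
# RAG example; the document path is illustrative.
from metagpt.rag.engines import SimpleEngine

engine = SimpleEngine.from_docs(input_files=["data/rag/travel.txt"])
response = engine.query("What does Bob like?")
print(response)
```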
This is my configuration file:
```yaml
llm:
  api_type: "openai"  # or azure / ollama / open_llm etc. Check LLMType for more options
  model: "gpt-4o"  # or gpt-3.5-turbo-1106 / gpt-4-1106-preview
  base_url: "https://api.openai.com/v1"  # or forward url / other llm url
  api_key: "..."
embedding:
  api_type: "openai"  # openai / azure / gemini / ollama etc. Check EmbeddingType for more options.
  base_url: "https://api.openai.com/v1"  # or forward url / other llm url
  api_key: "..."
  model: "text-embedding-3-small"
  dimensions: 1536  # output dimension of embedding model
```
Bug solved method
I have checked the code and found that this happens because the context size of the gpt-4o model is not defined in the metagpt/utils/token_counter.py file (the same is true for gpt-4-turbo, which is not that recent). The default context size of 3900 tokens is therefore used; since the prompt plus the reserved output does not fit in 3900 tokens, the computed available context becomes negative and the error above is raised.
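A rough sketch of the failing arithmetic (the dictionary mirrors the lookup in metagpt/utils/token_counter.py, but its entries, the prompt size, and the output reservation below are illustrative assumptions chosen to reproduce the -271 from the traceback, not measured values):

```python
# Illustrative reconstruction of the failing arithmetic; the numbers are
# assumptions, not values measured from my run.
TOKEN_MAX = {
    "gpt-3.5-turbo-1106": 16385,
    "gpt-4-1106-preview": 128000,
    # "gpt-4o" is missing, so the lookup falls back to the default.
}

context_window = TOKEN_MAX.get("gpt-4o", 3900)  # 3900 instead of 128000
num_prompt_tokens = 75    # tokens in the empty QA/refine template (illustrative)
num_output = 4096         # tokens reserved for the answer (assumed max_token default)

available = context_window - num_prompt_tokens - num_output
print(available)  # -271 -> LlamaIndex raises the ValueError above
```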
The exception is thrown by LlamaIndex and is not informative enough to understand what is going on. This problem should be handled internally by MetaGPT. Adding a context_size field to the configuration file could also be useful: it would let users work with models that are not yet supported, and it would allow limiting the length of requests sent to the LLM provider (if there were a reason to do so).
Try adding `max_token: 2048` to the config2.yaml file as follows. Note: 2048 is an int, not a string.

```yaml
llm:
  api_type: xxx
  model: xxx
  ...
  max_token: 2048
```
Useful answer!!
Since no further responses are needed, we will close this issue. Please reopen it if necessary.
This issue can also occur when the max_token used while creating an index with the embedding model differs from the max_token used while reloading the indexes from the persist directory with the LLM model.
Make sure to set the max_token parameter to the same value when creating the persistent index and when loading it back to use as the query_agent.
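In plain LlamaIndex terms, a sketch of what I mean, assuming max_token corresponds to the LLM's maximum output tokens (the model, paths, and values here are illustrative):

```python
from llama_index.core import (
    Settings,
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)
from llama_index.llms.openai import OpenAI

# Keep the same model and output-token budget for both phases.
Settings.llm = OpenAI(model="gpt-4o", max_tokens=2048)

# Phase 1: build and persist the index.
docs = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(docs)
index.storage_context.persist(persist_dir="storage")

# Phase 2: reload it later with the *same* settings before querying.
storage_context = StorageContext.from_defaults(persist_dir="storage")
index = load_index_from_storage(storage_context)
query_engine = index.as_query_engine()
```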
For me the problem was in PromptHelper; I fixed it using `Settings._prompt_helper = PromptHelper(context_window=6000)` (the default context window is 3900 tokens).
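In full, the workaround looks like this (6000 is just the value that worked for me; for gpt-4o the real window is much larger, and `_prompt_helper` is a private attribute, so treat this as a stopgap and adjust to your model):

```python
from llama_index.core import Settings
from llama_index.core.indices.prompt_helper import PromptHelper

# Override the default 3900-token context window before building the
# query engine; raise the value to match your model's actual window.
Settings._prompt_helper = PromptHelper(context_window=6000)
```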