[Bug]: `llm:max_tokens` appears to refer to max_tokens per input rather than output.
Describe the bug
Indexing fails at the first stage, create_base_extracted_entities, because the request asks the LLM for more tokens than the model allows, so the API rejects every call. The failure is silent until indexing proceeds to the next stage: GraphRAG does not error out, but the failures do show up in the logs.
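For illustration, the rejection can be reproduced outside GraphRAG by sending a single chat completion with the same `max_tokens` to the same OpenAI-compatible endpoint. This is a minimal sketch using the `openai` client, assuming the `api_base` and model from the config below; it is not GraphRAG's own request code.

```python
# Standalone repro sketch (not GraphRAG code): request a completion with
# max_tokens above the model's 2048 limit and observe the 400 pass-through.
import openai

client = openai.OpenAI(
    api_key="unused",               # placeholder; the endpoint in this setup does not use it
    base_url="http://URL/api/v1",   # same api_base as in settings.yaml (redacted here)
)

try:
    client.chat.completions.create(
        model="meta.llama3-8b-instruct-v1:0",
        messages=[{"role": "user", "content": "ping"}],
        max_tokens=4000,            # above the model's 2048-token limit
    )
except openai.BadRequestError as err:
    # Mirrors the error GraphRAG logs during create_base_extracted_entities
    print(err)
```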
Steps to reproduce
Set `llm: max_tokens` in settings.yaml to a value higher than the selected model's limit, then run indexing.
Expected Behavior
Indexing does not immediately fail, but in the first step, no entities are detected.
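One way to confirm the symptom is to inspect the first workflow's output after the run. A rough sketch, assuming default file storage so artifacts land under `output/<run>/artifacts/` and that the workflow output is written as a parquet file named after the stage; both the path and the file name are assumptions, not taken from this report.

```python
# Check whether create_base_extracted_entities produced anything
# (assumed default artifacts layout; adjust the path to your storage config).
from pathlib import Path
import pandas as pd

run_dir = sorted(Path("output").glob("*"))[-1]  # latest indexing run (timestamped dirs sort lexically)
parquet = run_dir / "artifacts" / "create_base_extracted_entities.parquet"

df = pd.read_parquet(parquet)
print(df.shape)                 # expect an empty or near-empty result when every LLM call failed
print(df.columns.tolist())
```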
GraphRAG Config Used
llm:
  api_key: "" # ${GRAPHRAG_API_KEY}
  type: openai_chat # openai_chat or azure_openai_chat
  model: "meta.llama3-8b-instruct-v1:0" # gpt-4-turbo-preview
  model_supports_json: true # recommended if this is available for your model.
  max_tokens: 4000 ### limit for the model selected is 2048
  api_base: "http://URL/api/v1"
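As a workaround until the mismatch is surfaced earlier, a pre-flight check on settings.yaml can catch it before indexing starts. This is only an illustrative sketch: `MODEL_LIMITS` is a hypothetical table, and the 2048 figure is taken from the API error in the logs below.

```python
# Illustrative pre-flight check (not part of GraphRAG): warn if the configured
# llm.max_tokens exceeds a known output limit for the selected model.
import yaml

MODEL_LIMITS = {"meta.llama3-8b-instruct-v1:0": 2048}  # hypothetical lookup table

with open("settings.yaml") as f:
    settings = yaml.safe_load(f)

llm = settings.get("llm", {})
model = llm.get("model")
max_tokens = llm.get("max_tokens")
limit = MODEL_LIMITS.get(model)

if limit is not None and max_tokens and max_tokens > limit:
    print(
        f"llm.max_tokens={max_tokens} exceeds the {limit}-token limit for {model}; "
        "extraction calls will be rejected with HTTP 400."
    )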
Logs and screenshots
File "/home/ec2-user/anaconda3/envs/graphrag/lib/python3.12/site-packages/openai/_base_client.py", line 1620, in _request raise self._make_status_error_from_response(err.response) from None openai.BadRequestError: Error code: 400 - {'detail': 'An error occurred (ValidationException) when calling the Converse operation: The maximum tokens you requested exceeds the model limit of 2048. Try again with a maximum tokens value that is lower than 2048.'}
Additional Information
- GraphRAG Version:
- Operating System:
- Python Version:
- Related Issues: