[Bug]: `llm:max_tokens` appears to refer to max_tokens per input rather than output.
Describe the bug
Indexing fails at the first stage, create_base_extracted_entities, because the request asks the LLM for more tokens than the model allows, so the API rejects every call. The failure is silent until indexing proceeds to the next stage: GraphRAG does not error out, but the failures do show up in the logs.
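For illustration, the rejection can be reproduced outside GraphRAG by sending a single chat completion with the same `max_tokens` to the same OpenAI-compatible endpoint. This is a minimal sketch using the `openai` client, assuming the `api_base` and model from the config below; it is not GraphRAG's own request code.

```python
# Standalone repro sketch (not GraphRAG code): request a completion with
# max_tokens above the model's 2048 limit and observe the 400 pass-through.
import openai

client = openai.OpenAI(
    api_key="unused",               # placeholder; the endpoint in this setup does not use it
    base_url="http://URL/api/v1",   # same api_base as in settings.yaml (redacted here)
)

try:
    client.chat.completions.create(
        model="meta.llama3-8b-instruct-v1:0",
        messages=[{"role": "user", "content": "ping"}],
        max_tokens=4000,            # above the model's 2048-token limit
    )
except openai.BadRequestError as err:
    # Mirrors the error GraphRAG logs during create_base_extracted_entities
    print(err)
```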
Steps to reproduce
Set `llm: max_tokens` in settings.yaml to a value higher than the selected model's limit, then run indexing.
Expected Behavior
Indexing does not immediately fail, but in the first step, no entities are detected.
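One way to confirm the symptom is to inspect the first workflow's output after the run. A rough sketch, assuming default file storage so artifacts land under `output/<run>/artifacts/` and that the workflow output is written as a parquet file named after the stage; both the path and the file name are assumptions, not taken from this report.

```python
# Check whether create_base_extracted_entities produced anything
# (assumed default artifacts layout; adjust the path to your storage config).
from pathlib import Path
import pandas as pd

run_dir = sorted(Path("output").glob("*"))[-1]  # latest indexing run (timestamped dirs sort lexically)
parquet = run_dir / "artifacts" / "create_base_extracted_entities.parquet"

df = pd.read_parquet(parquet)
print(df.shape)                 # expect an empty or near-empty result when every LLM call failed
print(df.columns.tolist())
```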
GraphRAG Config Used
llm:
  api_key: "" # ${GRAPHRAG_API_KEY}
  type: openai_chat # openai_chat or azure_openai_chat
  model: "meta.llama3-8b-instruct-v1:0" # gpt-4-turbo-preview
  model_supports_json: true # recommended if this is available for your model.
  max_tokens: 4000 ### limit for the model selected is 2048
  api_base: "http://URL/api/v1"
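As a workaround until the mismatch is surfaced earlier, a pre-flight check on settings.yaml can catch it before indexing starts. This is only an illustrative sketch: `MODEL_LIMITS` is a hypothetical table, and the 2048 figure is taken from the API error in the logs below.

```python
# Illustrative pre-flight check (not part of GraphRAG): warn if the configured
# llm.max_tokens exceeds a known output limit for the selected model.
import yaml

MODEL_LIMITS = {"meta.llama3-8b-instruct-v1:0": 2048}  # hypothetical lookup table

with open("settings.yaml") as f:
    settings = yaml.safe_load(f)

llm = settings.get("llm", {})
model = llm.get("model")
max_tokens = llm.get("max_tokens")
limit = MODEL_LIMITS.get(model)

if limit is not None and max_tokens and max_tokens > limit:
    print(
        f"llm.max_tokens={max_tokens} exceeds the {limit}-token limit for {model}; "
        "extraction calls will be rejected with HTTP 400."
    )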
Logs and screenshots
File "/home/ec2-user/anaconda3/envs/graphrag/lib/python3.12/site-packages/openai/_base_client.py", line 1620, in _request raise self._make_status_error_from_response(err.response) from None openai.BadRequestError: Error code: 400 - {'detail': 'An error occurred (ValidationException) when calling the Converse operation: The maximum tokens you requested exceeds the model limit of 2048. Try again with a maximum tokens value that is lower than 2048.'}
Additional Information
- GraphRAG Version:
- Operating System:
- Python Version:
- Related Issues: