graphrag
tokens_per_minute seem not to reflect in engine
The `tokens_per_minute` setting in `settings.yaml` does not seem to be picked up by the indexing engine. I've tried setting it to both `50000` and `50_000` (as in the commented example), but `index-engine.log` always reports `"tokens_per_minute": 0`. As a result I repeatedly hit 429 rate-limit responses, no matter what I do.

The content of `settings.yaml`:
```yaml
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: azure_openai_chat # or azure_openai_chat
  model: gpt-4o
  model_supports_json: true # recommended if this is available for your model.
  # max_tokens: 4000
  # request_timeout: 180.0
  api_base: https://redacted.openai.azure.com/
  api_version: 2024-02-15-preview
  # organization: <organization_id>
  deployment_name: gpt4o
  tokens_per_minute: 50000 # set a leaky bucket throttle
  # requests_per_minute: 20 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made
```
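For context, `tokens_per_minute` is documented as a leaky-bucket throttle, so I'd expect the engine to gate requests roughly like the following sketch. This is my own illustration of the intended behaviour, not GraphRAG's actual implementation:

```python
class TokenBucket:
    """Minimal token/leaky-bucket limiter: budget refills at tpm/60 tokens per second."""

    def __init__(self, tokens_per_minute: float, now: float = 0.0):
        self.rate = tokens_per_minute / 60.0   # tokens replenished per second
        self.capacity = tokens_per_minute
        self.available = tokens_per_minute
        self.last = now

    def acquire(self, tokens: float, now: float) -> float:
        """Return seconds to wait before a request costing `tokens` may proceed."""
        # Replenish based on elapsed time, capped at the bucket capacity.
        self.available = min(self.capacity,
                             self.available + (now - self.last) * self.rate)
        self.last = now
        if self.available >= tokens:
            self.available -= tokens
            return 0.0
        deficit = tokens - self.available
        self.available = 0.0
        return deficit / self.rate  # wait until enough budget has leaked back in


bucket = TokenBucket(tokens_per_minute=50_000)
print(bucket.acquire(30_000, now=0.0))  # fits in the bucket: no wait
print(bucket.acquire(30_000, now=0.0))  # 10_000 tokens short: positive wait
```

With a throttle like this in place, the second 30k-token request would be delayed instead of being fired at Azure and bouncing with a 429.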
The relevant contents of `{run_id}\reports\index-engine.log`:
```json
"llm": {
  "api_key": "REDACTED, length 32",
  "type": "azure_openai_chat",
  "model": "gpt-4o",
  "max_tokens": 4000,
  "request_timeout": 180.0,
  "api_base": "https://redacted.openai.azure.com/",
  "api_version": "2024-02-15-preview",
  "proxy": null,
  "cognitive_services_endpoint": null,
  "deployment_name": "gpt4o",
  "model_supports_json": true,
  "tokens_per_minute": 0,
  "requests_per_minute": 0,
  "max_retries": 10,
  "max_retry_wait": 10.0,
  "sleep_on_rate_limit_recommendation": true,
  "concurrent_requests": 25
},
"parallelization": {
  "stagger": 0.3,
  "num_threads": 50
},
```
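I don't know where the value gets dropped, but `"tokens_per_minute": 0` in the log reads like a "not set, use the default" fallback rather than my configured value. One thing worth double-checking is YAML nesting: if the key ends up at the wrong indentation level, a config loader will typically default it silently. A sketch of that failure mode (hypothetical loader, not GraphRAG's code):

```python
# Hypothetical config resolution: read llm.tokens_per_minute, defaulting to 0.
def resolve_tpm(config: dict) -> int:
    return config.get("llm", {}).get("tokens_per_minute", 0)

# Correctly nested under llm: the value is picked up.
nested = {"llm": {"tokens_per_minute": 50_000}}
print(resolve_tpm(nested))  # 50000

# Key at the top level (e.g. wrong indentation in the YAML): silently becomes 0.
flat = {"llm": {}, "tokens_per_minute": 50_000}
print(resolve_tpm(flat))  # 0
```

In my case the key looks correctly nested under `llm:`, yet the log still shows `0`, which is why this looks like a bug to me.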