
tokens_per_minute does not seem to be reflected in the engine

eyast opened this issue 7 months ago · 5 comments

The tokens_per_minute setting in settings.yaml does not seem to be picked up by the indexing engine. I've tried setting it to both 50000 and 50_000 (as per the commented example), but index-engine.log always reports tokens_per_minute: 0, and I repeatedly hit 429 rate-limit errors no matter what I do.

The content of settings.yaml:

llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: azure_openai_chat # or openai_chat
  model: gpt-4o
  model_supports_json: true # recommended if this is available for your model.
  # max_tokens: 4000
  # request_timeout: 180.0
  api_base: https://redacted.openai.azure.com/
  api_version: 2024-02-15-preview
  # organization: <organization_id>
  deployment_name: gpt4o
  tokens_per_minute: 50000 # set a leaky bucket throttle
  # requests_per_minute: 20 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made
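To rule out a YAML parsing problem on my side, I checked that both spellings of the value load as the same integer. PyYAML implements YAML 1.1, which allows underscores as digit separators, so 50_000 and 50000 are equivalent:

```python
import yaml

doc = """
llm:
  tokens_per_minute: 50_000
"""
cfg = yaml.safe_load(doc)

# Both 50000 and 50_000 parse to the integer 50000,
# so the zero in the log is not a YAML parsing issue.
print(cfg["llm"]["tokens_per_minute"])  # -> 50000
```

So the value is well-formed YAML; the engine just doesn't appear to apply it.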

Contents of {run_id}\reports\index-engine.log:

    "llm": {
        "api_key": "REDACTED, length 32",
        "type": "azure_openai_chat",
        "model": "gpt-4o",
        "max_tokens": 4000,
        "request_timeout": 180.0,
        "api_base": "https://redacted.openai.azure.com/",
        "api_version": "2024-02-15-preview",
        "proxy": null,
        "cognitive_services_endpoint": null,
        "deployment_name": "gpt4o",
        "model_supports_json": true,
        "tokens_per_minute": 0,
        "requests_per_minute": 0,
        "max_retries": 10,
        "max_retry_wait": 10.0,
        "sleep_on_rate_limit_recommendation": true,
        "concurrent_requests": 25
    },
    "parallelization": {
        "stagger": 0.3,
        "num_threads": 50
    },
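For context on what I expected the setting to do: a leaky-bucket (token-bucket) throttle spends the configured tokens_per_minute budget as requests go out and refills it continuously, sleeping when the budget is exhausted. This is a hypothetical sketch of that mechanism, not graphrag's actual implementation:

```python
import threading
import time


class TokenBucketThrottle:
    """Hypothetical sketch of a tokens-per-minute throttle.

    Callers reserve `n` tokens before each request; if the current
    budget is exhausted, acquire() sleeps until enough tokens refill.
    """

    def __init__(self, tokens_per_minute: int):
        self.capacity = tokens_per_minute
        self.available = float(tokens_per_minute)
        self.refill_rate = tokens_per_minute / 60.0  # tokens per second
        self.last_refill = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self, n: int) -> None:
        while True:
            with self.lock:
                now = time.monotonic()
                # Refill proportionally to elapsed time, capped at capacity.
                self.available = min(
                    self.capacity,
                    self.available + (now - self.last_refill) * self.refill_rate,
                )
                self.last_refill = now
                if self.available >= n:
                    self.available -= n
                    return
                deficit = n - self.available
            # Sleep outside the lock until the deficit has refilled.
            time.sleep(deficit / self.refill_rate)
```

With tokens_per_minute reported as 0 in the log, no such budget is enforced, which would explain why requests keep hitting Azure's 429 responses.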

eyast · Jul 05 '24 01:07