[Issue]: sleep_on_rate_limit_recommendation is not working for Groq
Is there an existing issue for this?
- [ ] I have searched the existing issues
- [ ] I have checked #657 to validate if my issue is covered by community support
Describe the issue
I am using the Llama 3 8B model (llama3-8b-8192) with GraphRAG through Groq. Groq's limit for this model is 30k tokens per minute. When that limit is exceeded, the code should sleep for a while before retrying, but instead it keeps invoking the API.
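For context, this is roughly the behavior I expect. It is only a sketch, not GraphRAG's actual code: the client setup relies on Groq's OpenAI-compatible endpoint, and the retry-after header handling is an assumption about what Groq returns on a 429.

```python
import time
from openai import OpenAI, RateLimitError

# Groq exposes an OpenAI-compatible endpoint, so the standard OpenAI client can be reused.
client = OpenAI(
    api_key="YOUR_GROQ_API_KEY",  # placeholder
    base_url="https://api.groq.com/openai/v1",
)

def chat_with_backoff(messages, max_retries=3):
    """Call the model and sleep on rate-limit errors instead of failing immediately."""
    for attempt in range(max_retries + 1):
        try:
            return client.chat.completions.create(
                model="llama3-8b-8192",
                messages=messages,
                max_tokens=4000,
            )
        except RateLimitError as e:
            if attempt == max_retries:
                raise
            # Assumption: Groq may send a retry-after header; fall back to a fixed wait otherwise.
            wait = float(e.response.headers.get("retry-after", "30"))
            time.sleep(wait)
```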
Steps to reproduce
No response
GraphRAG Config Used
encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GROQ_API_KEY} # groq api key
  type: openai_chat # or azure_openai_chat
  model: llama3-8b-8192
  model_supports_json: true # recommended if this is available for your model.
  max_tokens: 4000
  # request_timeout: 180.0
  api_base: https://api.groq.com/openai/v1
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  tokens_per_minute: 2000 # set a leaky bucket throttle
  requests_per_minute: 1 # set a leaky bucket throttle
  max_retries: 3
  max_retry_wait: 10000.0
  sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  concurrent_requests: 1 # the number of parallel inflight requests that may be made / default is 25 / reduce if using the groq
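For reference, this is how I understand the leaky-bucket throttle values above (2000 tokens and 1 request per minute). This is a generic illustration, not GraphRAG's implementation; the class and names are made up:

```python
import time

class LeakyBucket:
    """Generic leaky-bucket throttle: allow `capacity` units per `period` seconds."""

    def __init__(self, capacity: float, period: float = 60.0):
        self.capacity = capacity          # e.g. 2000 tokens or 1 request
        self.period = period
        self.level = 0.0                  # how much of the bucket is currently in use
        self.last = time.monotonic()

    def acquire(self, amount: float = 1.0) -> None:
        """Block until `amount` units fit under the per-period limit."""
        while True:
            now = time.monotonic()
            # Drain the bucket at capacity/period units per second.
            self.level = max(0.0, self.level - (now - self.last) * self.capacity / self.period)
            self.last = now
            if self.level + amount <= self.capacity:
                self.level += amount
                return
            time.sleep(0.25)  # wait for the bucket to drain a little

# With the config above: at most 2000 tokens and 1 request per minute.
token_bucket = LeakyBucket(capacity=2000)
request_bucket = LeakyBucket(capacity=1)
```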
Logs and screenshots
No response
Additional Information
- GraphRAG Version:
- Operating System:
- Python Version:
- Related Issues: