Rate Limit Exceeded
Hello! Thanks for creating a cool port of a cool lib.
I'm trying to get it running and am hitting rate limits, see below. This seems to happen no matter what prompt I run.
➜ smol-dev-js-test smol-dev-js prompt
--------------------
🐣 [ai]: hi its me, the ai dev ! you said you wanted
here to help you with your project, which is a ....
--------------------
An ecommerce admin dashboard. It will contain CRUD screens and API endpoints for a Widget model containing a bunch of fields that might describe a widget. The landing page will have some stats and graphs related to widgets. The application will be built as a Next.js app in TypeScript using the Next.js app router. It will also use Prettier, ESLint, TailwindCSS and ShadCN for UI. It will use Postgres as a database, and Prisma ORM to communicate with it. Build the charts using the Chart.js library.
--------------------
🐣 [ai]: What would you like me to do? (PS: this is not a chat system, there is no chat memory prior to this point)
✔ [you]: … Suggest something
🐣 [ai]: Unexpected end of stream, with unprocessed data {
  "error": {
    "message": "Rate limit reached for 10KTPM-200RPM in organization org-... on tokens per min. Limit: 10000 / min. Please try again in 6ms. Contact us through our help center at help.openai.com if you continue to have issues.",
    "type": "tokens",
    "param": null,
    "code": "rate_limit_exceeded"
  }
}
Unexpected event processing error Unexpected end of stream, with unprocessed data, see warning logs for more details
Unexpected event processing error, see warning logs for more details
I am also facing this issue. Any resolutions?
I found this but it did not help very much.
@PicoCreator thoughts?
You can configure the "provider rate limit" in the generated config file: https://github.com/PicoCreator/smol-dev-js/blob/ba496cb20440654a32287015645e3852615f5716/src/core/config.js#L38C7-L38C24
That should help mitigate the issue. Alternatively, switch to gpt-3.5, which has a higher rate limit.
For the most part, as OpenAI clamps down further on gpt-4 rate limits, this may not be an issue that can be fully resolved when using gpt-4.
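For illustration only, a rough sketch of what that setting might look like in the generated config file. The key name providerRateLimit and the JSON shape below are assumptions based on the linked line in config.js, not verified against the tool, so check your own generated file for the real key name and accepted values:

{
  "providerRateLimit": 1
}

The idea is to throttle how aggressively smol-dev-js calls the provider so requests stay under the 10K tokens-per-minute cap shown in the error above.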
Hey @mcavaliere @nshmadhani @PicoCreator, I'm the maintainer of LiteLLM. Our OpenAI-compatible proxy has fallbacks that could help here: if gpt-4 rate limits are reached, fall back to gpt-3.5-turbo. You can also use it to load balance across multiple Azure gpt-4 instances:
Step 1: Put your instances in a config.yaml
model_list:
  - model_name: zephyr-beta
    litellm_params:
      model: huggingface/HuggingFaceH4/zephyr-7b-beta
      api_base: http://0.0.0.0:8001
  - model_name: zephyr-beta
    litellm_params:
      model: huggingface/HuggingFaceH4/zephyr-7b-beta
      api_base: http://0.0.0.0:8002
  - model_name: zephyr-beta
    litellm_params:
      model: huggingface/HuggingFaceH4/zephyr-7b-beta
      api_base: http://0.0.0.0:8003
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
      api_key: <my-openai-key>
  - model_name: gpt-3.5-turbo-16k
    litellm_params:
      model: gpt-3.5-turbo-16k
      api_key: <my-openai-key>

litellm_settings:
  num_retries: 3 # retry call 3 times on each model_name (e.g. zephyr-beta)
  request_timeout: 10 # raise Timeout error if call takes longer than 10s
  fallbacks: [{"zephyr-beta": ["gpt-3.5-turbo"]}]
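For the gpt-4 rate limits described in this issue, the same config shape can be adapted so gpt-4 is the primary model and gpt-3.5-turbo is the fallback. A minimal sketch under that assumption, reusing the placeholder key from the example above:

model_list:
  - model_name: gpt-4
    litellm_params:
      model: gpt-4
      api_key: <my-openai-key>
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
      api_key: <my-openai-key>

litellm_settings:
  num_retries: 3
  fallbacks: [{"gpt-4": ["gpt-3.5-turbo"]}] # if gpt-4 hits its rate limit, retry the call on gpt-3.5-turbo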
Step 2: Install LiteLLM
$ pip install litellm
Step 3: Start litellm proxy w/ config.yaml
$ litellm --config /path/to/config.yaml
Docs: https://docs.litellm.ai/docs/simple_proxy
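Once the proxy is up, any OpenAI-compatible client can point at it instead of api.openai.com and pick up the retry/fallback behavior from config.yaml. A small TypeScript sketch using the official openai package; the base URL here is only illustrative (use whatever address the proxy prints on startup), and the model must match a model_name entry from your config.yaml:

import OpenAI from "openai";

// Route requests through the LiteLLM proxy; the proxy holds the real provider
// keys from config.yaml, so the client-side key can be a placeholder.
const client = new OpenAI({
  apiKey: "sk-placeholder",
  baseURL: "http://0.0.0.0:8000", // address printed by `litellm --config ...` (assumed default)
});

async function main() {
  const completion = await client.chat.completions.create({
    model: "gpt-3.5-turbo", // must match a model_name in config.yaml
    messages: [{ role: "user", content: "Suggest the next task for the widget dashboard." }],
  });
  console.log(completion.choices[0]?.message.content);
}

main().catch(console.error);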
Would this help out in your scenario?