Rate Limit Exceeded
Hello! Thanks for creating a cool port of a cool lib.
I'm trying to get it running and am hitting rate limits, see below. This seems to happen no matter what prompt I run.
➜ smol-dev-js-test smol-dev-js prompt
--------------------
🐣 [ai]: hi its me, the ai dev ! you said you wanted
here to help you with your project, which is a ....
--------------------
An ecommerce admin dashboard. It will contain CRUD screens and API endpoints for a Widget model containing a bunch of fields that might describe a widget. The landing page will have some stats and graphs related to widgets. The application will be built as a Next.js app in TypeScript using the Next.js app router. It will also use Prettier, ESLint, TailwindCSS and ShadCN for UI. It will use Postgres as a database, and Prisma ORM to communicate with it. Build the charts using the Chart.js library.
--------------------
🐣 [ai]: What would you like me to do? (PS: this is not a chat system, there is no chat memory prior to this point)
✔ [you]: … Suggest something
🐣 [ai]: Unexpected end of stream, with unprocessed data {
  "error": {
    "message": "Rate limit reached for 10KTPM-200RPM in organization org-... on tokens per min. Limit: 10000 / min. Please try again in 6ms. Contact us through our help center at help.openai.com if you continue to have issues.",
    "type": "tokens",
    "param": null,
    "code": "rate_limit_exceeded"
  }
}
Unexpected event processing error Unexpected end of stream, with unprocessed data, see warning logs for more details
Unexpected event processing error, see warning logs for more details
I am also facing this issue. Any resolutions?
I found this but it did not help very much.
@PicoCreator thoughts?
You can configure the "provider rate limit" in the generated config file: https://github.com/PicoCreator/smol-dev-js/blob/ba496cb20440654a32287015645e3852615f5716/src/core/config.js#L38C7-L38C24
That should help mitigate the issue. Alternatively, switch to gpt-3.5, which has a higher rate limit.
For the most part, as OpenAI clamps down further on gpt-4 rate limits, this may not be an issue that can be fully resolved when using gpt-4.
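For illustration only, a rough sketch of what that setting might look like in the generated config file. The key name providerRateLimit and the JSON shape below are assumptions based on the linked line in config.js, not verified against the tool, so check your own generated file for the real key name and accepted values:

{
  "providerRateLimit": 1
}

The idea is to throttle how aggressively smol-dev-js calls the provider so requests stay under the 10K tokens-per-minute cap shown in the error above.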
Hey @mcavaliere @nshmadhani @PicoCreator, I'm the maintainer of LiteLLM. Our OpenAI-compatible proxy has fallbacks that could help here: if gpt-4 rate limits are reached, fall back to gpt-3.5-turbo. You can also use it to load balance across multiple Azure gpt-4 instances:
Step 1: Put your instances in a config.yaml
model_list:
  - model_name: zephyr-beta
    litellm_params:
      model: huggingface/HuggingFaceH4/zephyr-7b-beta
      api_base: http://0.0.0.0:8001
  - model_name: zephyr-beta
    litellm_params:
      model: huggingface/HuggingFaceH4/zephyr-7b-beta
      api_base: http://0.0.0.0:8002
  - model_name: zephyr-beta
    litellm_params:
      model: huggingface/HuggingFaceH4/zephyr-7b-beta
      api_base: http://0.0.0.0:8003
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
      api_key: <my-openai-key>
  - model_name: gpt-3.5-turbo-16k
    litellm_params:
      model: gpt-3.5-turbo-16k
      api_key: <my-openai-key>

litellm_settings:
  num_retries: 3 # retry call 3 times on each model_name (e.g. zephyr-beta)
  request_timeout: 10 # raise Timeout error if call takes longer than 10s
  fallbacks: [{"zephyr-beta": ["gpt-3.5-turbo"]}]
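For the gpt-4 rate limits described in this issue, the same config shape can be adapted so gpt-4 is the primary model and gpt-3.5-turbo is the fallback. A minimal sketch under that assumption, reusing the placeholder key from the example above:

model_list:
  - model_name: gpt-4
    litellm_params:
      model: gpt-4
      api_key: <my-openai-key>
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
      api_key: <my-openai-key>

litellm_settings:
  num_retries: 3
  fallbacks: [{"gpt-4": ["gpt-3.5-turbo"]}] # if gpt-4 hits its rate limit, retry the call on gpt-3.5-turbo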
Step 2: Install LiteLLM
$ pip install litellm
Step 3: Start litellm proxy w/ config.yaml
$ litellm --config /path/to/config.yaml
Docs: https://docs.litellm.ai/docs/simple_proxy
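Once the proxy is up, any OpenAI-compatible client can point at it instead of api.openai.com and pick up the retry/fallback behavior from config.yaml. A small TypeScript sketch using the official openai package; the base URL here is only illustrative (use whatever address the proxy prints on startup), and the model must match a model_name entry from your config.yaml:

import OpenAI from "openai";

// Route requests through the LiteLLM proxy; the proxy holds the real provider
// keys from config.yaml, so the client-side key can be a placeholder.
const client = new OpenAI({
  apiKey: "sk-placeholder",
  baseURL: "http://0.0.0.0:8000", // address printed by `litellm --config ...` (assumed default)
});

async function main() {
  const completion = await client.chat.completions.create({
    model: "gpt-3.5-turbo", // must match a model_name in config.yaml
    messages: [{ role: "user", content: "Suggest the next task for the widget dashboard." }],
  });
  console.log(completion.choices[0]?.message.content);
}

main().catch(console.error);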
Would this help out in your scenario?