feat: introduce `RPM` rate limiting for provider/model pairs

Open spoons-and-mirrors opened this issue 4 months ago • 0 comments

Summary

Rate limit handling is currently missing. This PR introduces rate limiting per provider/model pair through the config file by adding an rpm (requests per minute) field to limit.

The implementation is simple: a request that would exceed the limit is put to sleep until it is allowed again. This keeps it in the "message flow", so you don't have to wait and re-prompt the model after you've hit limits.

"provider": {
  "google": {
    "models": {
      "gemini-2.5-pro": {
        "limit": {
          "rpm": 10
        }
      }
    }
  }
}

The status bar has also been updated to show the ETA of the next request while rate limited.
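
For readers who want a feel for the sleep-instead-of-fail approach, here is a minimal sketch of a sliding-window limiter. It is not the code from this PR: the class name `RpmLimiter`, the methods `reserve` and `acquire`, and the injectable clock are all hypothetical, chosen only for illustration. The delay returned by `reserve` is the kind of value a status bar could display as an ETA.

```typescript
// Hypothetical sketch, not the PR's implementation. All names are illustrative.
const WINDOW_MS = 60_000;

class RpmLimiter {
  // Start times (ms) of reserved requests, kept in non-decreasing order.
  private slots: number[] = [];

  constructor(
    private readonly rpm: number,
    // Injectable clock so the logic can be exercised without real waiting.
    private readonly now: () => number = () => Date.now(),
  ) {
    if (!Number.isInteger(rpm) || rpm < 1) {
      throw new Error("rpm must be a positive integer");
    }
  }

  // Reserve the next available slot and return how long to wait for it (ms).
  // Reserving before sleeping keeps concurrent callers from sharing a slot.
  reserve(): number {
    const now = this.now();
    // Forget requests that started a full window ago or earlier.
    this.slots = this.slots.filter((t) => t > now - WINDOW_MS);

    let start = now;
    if (this.slots.length >= this.rpm) {
      // The new request may start one window after the rpm-th most recent one.
      const blocking = this.slots[this.slots.length - this.rpm]!;
      start = Math.max(now, blocking + WINDOW_MS);
    }
    this.slots.push(start);
    return start - now;
  }

  // Sleep until the reserved slot is due, then let the caller send the request.
  async acquire(): Promise<number> {
    const waitMs = this.reserve();
    if (waitMs > 0) {
      await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
    }
    return waitMs;
  }
}
```

Under these assumptions, a model configured with rpm: 10 would let ten requests through immediately, and an eleventh request made within the same minute would sleep until 60 seconds after the first one started, rather than failing.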

Notes

I'm unsure whether rpm should be nested under a rate object. @thdxr?

Aug 19 '25 20:08 spoons-and-mirrors