cosmo
cosmo copied to clipboard
feat(router): enable fail open on rate limit redis availability failure
Motivation and Context
Per https://github.com/wundergraph/cosmo/issues/1555
On testing a Cosmo Rate Limit implementation I discovered every request would get a 500 response if Redis is scaled to zero, or if authentication is malformed.
This small change introduces a fail open configuration bool on rate_limit that allows requests to succeed, albeit delayed by the timeout on failing to hit Redis per request.
With fail open enabled the server is also able to start up without Redis availability, when Redis comes back up it will begin rate limiting again.
The per request penalty can be ameliorated by setting tighter timeouts on your Redis connection, for example:
redis://:[email protected]:6379?read_timeout=20ms&write_timeout=20ms&max_retries=-1&dial_timeout=100ms
Potential improvements
This implementation is naive in that you will pay a per request penalty whenever Redis isn't available, this wouldn't be ideal in a situation where Redis isn't available for > 1 request at a time.
It would be better to implement an internal cache method where Redis availability could be polled to at some interval async to the request pipeline. Then the request threads could just read the cache to determine if they should proceed with rate limiting or fail open fast. The tradeoff being that you can't guarantee you tried to rate limit on each request.
Checklist
- [ ] I have discussed my proposed changes in an issue and have received approval to proceed.
- [x] I have followed the coding standards of the project.
- [ ] Tests or benchmarks have been added or updated.
- [ ] Documentation has been updated on https://github.com/wundergraph/cosmo-docs.
- [x] I have read the Contributors Guide.
Testing this additional feature is challenging and would likely need to take place in integration tests, glad to make a stab at that if y'all are interested in moving forward with this!