Queue requests and return 429s when the server gets overloaded
Right now we eagerly try to fulfill all shape requests.
Too many requests can crash the server or postgres or trigger timeouts.
We should handle this more gracefully.
First, we should queue requests and limit the concurrency with which we fulfill them.
We should perhaps have one queue for already-created shapes and another for shapes we're initializing, as they have different performance characteristics (reading the cached logs is very cheap vs. creating a new shape); see the sketch below.
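A minimal sketch of what the two-lane queue could look like, assuming hypothetical names (`ShapeRequest`, `RequestQueues`) and illustrative concurrency limits, none of which are Electric's actual API:

```typescript
type Lane = "cached" | "initializing";

interface ShapeRequest {
  shapeId: string;
  isNewShape: boolean;
  run: () => Promise<void>; // fulfills the request
}

class RequestQueues {
  private queues: Record<Lane, ShapeRequest[]> = { cached: [], initializing: [] };
  private active: Record<Lane, number> = { cached: 0, initializing: 0 };

  // Cached-log reads are cheap, so allow far more of them in flight
  // than expensive new-shape initializations (limits are illustrative).
  constructor(
    private maxConcurrent: Record<Lane, number> = { cached: 100, initializing: 4 },
  ) {}

  enqueue(req: ShapeRequest): void {
    const lane: Lane = req.isNewShape ? "initializing" : "cached";
    this.queues[lane].push(req);
    this.drain(lane);
  }

  private drain(lane: Lane): void {
    while (this.active[lane] < this.maxConcurrent[lane] && this.queues[lane].length > 0) {
      const req = this.queues[lane].shift()!;
      this.active[lane]++;
      req.run().finally(() => {
        this.active[lane]--;
        this.drain(lane);
      });
    }
  }
}
```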
If the queue gets too long, we can start rejecting requests at the end of the queue with a 429 HTTP code (Too Many Requests). Clients would need to special-case 429 as a retryable error.
We'll need to figure out heuristics for when the queue is "too long": likely a combination of hard limits plus soft limits based on postgres and electric load, e.g. if postgres is slow to respond to queries, we could temporarily reduce concurrency and queue length.
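One possible shape for that heuristic, purely as a sketch: a hard cap on queue depth plus a soft cap that shrinks as postgres latency degrades. The thresholds and the latency signal here are illustrative assumptions, not decided values:

```typescript
interface LoadSignals {
  queueDepth: number;
  pgP95LatencyMs: number; // rolling p95 of recent postgres query latency
}

const HARD_MAX_QUEUE = 1000;      // hard limit: always shed beyond this
const SOFT_MAX_QUEUE = 500;       // soft limit under healthy load
const HEALTHY_PG_LATENCY_MS = 50; // baseline "postgres is fine" latency

function shouldShed({ queueDepth, pgP95LatencyMs }: LoadSignals): boolean {
  if (queueDepth >= HARD_MAX_QUEUE) return true;

  // Soft limit: the slower postgres is, the less queue we tolerate.
  const slowdown = Math.max(1, pgP95LatencyMs / HEALTHY_PG_LATENCY_MS);
  return queueDepth >= SOFT_MAX_QUEUE / slowdown;
}

// A shed request gets a 429 so clients know to retry later, e.g.:
//   if (shouldShed(signals)) return new Response(null, { status: 429 });
```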
We're going to add rate-limiting at our "proxy" endpoint and return 429s as well, setting the x-ratelimit-limit, x-ratelimit-remaining, and x-ratelimit-reset headers. Having the client retry by respecting those headers (instead of a simple exponential backoff) would be great.
For reference, the x-ratelimit-reset is a unix timestamp in ms.
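A sketch of what a header-respecting client retry could look like, falling back to exponential backoff when the headers are absent (`fetchWithRetry` and the attempt/cap values are hypothetical):

```typescript
async function fetchWithRetry(url: string, maxAttempts = 5): Promise<Response> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetch(url);
    if (res.status !== 429) return res;

    // x-ratelimit-reset is a unix timestamp in milliseconds.
    const reset = res.headers.get("x-ratelimit-reset");
    const waitMs = reset !== null
      ? Math.max(0, Number(reset) - Date.now())
      : Math.min(30_000, 1_000 * 2 ** attempt); // fallback: exponential backoff

    await new Promise((resolve) => setTimeout(resolve, waitMs));
  }
  throw new Error(`Still rate-limited after ${maxAttempts} attempts: ${url}`);
}
```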
Ah interesting yeah 🤔 are there standard rate-limiting headers? We could definitely, at worst, expose a function to configure backoff.
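One possible shape for that configurable hook (hypothetical, not an existing client option): given the attempt number and the 429 response, return how long to wait in ms, or null to stop retrying.

```typescript
interface BackoffOptions {
  backoff?: (attempt: number, response: Response) => number | null;
}

// Default: honor x-ratelimit-reset when present, else exponential backoff
// capped at 30s, giving up after 5 attempts (all values illustrative).
const defaultBackoff: Required<BackoffOptions>["backoff"] = (attempt, response) => {
  const reset = response.headers.get("x-ratelimit-reset");
  if (reset !== null) return Math.max(0, Number(reset) - Date.now());
  return attempt < 5 ? Math.min(30_000, 1_000 * 2 ** attempt) : null;
};
```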
A couple of suggestions: de-duping shapes by definition and serving queued requests in ascending offset order will do a lot.
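A sketch of both suggestions together (names like `QueuedRequest` and `dedupeAndOrder` are hypothetical): coalesce queued requests that share a shape definition so the expensive work runs once per shape, and sort within each group so lower offsets are served first.

```typescript
interface QueuedRequest {
  shapeDefinition: string; // canonical form, e.g. normalized table + where clause
  offset: number;
  resolve: (body: string) => void;
}

function dedupeAndOrder(queue: QueuedRequest[]): Map<string, QueuedRequest[]> {
  // Group by shape definition so identical shapes are fulfilled once.
  const byShape = new Map<string, QueuedRequest[]>();
  for (const req of queue) {
    const group = byShape.get(req.shapeDefinition) ?? [];
    group.push(req);
    byShape.set(req.shapeDefinition, group);
  }
  // Within each shape, serve lower offsets first so earlier log reads
  // can warm the cache for later ones.
  for (const group of byShape.values()) {
    group.sort((a, b) => a.offset - b.offset);
  }
  return byShape;
}
```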