Ensure fast response times

## Background
From the beginning, Relay was designed to respond to incoming requests as fast as possible. For example, we optimistically respond with a `200` even when we do not yet know whether a project is rate limited, and before we even deserialize the envelope.

The request handler contains one violation of this design choice: we await an asynchronous response from the `ProjectCache` to check whether rate limits should be propagated to the client. The project cache is a service with an unbounded message queue and can therefore delay the HTTP response indefinitely when it is backlogged.
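The coupling described above can be sketched with plain threads and channels. This is illustrative only: the `CheckEnvelope` name follows the issue, but the channel setup is a stand-in for Relay's actual async actor code.

```rust
use std::sync::mpsc;
use std::thread;

// Illustrative service message; only CheckEnvelope needs a reply.
enum Message {
    CheckEnvelope { reply: mpsc::Sender<bool> },
}

// Models the request handler's await: send a query to the service and
// block until it answers. With an unbounded queue, this wait lasts as
// long as the service's current backlog takes to drain.
fn check_envelope_blocking(service: &mpsc::Sender<Message>) -> bool {
    let (reply_tx, reply_rx) = mpsc::channel();
    service
        .send(Message::CheckEnvelope { reply: reply_tx })
        .expect("service stopped");
    reply_rx.recv().expect("service dropped the reply")
}

fn main() {
    // Unbounded queue into the "project cache" service.
    let (tx, rx) = mpsc::channel::<Message>();

    thread::spawn(move || {
        for msg in rx {
            let Message::CheckEnvelope { reply } = msg;
            // A backlogged service reaches this point arbitrarily late.
            let _ = reply.send(false); // "not rate limited"
        }
    });

    let rate_limited = check_envelope_blocking(&tx);
    println!("rate_limited = {rate_limited}");
}
```

The HTTP response time is now tied to the length of the service's queue, which is exactly the property the options below try to remove.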
We discussed some options to resolve this:
- Implement a circuit breaker that skips the await when certain conditions are met (e.g. when a number of `CheckEnvelope` calls time out).
- Separate project cache tasks into high-priority messages that require a response ("queries") and low-priority messages that are fire-and-forget.
- Split the project cache service into an "observable state" component and an `Addr`, similar to what we did for the envelope buffer. The observable part grants read access (not write access) via an internal read-write lock that encapsulates the project map.
We decided to implement option 3: measurements showed that option 2 would not resolve the issue (the project cache spends most of its time handling `CheckEnvelope`), and option 1 would only be a stopgap that delays work on the long-term solution.
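A rough shape of option 3 could look like the following sketch. All type and method names here are hypothetical; Relay's real `ProjectCache` and `Addr` differ. The point is the split: the service keeps writing through its own loop, while request handlers hold a cheap, cloneable read-only handle and never enqueue a query.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

type ProjectKey = String; // stand-in for Relay's real ProjectKey type

#[derive(Clone, Debug, PartialEq)]
struct ProjectState {
    rate_limited: bool,
}

/// Read-only view handed out to request handlers ("observable state").
#[derive(Clone)]
struct ProjectCacheHandle {
    projects: Arc<RwLock<HashMap<ProjectKey, ProjectState>>>,
}

impl ProjectCacheHandle {
    /// Synchronous lookup: no message queue, no awaiting.
    fn get(&self, key: &str) -> Option<ProjectState> {
        self.projects.read().unwrap().get(key).cloned()
    }
}

/// The writing side, owned by the service that processes updates.
struct ProjectCacheService {
    projects: Arc<RwLock<HashMap<ProjectKey, ProjectState>>>,
}

impl ProjectCacheService {
    fn new() -> (Self, ProjectCacheHandle) {
        let projects = Arc::new(RwLock::new(HashMap::new()));
        let handle = ProjectCacheHandle {
            projects: Arc::clone(&projects),
        };
        (Self { projects }, handle)
    }

    /// Called from the service's message loop, never from handlers.
    fn update(&self, key: ProjectKey, state: ProjectState) {
        self.projects.write().unwrap().insert(key, state);
    }
}

fn main() {
    let (service, handle) = ProjectCacheService::new();
    service.update("proj-1".into(), ProjectState { rate_limited: true });
    // The handler reads synchronously instead of awaiting a service reply.
    assert_eq!(
        handle.get("proj-1"),
        Some(ProjectState { rate_limited: true })
    );
    assert_eq!(handle.get("unknown"), None);
    println!("ok");
}
```

With this split, a backlogged service delays project *updates*, but no longer delays HTTP *responses*.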
## Implementation notes
- Change the internal map of `ProjectCache` into something that allows concurrent access, e.g. `RwLock` or `DashMap`. Possibly use two layers of locks: one for the index/map, and one for the project itself, which is continuously updated (rate limits, config, etc.).
- Do not `get_or_create_project` on every HTTP request. Instead, send a separate `Prefetch` message to the project cache to make sure the project is updated eventually.
- Make sure that all message handlers in the project cache use read-only access, to reduce contention on the read-write lock.
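The first two notes could combine into a two-layer layout along these lines. Everything here is a hypothetical sketch: the outer lock guards the index, a per-project inner lock guards the continuously updated state, and the `Prefetch` message is reduced to a plain function for illustration.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Inner layer: per-project state that changes continuously
// (rate limits, config, ...).
#[derive(Default, Debug)]
struct Project {
    rate_limited: bool,
}

// Outer layer: the index from project key to project.
type ProjectMap = RwLock<HashMap<String, Arc<RwLock<Project>>>>;

/// Fire-and-forget counterpart of a `Prefetch` message: ensure the
/// entry exists; a background task would then fill in the real config.
fn prefetch(map: &ProjectMap, key: &str) {
    map.write().unwrap().entry(key.to_owned()).or_default();
}

/// Updating one project takes only a read lock on the index, so other
/// handlers can keep reading concurrently.
fn set_rate_limited(map: &ProjectMap, key: &str, limited: bool) {
    if let Some(project) = map.read().unwrap().get(key).cloned() {
        project.write().unwrap().rate_limited = limited;
    }
}

fn main() {
    let map = ProjectMap::default();
    prefetch(&map, "proj-1");
    set_rate_limited(&map, "proj-1", true);
    let limited = map.read().unwrap()["proj-1"].read().unwrap().rate_limited;
    println!("proj-1 rate_limited = {limited}");
}
```

The outer write lock is only taken when a project is inserted or evicted; steady-state updates contend only on the single project they touch, which keeps the handlers' read path cheap.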