Simplify non-blocking request handling
In #8979, we introduced a new `async` flag that can be set by clients to request asynchronous handling of their events. We should improve how we handle asynchronous requests:
- Separate the BatchProcessor chain into two, communicating via a channel:
  - Fast chain (or frontend?): May briefly block for authorization and rate-limiting purposes. If the channel is full, fail fast for async requests; otherwise block until the send completes (the channel has capacity) or `ctx.Done()`.
  - Slow chain (or backend?): Performs slower operations which may block or significantly slow down callers.
- Channel size: we should explore what kind of size makes sense in light of the sized semaphore that exists for the intake handler.
- Modify the `context.Context` to indicate non-blocking handling, so the batch processor chain can distinguish between blocking and non-blocking requests.
- The stream processor no longer needs to differentiate between async/sync requests and can just call `processor.ProcessBatch` inline; the handling of the batch becomes an implementation detail.
One of the motivations for doing this is to ensure that authorization and rate-limiting errors are returned to the client. When we fix that, we should make sure to update https://github.com/elastic/apm-server/blob/main/docs/api-events.asciidoc#endpoint
@axw can you elaborate on this? AFAICS, authorization errors are also returned in the current async implementation.
What exactly would the user-facing changes be with this simplification? Does it make sense to label the async flag as tech preview until changes are implemented?
@simitt all of these model processors will be run in the background goroutine:
https://github.com/elastic/apm-server/blob/bd0969db37e6f6200b496cf2fa10b3d3239323ff/internal/beater/beater.go#L670-L693
That includes authorization. Authentication is still done synchronously, as that happens in HTTP middleware, prior to the processor/stream code being reached. There's also request-level rate limiting happening in middleware.
The parts that will now happen asynchronously are per-event authorization checks and per-event rate limiting. They still happen; it's just that any authorization errors (which, in the case of Lambda, are highly unlikely) will only be logged in the server and won't surface to the client. Similarly, rate limiting isn't likely to affect Lambda; it's only relevant when auth is disabled, which is usually only the case for RUM. None of the other processors are likely to fail.
> What exactly would the user-facing changes be with this simplification? Does it make sense to label the async flag as tech preview until changes are implemented?
IMO that's not necessary. If we already had support for service.name constraints on API Keys, then I might have a different opinion, because then per-event authorization checks would be more relevant.
The `async` parameter for non-blocking request handling was removed in https://github.com/elastic/apm-data/pull/197.
This can be closed.