Simplify non-blocking request handling
In #8979, we introduced a new `async` flag that can be set by clients to request asynchronous handling of their events. We should improve how we handle asynchronous requests:
- Separate the BatchProcessor chain into two, communicating via a channel:
  - Fast chain (or frontend?): May briefly block for authorization and rate-limiting purposes. If the channel is full, fail fast for async requests; otherwise block until the send completes (the channel has capacity) or `ctx.Done()`.
  - Slow chain (or backend?): Performs slower operations which may block or significantly slow down callers.
- Channel size: we should explore what kind of size makes sense in light of the sized semaphore that exists for the intake handler.
- Modify the `context.Context` to indicate non-blocking handling, so the batch processor chain can distinguish between blocking and non-blocking requests.
- The stream processor no longer needs to differentiate between async/sync requests and can just call `processor.ProcessBatch` inline; the handling of the batch becomes an implementation detail.
One of the motivations for doing this is to ensure that authorization and rate-limiting errors are returned to the client. When we fix that, we should make sure to update https://github.com/elastic/apm-server/blob/main/docs/api-events.asciidoc#endpoint
@axw can you elaborate on this? AFAICS, authorization errors are also returned in the current async implementation.
What exactly would the user-facing changes be with this simplification? Does it make sense to label the async flag as tech preview until changes are implemented?
@simitt all of these model processors will be run in the background goroutine:
https://github.com/elastic/apm-server/blob/bd0969db37e6f6200b496cf2fa10b3d3239323ff/internal/beater/beater.go#L670-L693
That includes authorization. Authentication is still done synchronously, as that happens in HTTP middleware, prior to the processor/stream code being reached. There's also request-level rate limiting happening in middleware.
The parts that will now happen asynchronously are per-event authorization checks and per-event rate limiting. They still happen; it's just that any authorization errors (which, in the case of Lambda, are highly unlikely) will only be logged in the server and won't surface to the client. Similarly, rate limiting isn't likely to affect Lambda; it's only relevant when auth is disabled, which is usually only the case for RUM. None of the other processors are likely to fail.
> What exactly would the user-facing changes be with this simplification? Does it make sense to label the async flag as tech preview until changes are implemented?
IMO that's not necessary. If we already had support for service.name constraints on API Keys, then I might have a different opinion, because then per-event authorization checks would be more relevant.
The `async` parameter for non-blocking request handling was removed in https://github.com/elastic/apm-data/pull/197.
This can be closed.