QueryLogger should fairly respect log throttling for all logs
https://github.com/apache/pinot/pull/15264 introduced an info log with the full sql for each request. In high QPS environments, this causes at least high log volume but may also lead to heavy degradation if your logger is not set up correctly and blocks the process.
The current state seems to be:
- QueryLogger is where Pinot logs queries before/after they are executed
- All queries are logged when they are received.
- this behavior is on by default
- if explicitly disabled, request logs respect the existing rate limiter
- all queries are logged when they complete
- by default, these are rate limited to 10k logs/second
- partial results and queries that take > 1s bypass this rate limit and are always logged
- there is some further tuning/observability
- once per second we log how many logs were dropped
- the query itself can be truncated before being logged
Other databases, like postgres, specifically do not log any queries by default. This behavior can be enabled separately.
The current state is untenable for high QPS use case:
- if
logBeforeProcessingis enabled, you will log significantly more than desired - if
logBeforeProcessingis disabled, then request logs will consume the entire log rate limiter, and you will not see the response logs
My suggestion is to:
- split the existing rate limiter into 2, one for requests and one for responses
- give each rate limiter 1/2 of the rate limit configured for broker logging
- remove the
logBeforeProcessingconfig, and just respect the rate limiters
I feel this is the most logical tradeoff for all parties:
- it reduces configs by removing 1
- it follows fairly intuitive behavior in steady state in that you more or less get 1:1 request:response logs
- it allows anyone that wants all the logs to set the rate limit infinitely high
- it provides a reasonable bound (10k/second) such that anyone picking up Pinot today gets all the logs, and anyone scaling their Pinot usage doesn't unexpectedly impact their clusters
I don't think any large Pinot user that was rate limiting to X logs per second will contest here. The end state is still X logs per second with 1/2 as many unique queries being logged, but with better observability about what queries were being received.