QueryLogger should fairly respect log throttling for all logs
https://github.com/apache/pinot/pull/15264 introduced an info log with the full sql for each request. In high QPS environments, this causes at least high log volume but may also lead to heavy degradation if your logger is not set up correctly and blocks the process.
The current state seems to be:
- QueryLogger is where Pinot logs queries before/after they are executed
- All queries are logged when they are received.
- this behavior is on by default
- if explicitly disabled, request logs respect the existing rate limiter
- all queries are logged when they complete
- by default, these are rate limited to 10k logs/second
- partial results and queries that take > 1s bypass this rate limit and are always logged
- there is some further tuning/observability
- once per second we log how many logs were dropped
- the query itself can be truncated before being logged
Other databases, like postgres, specifically do not log any queries by default. This behavior can be enabled separately.
The current state is untenable for high QPS use case:
- if
logBeforeProcessingis enabled, you will log significantly more than desired - if
logBeforeProcessingis disabled, then request logs will consume the entire log rate limiter, and you will not see the response logs
My suggestion is to:
- split the existing rate limiter into 2, one for requests and one for responses
- give each rate limiter 1/2 of the rate limit configured for broker logging
- remove the
logBeforeProcessingconfig, and just respect the rate limiters
I feel this is the most logical tradeoff for all parties:
- it reduces configs by removing 1
- it follows fairly intuitive behavior in steady state in that you more or less get 1:1 request:response logs
- it allows anyone that wants all the logs to set the rate limit infinitely high
- it provides a reasonable bound (10k/second) such that anyone picking up Pinot today gets all the logs, and anyone scaling their Pinot usage doesn't unexpectedly impact their clusters
I don't think any large Pinot user that was rate limiting to X logs per second will contest here. The end state is still X logs per second with 1/2 as many unique queries being logged, but with better observability about what queries were being received.
cc @gortiz , let me know what you think about this proposal
I'm sorry for not getting back to you sooner. I think your concerns are valid. I firmly think the idea of QueryLogger is an antipattern by definition because it goes against the Log4j2 logging principles. Instead we should
- Classify logs correctly: Although recently we improved this by marking stable CID for queries, our classification is very poor. Basically, we only classify by CID and by logger class. Instead we should add other classes like using markers for logs that may be spammy
- Do not log queries as part of the message. Include the SQL in the logger context instead.
Better classification using Slf4J markers and context
We should never have something like:
LOGGER.info("SQL query for request {}: {}", requestId, query);
or
public void log(QueryLogParams params) {
...
// always log the query last - don't add this to the QueryLogEntry enum
queryLogBuilder.append("query=")
.append(StringUtils.substring(params._requestContext.getQuery(), 0, _maxQueryLengthToLog));
_logger.info(queryLogBuilder.toString());
}
Instead we should log as follows:
LoggerConstants.SQL_KEY.registerInMdc(params._requestContext.getQuery());
try {
LOGGER.info(QUERY_RECEIVED_MARKER, "SQL query started");
} finally {
LoggerConstants.SQL_KEY.unregisterFromMdc();
}
and
LoggerConstants.SQL_KEY.registerInMdc(params._requestContext.getQuery());
try {
// don't add the query to the message, obtain the actual SQL from QUERY_RECEIVED_MARKER if enabled
// or include in your pattern using %X{pinot.query.sql}
_logger.info(QUERY_FINISHED_MARKER, queryLogBuilder.toString());
} finally {
LoggerConstants.SQL_KEY.unregisterFromMdc();
}
}
Where QUERY_RECEIVED_MARKER and QUERY_FINISHED_MARKER are SLF4J markers with, for example, the pinot.sql.received and pinot.sql.finished values.
Decide what to log with Log4J
Then each user can decide whether these logs should be printed and whether the query should be printed with them.
For example, in log4j we could decide whether we want to print the initial SQL message with a filter like:
<MarkerFilter marker="pinot.query.received" onMatch="ALLOW" onMismatch="NEUTRAL"/>
Or disable that log with
<MarkerFilter marker="pinot.query.received" onMatch="DENY" onMismatch="NEUTRAL"/>
We can also send the log to a pattern layout that includes %X{pinot.query.sql}, which will print the query, or to one that doesn't include that conversion, in which case the query won't be printed. We could even send it to a pattern layout like:
[cid=%X{pinot.query.cid}] %m [query=%maxLen{%X{pinot.query.query}}{300}]
So we would have a log like:
[cid=123123123] SQL query started [query=select ...]
Where the CID is unique for the SQL query, and the query will be limited to 300 characters.
Then the log printed when queries finish can be redirected to a pattern like
[cid=%X{pinot.query.cid}] %m
So the log will be:
[cid=123123123] table=whatever, timeMs=whatever...
But the query won't be printed. Instead, we can pair that log with the previous one by CID. If instead we want to keep the same previous log, we can use a pattern like:
[cid=%X{pinot.query.cid}] %m, query=%maxLen{%X{pinot.query.query}}{300}
Limit the log spam
Instead of using the very opinionated QueryLogger, we can delegate to the standard BurstFilter. We can either send to that filter all the logs or all the logs related to queries. How can we do the latest? Either by sending them all the logs from QueryLoader (or the equivalent loggers if we end up removing this class)
Different Log4j2 sample configs
Most users don't want to spend their time fine-tuning log4j2 config. Instead, we can provide several log4j2 sample files and modify the bash script, adding an argument to choose between them.
Cons
The main problems I see with this approach are:
- Keeping backward compatibility will be difficult. With that I mean to print the same logs.
- Moving logic to log4j2 files means logging is more difficult to test
Advantages
- Each customer can customize their logs as much as they want. They can even change the configuration without a restart (see https://logging.apache.org/log4j/2.x/manual/configuration.html#configuration-attribute-monitorInterval)
- We can apply the same technique to spammy logs beyond the query.
- We can go further and classify queries by a query option, so users could mark queries as important (and therefore always log them) or not important (which may use more constraint burst filter)
adding one more example, https://github.com/apache/pinot/blob/master/pinot-query-planner/src/main/java/org/apache/pinot/query/QueryEnvironment.java#L477, which prints out rule timings for every query