check_postgres
check_postgres copied to clipboard
query_time giving back a longrunner without any statement on a logical replication
Hi,
we expecting wrong results with the query_time check on a postgres10 with logical replication. It will report a long running query, but w/o any statement. This behaviour is happening on a DB with logical replication.
POSTGRES_QUERY_TIME WARNING: DB "postgres" (host:gaga@server) longest query: 84198s (23 hours 23 minutes 18 seconds) PID:153152 database:gaga username:gaga1 address:::1 port:46146 query: | time=0.07s query_time=84198s;600
So we changed the statement on "sub check_query_time" to exclude any query without a statement entry. With that enhancement the check is working correctly, because we will not get empty queries anymore.
sub check_query_time {
## Check the length of running queries
check_txn_idle('qtime',
msg('queries'),
msg('query-time'),
'query_start',
q{query_start IS NOT NULL AND current_query NOT LIKE '<IDLE>%'});
return;
} ## end of check_query_time
changed to:
sub check_query_time {
## Check the length of running queries
check_txn_idle('qtime',
msg('queries'),
msg('query-time'),
'query_start',
q{query_start IS NOT NULL AND current_query NOT LIKE '<IDLE>%' AND length(current_query) > 0});
return;
} ## end of check_query_time
Check result:
POSTGRES_QUERY_TIME OK: DB "postgres" (host:gaga@server) longest query: 0s | time=0.07s query_time=0s;600
Maybee you could insert that fix into the check suite, so othe people with logical-DB's could benefit from that, too.
Regards Björn
This is not limited to logical replication, a streaming client has the same problem. I fixed it using @r4tt4 suggestion.
Confirming fix for streaming replication. Has anyone submitted a pull request?
Hi guys, this should be fixed in d959449b; can you confirm and then we'll close?
Hi, the problem is not fixed by d959449. There are two problems with the fix:
1.) the postgres server which has the issue reports it version back 10 and 10 is not greater then 10 and therefore the correct SQL code is never used. One solution would be to use 9.999 as version. 2.) the version checks are run in the order of the array and discontinued on the first match. So as 10 is greater as 9.1 the 9.1 part matches first. One solution would be to switch position so that the version 10 test comes before the version 9.1 test.