pg_wait_sampling
Keep profile and history data in shared memory
To simplify interaction between the collector process and client backends requesting profile or history data, this patch adds shared data structures to store statistics: a fixed-size shared hash table for the profile and a fixed-size shared array that implements a ring buffer for history data.
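To illustrate the history side, here is a minimal sketch of a fixed-size ring buffer that overwrites its oldest items. It is not the code from this patch: the names History, HistoryItem, history_put, history_count and history_get, the field layout, and the constant HISTORY_SIZE are all hypothetical, and the sketch uses ordinary process memory with no locking, whereas the real structure lives in shared memory and needs synchronization between the collector and the backends.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity; in the extension the size would come from a GUC. */
#define HISTORY_SIZE 4

/* Hypothetical layout of one sampled wait event. */
typedef struct HistoryItem
{
    int      pid;              /* backend that was observed waiting */
    uint32_t wait_event_info;  /* encoded wait event */
    uint64_t queryid;          /* query being executed, 0 if unknown */
} HistoryItem;

typedef struct History
{
    uint64_t    next;          /* total number of items ever written */
    HistoryItem items[HISTORY_SIZE];
} History;

/* Append one sample, overwriting the oldest one once the buffer is full. */
static void
history_put(History *h, HistoryItem item)
{
    h->items[h->next % HISTORY_SIZE] = item;
    h->next++;
}

/* Number of valid items currently stored. */
static size_t
history_count(const History *h)
{
    return h->next < HISTORY_SIZE ? (size_t) h->next : (size_t) HISTORY_SIZE;
}

/* Return the i-th stored item, where 0 is the oldest one still present. */
static const HistoryItem *
history_get(const History *h, size_t i)
{
    size_t   count = history_count(h);
    uint64_t first;

    assert(i < count);
    first = h->next - count;   /* sequence number of the oldest item */
    return &h->items[(first + i) % HISTORY_SIZE];
}
```

Keeping a monotonically increasing write counter rather than a wrapped index makes it easy for a reader to tell how many items are valid and where the oldest one sits.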
The shared hash table for the profile has a fixed size specified by the pg_wait_sampling.max_profile_entries GUC. When the hash table overflows, the least-used entries are evicted. The eviction algorithm is the same as the one used in the pg_stat_kcache extension: it is based on a usage metric stored in the hash table entries.
The shared structures for profile and history are kept solely in memory and are not persisted to disk, so all statistics are reset after a server restart. This is acceptable because wait monitoring only needs to track differences between counters in the profile statistics.
This patch also makes all timing period GUCs reloadable via SIGHUP. The other GUCs affect the allocation of shared resources, so changing them requires a server restart.
History collection does not look useful for regular monitoring of wait events, so this patch disables it by default by setting the pg_wait_sampling.history_period GUC to zero.
@rjuju could you review this PR and give some feedback?
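The idea of usage-based eviction from a fixed-capacity table can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm from this patch or from pg_stat_kcache: a small linear array stands in for the shared hash table, a single lowest-usage entry is evicted at a time, the usage metric is simply incremented per sample, and there is no locking. All names (Profile, ProfileEntry, profile_find, profile_add_sample, MAX_PROFILE_ENTRIES) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stands in for pg_wait_sampling.max_profile_entries (hypothetical value). */
#define MAX_PROFILE_ENTRIES 3

typedef struct ProfileEntry
{
    int      used;             /* slot occupied? */
    uint32_t wait_event_info;  /* part of the key */
    uint64_t queryid;          /* part of the key */
    uint64_t count;            /* number of samples seen for this key */
    double   usage;            /* eviction metric, higher means keep longer */
} ProfileEntry;

typedef struct Profile
{
    ProfileEntry entries[MAX_PROFILE_ENTRIES];
} Profile;

/* Look up an entry by key; a linear scan replaces the real hash lookup. */
static ProfileEntry *
profile_find(Profile *p, uint32_t wait_event_info, uint64_t queryid)
{
    for (int i = 0; i < MAX_PROFILE_ENTRIES; i++)
    {
        ProfileEntry *e = &p->entries[i];

        if (e->used && e->wait_event_info == wait_event_info &&
            e->queryid == queryid)
            return e;
    }
    return NULL;
}

/* Record one sample; when the table is full, evict the least-used entry. */
static void
profile_add_sample(Profile *p, uint32_t wait_event_info, uint64_t queryid)
{
    ProfileEntry *e = profile_find(p, wait_event_info, queryid);

    if (e == NULL)
    {
        ProfileEntry *victim = &p->entries[0];

        for (int i = 0; i < MAX_PROFILE_ENTRIES; i++)
        {
            ProfileEntry *cand = &p->entries[i];

            if (!cand->used)
            {
                victim = cand;          /* a free slot, nothing to evict */
                break;
            }
            if (cand->usage < victim->usage)
                victim = cand;          /* lowest usage seen so far */
        }

        memset(victim, 0, sizeof(*victim));
        victim->used = 1;
        victim->wait_event_info = wait_event_info;
        victim->queryid = queryid;
        e = victim;
    }

    e->count++;
    e->usage += 1.0;
}
```

Because an evicted entry loses its counters, consumers of the profile should expect a key to reappear later with a count that restarts from the first new sample.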
Why do we need to maintain our own array of queryIds? Why can't we just read PgBackendStatus.st_query_id as pg_stat_activity does? It turns out st_query_id is zero during execution of a prepared statement. (exec_execute_message in postgres.c calls pgstat_report_activity(STATE_RUNNING) and it resets st_query_id.)
Here is a quick demo. In a psql session execute: select pg_sleep(30) \bind \g. (\bind uses the extended query protocol, like prepared statements do.) In another session query pg_stat_activity for the first session while it's sleeping:
wait_event_type | Timeout
wait_event | PgSleep
state | active
query_id |
query | select pg_sleep(30)
Oops, query_id is blank.
Arguably it's a bug: https://www.postgresql.org/message-id/CA%2B427g8DiW3aZ6pOpVgkPbqK97ouBdf18VLiHFesea2jUk3XoQ%40mail.gmail.com
Why do we need to maintain our own array of queryIds? Why can't we just read PgBackendStatus.st_query_id as pg_stat_activity does?
The answer is that PgBackendStatus.st_query_id tracks the queryId of the top-level statement only. The current design is discussed further in https://github.com/postgrespro/pg_wait_sampling/pull/42#issuecomment-1079635726 and related issues are covered in https://github.com/postgrespro/pg_wait_sampling/issues/43