pg_wait_sampling

Keep profile and history data in shared memory

Open maksm90 opened this issue 2 years ago • 3 comments

To simplify interaction between the collector process and client backends requesting profile or history data, this patch adds shared data structures to store statistics: a fixed-size shared hash table for the profile and a fixed-size shared array that implements a ring buffer for history data. The size of the profile hash table is set by the pg_wait_sampling.max_profile_entries GUC. When the table overflows, the least used entries are evicted. The eviction algorithm is the same as the one used in the pg_stat_kcache extension: it is based on a usage metric stored in the hash table entries. The shared structures for profile and history are kept solely in memory and are not persisted to disk, so all statistics are reset after a server restart. This is acceptable because wait monitoring only needs differential counters from the profile statistics.
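The usage-based eviction described above can be illustrated with a minimal, standalone sketch. It is not the code from this patch or from pg_stat_kcache: the names ProfileEntry, Profile, profile_find and profile_add_sample are hypothetical, a linear scan stands in for the shared hash table, MAX_PROFILE_ENTRIES stands in for pg_wait_sampling.max_profile_entries, and the sketch evicts a single entry at a time with no locking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for pg_wait_sampling.max_profile_entries. */
#define MAX_PROFILE_ENTRIES 4

/* Hypothetical profile entry: key fields plus counters. */
typedef struct ProfileEntry
{
    int      in_use;      /* is this slot occupied? */
    int      pid;         /* key: backend pid */
    uint32_t wait_event;  /* key: wait event identifier */
    uint64_t count;       /* number of samples observed */
    double   usage;       /* eviction metric, bumped on every hit */
} ProfileEntry;

/* Fixed-size table; a linear scan replaces the hash lookup for brevity. */
typedef struct Profile
{
    ProfileEntry entries[MAX_PROFILE_ENTRIES];
} Profile;

/* Return the entry for (pid, wait_event), or NULL if it is not stored. */
static ProfileEntry *
profile_find(Profile *p, int pid, uint32_t wait_event)
{
    assert(p != NULL);
    for (size_t i = 0; i < MAX_PROFILE_ENTRIES; i++)
    {
        ProfileEntry *e = &p->entries[i];

        if (e->in_use && e->pid == pid && e->wait_event == wait_event)
            return e;
    }
    return NULL;
}

/*
 * Record one sample. If the key is new and the table is full, the entry
 * with the lowest usage is evicted and its slot is reused.
 */
static ProfileEntry *
profile_add_sample(Profile *p, int pid, uint32_t wait_event)
{
    ProfileEntry *e = profile_find(p, pid, wait_event);
    ProfileEntry *victim;

    if (e != NULL)
    {
        e->count++;
        e->usage += 1.0;
        return e;
    }

    /* Prefer a free slot; otherwise pick the least used entry. */
    victim = &p->entries[0];
    for (size_t i = 0; i < MAX_PROFILE_ENTRIES; i++)
    {
        e = &p->entries[i];
        if (!e->in_use)
        {
            victim = e;
            break;
        }
        if (e->usage < victim->usage)
            victim = e;
    }

    victim->in_use = 1;
    victim->pid = pid;
    victim->wait_event = wait_event;
    victim->count = 1;
    victim->usage = 1.0;
    return victim;
}
```

Calling profile_add_sample() for a fifth distinct key on a full four-slot table replaces whichever entry has accumulated the lowest usage, so frequently sampled (pid, wait_event) pairs survive overflow.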

This patch also makes all timing period GUCs reloadable via SIGHUP. The other GUCs affect the allocation of shared resources, so changing them requires a server restart.

Keeping history does not look useful for regular monitoring of wait events, so this patch disables it by default by setting the pg_wait_sampling.history_period GUC to zero.
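For reference, the fixed-size array used as a ring buffer for history can be sketched as follows. This is an illustrative standalone example, not the code from the patch: HISTORY_SIZE, HistoryItem, History, history_push and history_read are hypothetical names, and shared-memory allocation and locking are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: hypothetical fixed capacity of the history buffer. */
#define HISTORY_SIZE 4

/* Hypothetical history record: one sampled wait event. */
typedef struct HistoryItem
{
    int      pid;         /* sampled backend */
    uint32_t wait_event;  /* wait event identifier */
    uint64_t ts;          /* sample timestamp */
} HistoryItem;

typedef struct History
{
    HistoryItem items[HISTORY_SIZE];
    uint64_t    next;     /* total items written; next slot is next % HISTORY_SIZE */
} History;

/* Append an item, overwriting the oldest one once the buffer is full. */
static void
history_push(History *h, HistoryItem item)
{
    assert(h != NULL);
    h->items[h->next % HISTORY_SIZE] = item;
    h->next++;
}

/*
 * Copy the stored items into out, oldest first. out must have room for
 * HISTORY_SIZE items. Returns the number of items copied.
 */
static size_t
history_read(const History *h, HistoryItem *out)
{
    size_t   n;
    uint64_t start;

    assert(h != NULL && out != NULL);
    n = h->next < HISTORY_SIZE ? (size_t) h->next : HISTORY_SIZE;
    start = h->next - n;
    for (size_t i = 0; i < n; i++)
        out[i] = h->items[(start + i) % HISTORY_SIZE];
    return n;
}
```

Because the write position only ever advances, a reader can reconstruct the order of items from the counter alone, and old samples are dropped implicitly when they are overwritten.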

maksm90, Feb 01 '23 07:02

@rjuju could you review this PR and give some feedback?

maksm90, Feb 01 '23 08:02

Why do we need to maintain our own array of queryIds? Why can't we just read PgBackendStatus.st_query_id as pg_stat_activity does? It turns out st_query_id is zero during execution of a prepared statement. (exec_execute_message in postgres.c calls pgstat_report_activity(STATE_RUNNING) and it resets st_query_id.)

Here is a quick demo. In a psql session execute: select pg_sleep(30) \bind \g. (\bind uses the extended query protocol, like prepared statements do.) In another session query pg_stat_activity for the first session while it's sleeping:

wait_event_type  | Timeout
wait_event       | PgSleep
state            | active
query_id         | 
query            | select pg_sleep(30) 

Oops, query_id is blank.

Arguably it's a bug: https://www.postgresql.org/message-id/CA%2B427g8DiW3aZ6pOpVgkPbqK97ouBdf18VLiHFesea2jUk3XoQ%40mail.gmail.com

shinderuk, May 21 '24 13:05

> Why do we need to maintain our own array of queryIds? Why can't we just read PgBackendStatus.st_query_id as pg_stat_activity does?

The reason is that PgBackendStatus.st_query_id tracks the queryId of the top-level statement only. The current design is discussed in more detail in https://github.com/postgrespro/pg_wait_sampling/pull/42#issuecomment-1079635726, and related issues are covered in https://github.com/postgrespro/pg_wait_sampling/issues/43

maksm90, May 26 '24 08:05