Strange behavior while running `pgbench`: increasing connections and no tps if the run is interrupted and resumed
This has been observer on head 400e019aff3618db442395972c0634e3128ce58f while testing #456 . The operating system is Rocky Linux 9, PostgreSQL 16.6 on localhost. The application has been compiled with clang.
In short: running pgbench several times thru pgagroal works as expected unless it is keyboard interrupted, and from then ongoing the connections keep rising without producing "results".
The pgagroal_databases.conf contains:
% cat /etc/pgagroal/pgagroal_databases.conf
# DATABASE USER MAX INITIAL MIN
pgbench pgbench 20 5 5
so no more than 20 connections, even if in the end they keep increasing over the limit.
First run (OK)
This completes just as expected:
% psql -U postgres -c "select pid, usename, state, query from pg_stat_activity where datname = current_database();" pgbench && time /usr/pgsql-16/bin/pgbench -j 20 -c 20 -l -n -P 10 -T 60 -h localhost -p 54333 -U pgbench pgbench && psql -U postgres -c "select pid, usename, state, query from pg_stat_activity where datname = current_database();" pgbench && pgagroal-cli status details
pid | usename | state | query
-------+----------+--------+------------------------------------------------------------------------------
---------------
24413 | pgbench | idle | DISCARD ALL;
24414 | pgbench | idle | DISCARD ALL;
24415 | pgbench | idle | DISCARD ALL;
24416 | pgbench | idle | DISCARD ALL;
24417 | pgbench | idle | DISCARD ALL;
24419 | postgres | active | select pid, usename, state, query from pg_stat_activity where datname = curre
nt_database();
(6 rows)
pgbench (16.6)
progress: 10.0 s, 1702.3 tps, lat 11.525 ms stddev 6.345, 0 failed
progress: 20.0 s, 1729.5 tps, lat 11.564 ms stddev 2.359, 0 failed
progress: 30.0 s, 1716.5 tps, lat 11.648 ms stddev 2.235, 0 failed
progress: 40.0 s, 1714.1 tps, lat 11.665 ms stddev 2.408, 0 failed
progress: 50.0 s, 1769.7 tps, lat 11.299 ms stddev 2.290, 0 failed
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 20
number of threads: 20
maximum number of tries: 1
duration: 60 s
number of transactions actually processed: 103519
number of failed transactions: 0 (0.000%)
latency average = 11.555 ms
latency stddev = 3.336 ms
initial connection time = 183.264 ms
tps = 1730.345541 (without initial connection time)
/usr/pgsql-16/bin/pgbench -j 20 -c 20 -l -n -P 10 -T 60 -h localhost -p 54333 2.64s user 3.30s system 9% cpu 1:01.35 total
pid | usename | state | query
-------+----------+--------+------------------------------------------------------------------------------
---------------
24413 | pgbench | idle | END;
24414 | pgbench | idle | DISCARD ALL;
24415 | pgbench | idle | DISCARD ALL;
24416 | pgbench | idle | END;
24417 | pgbench | idle | END;
24466 | pgbench | idle | END;
24469 | pgbench | idle | END;
24470 | pgbench | idle | DISCARD ALL;
24465 | pgbench | idle | END;
24472 | pgbench | idle | END;
24467 | pgbench | idle | END;
24471 | pgbench | idle | END;
24473 | pgbench | idle | END;
24468 | pgbench | idle | END;
24464 | pgbench | idle | END;
24475 | pgbench | idle | END;
24476 | pgbench | idle | END;
24474 | pgbench | idle | END;
24477 | pgbench | idle | DISCARD ALL;
24478 | pgbench | idle | DISCARD ALL;
24485 | postgres | active | select pid, usename, state, query from pg_stat_activity where datname = curre
nt_database();
(21 rows)
Header:
ClientVersion: 2.0.0
Command: status-details
Compression: none
Encryption: none
Output: text
Timestamp: 20250227113449
Outcome:
Status: true
Time: 00:00:00
Request: {}
Response:
ActiveConnections: 0
Connections:
- AppName: ''
Database: pgbench
FD: 15
PID: -1
StartTime: 1740652420
Timestamp: 1740652488
Username: pgbench
- AppName: ''
Database: pgbench
FD: 16
PID: -1
StartTime: 1740652420
Timestamp: 1740652488
Username: pgbench
- AppName: ''
Database: pgbench
FD: 17
PID: -1
StartTime: 1740652420
Timestamp: 1740652488
Username: pgbench
- AppName: ''
Database: pgbench
FD: 18
PID: -1
StartTime: 1740652420
Timestamp: 1740652488
Username: pgbench
- AppName: ''
Database: pgbench
FD: 19
PID: -1
StartTime: 1740652420
Timestamp: 1740652488
Username: pgbench
- AppName: ''
Database: pgbench
FD: 33
PID: -1
StartTime: 1740652428
Timestamp: 1740652488
Username: pgbench
- AppName: ''
Database: pgbench
FD: 24
PID: -1
StartTime: 1740652428
Timestamp: 1740652488
Username: pgbench
- AppName: ''
Database: pgbench
FD: 22
PID: -1
StartTime: 1740652428
Timestamp: 1740652488
Username: pgbench
- AppName: ''
Database: pgbench
FD: 32
PID: -1
StartTime: 1740652428
Timestamp: 1740652488
Username: pgbench
- AppName: ''
Database: pgbench
FD: 23
PID: -1
StartTime: 1740652428
Timestamp: 1740652488
Username: pgbench
- AppName: ''
Database: pgbench
FD: 34
PID: -1
StartTime: 1740652428
Timestamp: 1740652488
Username: pgbench
- AppName: ''
Database: pgbench
FD: 31
PID: -1
StartTime: 1740652428
Timestamp: 1740652488
Username: pgbench
- AppName: ''
Database: pgbench
FD: 25
PID: -1
StartTime: 1740652428
Timestamp: 1740652488
Username: pgbench
- AppName: ''
Database: pgbench
FD: 28
PID: -1
StartTime: 1740652428
Timestamp: 1740652488
Username: pgbench
- AppName: ''
Database: pgbench
FD: 20
PID: -1
StartTime: 1740652428
Timestamp: 1740652488
Username: pgbench
- AppName: ''
Database: pgbench
FD: 29
PID: -1
StartTime: 1740652428
Timestamp: 1740652488
Username: pgbench
- AppName: ''
Database: pgbench
FD: 27
PID: -1
StartTime: 1740652428
Timestamp: 1740652488
Username: pgbench
- AppName: ''
Database: pgbench
FD: 21
PID: -1
StartTime: 1740652428
Timestamp: 1740652488
Username: pgbench
- AppName: ''
Database: pgbench
FD: 30
PID: -1
StartTime: 1740652428
Timestamp: 1740652488
Username: pgbench
- AppName: ''
Database: pgbench
FD: 26
PID: -1
StartTime: 1740652428
Timestamp: 1740652488
Username: pgbench
Databases:
- Database: *
Enabled: true
Limits:
- ActiveConnections: 0
Database: pgbench
InitialConnections: 5
MaxConnections: 20
MinConnections: 5
Username: pgbench
MaxConnections: 20
NumberOfServers: 1
ServerVersion: 2.0.0
Servers:
- Host: 127.0.0.1
Port: 5432
Server: rachel
State: Primary
Status: Running
TotalConnections: 20
Second run (manually inrettuped)
This is keyboard interrupted
% psql -U postgres -c "select pid, usename, state, query from pg_stat_activity where datname = current_database();" pgbench && time /usr/pgsql-16/bin/pgbench -j 20 -c 20 -l -n -P 10 -T 60 -h localhost -p 54333 -U pgbench pgbench && psql -U postgres -c "select pid, usename, state, query from pg_stat_activity where datname = current_database();" pgbench && pgagroal-cli status details
pid | usename | state | query
-------+----------+--------+------------------------------------------------------------------------------
---------------
24413 | pgbench | idle | DISCARD ALL;
24414 | pgbench | idle | DISCARD ALL;
24415 | pgbench | idle | DISCARD ALL;
24416 | pgbench | idle | DISCARD ALL;
24417 | pgbench | idle | DISCARD ALL;
24466 | pgbench | idle | DISCARD ALL;
24469 | pgbench | idle | DISCARD ALL;
24470 | pgbench | idle | DISCARD ALL;
24465 | pgbench | idle | DISCARD ALL;
24472 | pgbench | idle | DISCARD ALL;
24467 | pgbench | idle | DISCARD ALL;
24471 | pgbench | idle | DISCARD ALL;
24473 | pgbench | idle | DISCARD ALL;
24468 | pgbench | idle | DISCARD ALL;
24464 | pgbench | idle | DISCARD ALL;
24475 | pgbench | idle | DISCARD ALL;
24476 | pgbench | idle | DISCARD ALL;
24474 | pgbench | idle | DISCARD ALL;
24477 | pgbench | idle | DISCARD ALL;
24478 | pgbench | idle | DISCARD ALL;
24491 | postgres | active | select pid, usename, state, query from pg_stat_activity where datname = curre
nt_database();
(21 rows)
pgbench (16.6)
progress: 10.0 s, 1674.0 tps, lat 11.713 ms stddev 3.489, 0 failed
progress: 20.0 s, 1762.2 tps, lat 11.346 ms stddev 2.253, 0 failed
^C
/usr/pgsql-16/bin/pgbench -j 20 -c 20 -l -n -P 10 -T 60 -h localhost -p 54333 0.96s user 1.09s system 9% cpu 20.793 total
Third run (problematic)
Here the problem is that :tps reported are zero, hence probably pgbench does not honor the -T flag and runs forever, without reporting any particular error. The pgagroal logs do not showing nothing at all during this execution. It seems that the connections are recycled but are in a state "unusable".
% psql -U postgres -c "select pid, usename, state, query from pg_stat_activity where datname = current_database();" pgbench && time /usr/pgsql-16/bin/pgbench -j 20 -c 20 -l -n -P 10 -T 60 -h localhost -p 54333 -U pgbench pgbench && psql -U postgres -c "select pid, usename, state, query from pg_stat_activity where datname = current_database();" pgbench && pgagroal-cli status details
pid | usename | state | query
-------+----------+---------------------+-----------------------------------------------------------------
-----------------------------------------
24413 | pgbench | active | UPDATE pgbench_branches SET bbalance = bbalance + 683 WHERE bid
= 1;
24414 | pgbench | active | UPDATE pgbench_tellers SET tbalance = tbalance + -1564 WHERE tid
= 7;
24415 | pgbench | active | UPDATE pgbench_branches SET bbalance = bbalance + -934 WHERE bid
= 1;
24416 | pgbench | active | UPDATE pgbench_branches SET bbalance = bbalance + -3933 WHERE bi
d = 1;
24417 | pgbench | active | UPDATE pgbench_tellers SET tbalance = tbalance + -631 WHERE tid
= 1;
24466 | pgbench | idle in transaction | SELECT abalance FROM pgbench_accounts WHERE aid = 79637;
24469 | pgbench | idle in transaction | UPDATE pgbench_accounts SET abalance = abalance + 3366 WHERE aid
= 89946;
24470 | pgbench | active | UPDATE pgbench_tellers SET tbalance = tbalance + 2003 WHERE tid
= 9;
24465 | pgbench | active | UPDATE pgbench_branches SET bbalance = bbalance + -1855 WHERE bi
d = 1;
24472 | pgbench | idle | END;
24467 | pgbench | active | UPDATE pgbench_branches SET bbalance = bbalance + -3006 WHERE bi
d = 1;
24471 | pgbench | active | UPDATE pgbench_tellers SET tbalance = tbalance + 3337 WHERE tid
= 3;
24473 | pgbench | idle in transaction | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES
(3, 1, 71296, 2341, CURRENT_TIMESTAMP);
24468 | pgbench | idle in transaction | SELECT abalance FROM pgbench_accounts WHERE aid = 21339;
24464 | pgbench | idle in transaction | UPDATE pgbench_tellers SET tbalance = tbalance + 2803 WHERE tid
= 4;
24475 | pgbench | idle | END;
24476 | pgbench | idle in transaction | UPDATE pgbench_accounts SET abalance = abalance + 3615 WHERE aid
= 69843;
24474 | pgbench | idle in transaction | SELECT abalance FROM pgbench_accounts WHERE aid = 37254;
24477 | pgbench | idle | END;
24478 | pgbench | active | UPDATE pgbench_branches SET bbalance = bbalance + -1003 WHERE bi
d = 1;
24550 | postgres | active | select pid, usename, state, query from pg_stat_activity where da
tname = current_database();
(21 rows)
pgbench (16.6)
progress: 10.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 20.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 30.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 40.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 50.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 60.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 70.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 80.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 90.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 100.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 110.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 120.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
^C
/usr/pgsql-16/bin/pgbench -j 20 -c 20 -l -n -P 10 -T 60 -h localhost -p 54333 0.01s user 0.01s system 0% cpu 2:05.89 total
Fourth run (problematic)
Even after a pgagroal-cli flush the situation does not improve, i.e., the tps is still zero. Moreover, the number of connections reported on the PostgreSQL side is increasing over the max_connections configured on the datbaase limit.
~% pgagroal-cli flush
Header:
ClientVersion: 2.0.0
Command: flush
Compression: none
Encryption: none
Output: text
Timestamp: 20250227114004
Outcome:
Status: true
Time: 00:00:00
Request:
Database: *
Mode: 1
% psql -U postgres -c "select pid, usename, state, query from pg_stat_activity where datname = current_database();" pgbench && time /usr/pgsql-16/bin/pgbench -j 20 -c 20 -l -n -P 10 -T 60 -h localhost -p 54333 -U pgbench pgbench && psql -U postgres -c "select pid, usename, state, query from pg_stat_activity where datname = current_database();" pgbench && pgagroal-cli status details
pid | usename | state | query
-------+----------+---------------------+-----------------------------------------------------------------
-----------------------------------------
24413 | pgbench | active | UPDATE pgbench_branches SET bbalance = bbalance + 683 WHERE bid
= 1;
24414 | pgbench | active | UPDATE pgbench_tellers SET tbalance = tbalance + -1564 WHERE tid
= 7;
24415 | pgbench | active | UPDATE pgbench_branches SET bbalance = bbalance + -934 WHERE bid
= 1;
24416 | pgbench | active | UPDATE pgbench_branches SET bbalance = bbalance + -3933 WHERE bi
d = 1;
24417 | pgbench | active | UPDATE pgbench_tellers SET tbalance = tbalance + -631 WHERE tid
= 1;
24466 | pgbench | idle in transaction | SELECT abalance FROM pgbench_accounts WHERE aid = 79637;
24469 | pgbench | idle in transaction | UPDATE pgbench_accounts SET abalance = abalance + 3366 WHERE aid
= 89946;
24470 | pgbench | active | UPDATE pgbench_tellers SET tbalance = tbalance + 2003 WHERE tid
= 9;
24465 | pgbench | active | UPDATE pgbench_branches SET bbalance = bbalance + -1855 WHERE bi
d = 1;
24472 | pgbench | idle | END;
24467 | pgbench | active | UPDATE pgbench_branches SET bbalance = bbalance + -3006 WHERE bi
d = 1;
24471 | pgbench | active | UPDATE pgbench_tellers SET tbalance = tbalance + 3337 WHERE tid
= 3;
24473 | pgbench | idle in transaction | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES
(3, 1, 71296, 2341, CURRENT_TIMESTAMP);
24468 | pgbench | idle in transaction | SELECT abalance FROM pgbench_accounts WHERE aid = 21339;
24464 | pgbench | idle in transaction | UPDATE pgbench_tellers SET tbalance = tbalance + 2803 WHERE tid
= 4;
24475 | pgbench | idle | END;
24476 | pgbench | idle in transaction | UPDATE pgbench_accounts SET abalance = abalance + 3615 WHERE aid
= 69843;
24474 | pgbench | idle in transaction | SELECT abalance FROM pgbench_accounts WHERE aid = 37254;
24477 | pgbench | idle | END;
24478 | pgbench | active | UPDATE pgbench_branches SET bbalance = bbalance + -1003 WHERE bi
d = 1;
24554 | pgbench | active | UPDATE pgbench_tellers SET tbalance = tbalance + -1519 WHERE tid
= 1;
24594 | pgbench | active | UPDATE pgbench_tellers SET tbalance = tbalance + 1387 WHERE tid
= 8;
24596 | pgbench | active | UPDATE pgbench_tellers SET tbalance = tbalance + -1972 WHERE tid
= 1;
24597 | pgbench | active | UPDATE pgbench_branches SET bbalance = bbalance + -2590 WHERE bi
d = 1;
24598 | pgbench | active | UPDATE pgbench_tellers SET tbalance = tbalance + 838 WHERE tid =
6;
24599 | pgbench | active | UPDATE pgbench_tellers SET tbalance = tbalance + 1866 WHERE tid
= 8;
24595 | pgbench | active | UPDATE pgbench_branches SET bbalance = bbalance + -3107 WHERE bi
d = 1;
24600 | pgbench | active | UPDATE pgbench_tellers SET tbalance = tbalance + -4513 WHERE tid
= 8;
24602 | pgbench | active | UPDATE pgbench_tellers SET tbalance = tbalance + -96 WHERE tid =
2;
24601 | pgbench | active | UPDATE pgbench_tellers SET tbalance = tbalance + 1911 WHERE tid
= 3;
24604 | pgbench | active | UPDATE pgbench_tellers SET tbalance = tbalance + -3768 WHERE tid
= 6;
24603 | pgbench | active | UPDATE pgbench_tellers SET tbalance = tbalance + 217 WHERE tid =
1;
24607 | pgbench | active | UPDATE pgbench_tellers SET tbalance = tbalance + 506 WHERE tid =
1;
24605 | pgbench | active | UPDATE pgbench_tellers SET tbalance = tbalance + 3905 WHERE tid
= 4;
24608 | pgbench | active | UPDATE pgbench_tellers SET tbalance = tbalance + -1868 WHERE tid
= 9;
24606 | pgbench | active | UPDATE pgbench_tellers SET tbalance = tbalance + 4641 WHERE tid
= 5;
24609 | pgbench | active | UPDATE pgbench_tellers SET tbalance = tbalance + 4295 WHERE tid
= 6;
24610 | pgbench | active | UPDATE pgbench_tellers SET tbalance = tbalance + -1147 WHERE tid
= 7;
24611 | pgbench | active | UPDATE pgbench_tellers SET tbalance = tbalance + 3278 WHERE tid
= 7;
24612 | pgbench | active | UPDATE pgbench_tellers SET tbalance = tbalance + 2996 WHERE tid
= 9;
24632 | postgres | active | select pid, usename, state, query from pg_stat_activity where da
tname = current_database();
24628 | | active | autovacuum: VACUUM public.pgbench_branches
(42 rows)
pgbench (16.6)
progress: 10.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 20.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 30.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 40.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 50.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 60.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 70.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 80.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 90.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
^C
/usr/pgsql-16/bin/pgbench -j 20 -c 20 -l -n -P 10 -T 60 -h localhost -p 54333 0.01s user 0.01s system 0% cpu 1:33.36 total
This happens because pgagroal expects the message forever after an interruption of a transaction, no? If so, I think this should be solved with timers in each worker.
@decarv What do you mean ? We can't do a general timer - it could be a long running transaction. We need to detect it somehow and then verify the socket connection
I meant a timer for when a client does not respond. But I get what you're saying. I noticed that this fails for some connections that enter the pgagroal_connection_get_pid. How is this function supposed to work?
@decarv We can def not have a "pgagroal -> client" timer in the performance pipeline... Is it sort of disconnect_client you are thinking of ? See https://github.com/agroal/pgagroal/blob/master/doc/CONFIGURATION.md "Danger zone"
pgagroal_connection_get_pid is for transferring client socket descriptors in when running in transaction mode
@decarv We can def not have a "pgagroal -> client" timer in the performance pipeline... Is it sort of
disconnect_clientyou are thinking of ? See https://github.com/agroal/pgagroal/blob/master/doc/CONFIGURATION.md "Danger zone"
Yes, exactly.
pgagroal_connection_get_pidis for transferring client socket descriptors in when running in transaction mode
Then the perf pipeline code is reaching the wrong path. Reproduced with upstream:
# terminal 1 : run pgagroal
~/pgagroal-main/build/src (master) $ ./pgagroal
2025-03-10 12:18:32 INFO main.c:1141 pgagroal: 2.0.0 started on localhost:2345
2025-03-10 12:18:41 WARN pipeline_perf.c:180 [C] Client error (slot 6 database postgres user postgres): Connection reset by peer (socket 11 status 2)
2025-03-10 12:18:41 INFO message.c:1196 Message is NULL
2025-03-10 12:18:41 WARN pipeline_perf.c:180 [C] Client error (slot 1 database postgres user postgres): Connection reset by peer (socket 11 status 2)
2025-03-10 12:18:41 INFO message.c:1196 Message is NULL
2025-03-10 12:18:41 WARN connection.c:98 pgagroal_management_transfer_connection: connect: -1
2025-03-10 12:18:41 WARN connection.c:98 pgagroal_management_transfer_connection: connect: -1
2025-03-10 12:18:41 WARN connection.c:98 pgagroal_management_transfer_connection: connect: -1
2025-03-10 12:18:41 WARN connection.c:98 pgagroal_management_transfer_connection: connect: -1
2025-03-10 12:18:41 WARN connection.c:98 pgagroal_management_transfer_connection: connect: -1
2025-03-10 12:18:41 WARN pipeline_perf.c:180 [C] Client error (slot 5 database postgres user postgres): Connection reset by peer (socket 11 status 2)
2025-03-10 12:18:41 INFO message.c:1196 Message is NULL
2025-03-10 12:18:41 WARN connection.c:98 pgagroal_management_transfer_connection: connect: -1
2025-03-10 12:18:41 WARN pipeline_perf.c:180 [C] Client error (slot 3 database postgres user postgres): Connection reset by peer (socket 11 status 2)
2025-03-10 12:18:41 INFO message.c:1196 Message is NULL
# terminal 2 : run and cancel pgbench
~/pgagroal-main/build (master) $ pgbench -h localhost -c 8 -T 10 -p 2345 -U postgres postgres
pgbench (17.2, server 16.6)
starting vacuum...end.
^C
I could fix it, but maybe you can point me to the right direction here. What should be happening?
Just some initial thoughts... it is a separate issue...
We can take the log_message out in these paths, and we need new functionality to deal with a -1 connection transfer.
For now, I would say that you should deal with it in the best way you can - I'll think some more
I do confirm that using another pipeline other than perf works. Also, there are two warning log lines that are the same, I would improve them.
Hi,
Reporting my findings.
First, pgagroal_write_discard_all the reply->kind that I'm reading has value 'e', not 'E'. This passes 0 to the caller and the kill_connection is never reached in pgagroal_return_connection. I patched this with || msg->kind == 'e', but I have not investigated why this can happen.
Second, when failure happens, the last message is sent by pgagroal to the server, meaning the server is never responding, reason why UPDATE on the server is always shown on pg_stat_activity.
Ideas?
'e' ? There is no such message - https://www.postgresql.org/docs/devel/protocol-message-formats.html
'e'? There is no such message - https://www.postgresql.org/docs/devel/protocol-message-formats.html
I will investigate this better.
Second, when failure happens, the last message is sent by pgagroal to the server, meaning the server is never responding, reason why UPDATE on the server is always shown on pg_stat_activity.
A potential solution for this is to call pgagroal_write_terminate inside pgagroal_kill_connection. Is this legal according to the protocol?
You can use pgagroal_log_message() to investigate.
Well, it depends on the socket, otherwise you can write X if the previous message is done...
You can cancel pgbench run before the client sends a Q to the server or after.
If you cancel it before, the recovery is fine. If it is after, connections stay idle in transaction in the database with no new messages from pgagroal. In the latter case, we can send X so the database tears the connection.
I believe this can be closed. @fluca1978, can you confirm?
I believe this can be closed. @fluca1978, can you confirm?
I still have the issue with both uring and epoll on head 227f0d12f001c65a3784b1e021ea92ad1ce01915. It is not clear to me if you are proposing to close as a problem specific with pgbench (so no 'X message?).
No, cancelling the pgbench run should leave no connections hanging as before. Here works fine…
On Mon, 19 May 2025 at 12:39 Luca Ferrari @.***> wrote:
fluca1978 left a comment (agroal/pgagroal#503) https://github.com/agroal/pgagroal/issues/503#issuecomment-2890528797
I believe this can be closed. @fluca1978 https://github.com/fluca1978, can you confirm?
I still have the issue with both uring and epoll on head 227f0d1 https://github.com/agroal/pgagroal/commit/227f0d12f001c65a3784b1e021ea92ad1ce01915. It is not clear to me if you are proposing to close as a problem specific with pgbench (so no 'X message?).
— Reply to this email directly, view it on GitHub https://github.com/agroal/pgagroal/issues/503#issuecomment-2890528797, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALHAW2MUZDPMJYWSXDHTO7L27GYFZAVCNFSM6AAAAABX7PRLG6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDQOJQGUZDQNZZG4 . You are receiving this because you were mentioned.Message ID: @.***>
@decarv I wrote a very bovine test script and it fails both with epoll and uring here.
I see messages like WARN connection.c:98 pgagroal_management_transfer_connection: get_pid connect = -1, pid = 2274, f = [.s.pgagroal.2274] in both scenarios.
Can you try to adjust this and run it?
#!/bin/sh
pgagroal &
sleep 5
i=0
max_connections=$( pgagroal-cli conf get max_connections )
clients=$(( max_connections / 2 ))
port=$( pgagroal-cli conf get port )
echo "MAX_CONNECTIONS = $max_connections CLIENTS = $clients"
while [ $i -lt 5 ]; do
/usr/pgsql-16/bin/pgbench -c $clients -l -n -T 60 -h localhost -p $port -U pgbench pgbench &
sleep 12
killall -9 pgbench
connections_so_far=$( psql -At -U postgres -c 'select count(*) from pg_stat_activity where datname = current_database();' pgbench )
echo "CONNECTIONS ARE $connections_so_far"
if [ $connections_so_far -ge $max_connections ]; then
pgagroal-cli shutdown
exit
fi
done
I'm guessing it's because it's a different signal you're sending. -9 won't allow pgbench to signal that it will terminate so there is no way for pgagroal to know that if finished other than waiting for timeout.
That WARN is actually a different issue. That is a path that was fixed in the new io patch. So that would be the correct behavior. Something is not working properly. I would have to revisit the messages exchanged in the main PR or this issue to remember.
I'm guessing it's because it's a different signal you're sending. -9 won't allow pgbench to signal that it will terminate so there is no way for pgagroal to know that if finished other than waiting for timeout.
Yeg, but using killall -INT has the effect of filling the pgagroal pool of connections, so you are right that PostgreSQL will receive the right signal, but something goes wrong within the pool.