pgx hangs when trying to acquire a connection from the pool
Hi,
I am using pgxpool to manage DB connections to a PostgreSQL instance and have observed very strange behavior after running a simple query against the DB: pgx is unable to acquire a connection from the pool; it gets stuck waiting for the DB to send data (I left it running for 2 hours and no connection was ever acquired). Here is the call stack collected with pprof:
# 0x477d5b internal/poll.runtime_pollWait+0x3b /usr/local/go/src/runtime/netpoll.go:343
# 0x53d27d internal/poll.(*pollDesc).wait+0x7d /usr/local/go/src/internal/poll/fd_poll_runtime.go:84
# 0x53d310 internal/poll.(*pollDesc).waitRead+0x30 /usr/local/go/src/internal/poll/fd_poll_runtime.go:89
# 0x53edce internal/poll.(*FD).Read+0x40e /usr/local/go/src/internal/poll/fd_unix.go:164
# 0x72be72 net.(*netFD).Read+0x72 /usr/local/go/src/net/fd_posix.go:55
# 0x74f6fc net.(*conn).Read+0x9c /usr/local/go/src/net/net.go:179
# 0x7e1721 crypto/tls.(*atLeastReader).Read+0xc1 /usr/local/go/src/crypto/tls/conn.go:805
# 0x57a41b bytes.(*Buffer).ReadFrom+0x13b /usr/local/go/src/bytes/buffer.go:211
# 0x7e1a48 crypto/tls.(*Conn).readFromUntil+0x168 /usr/local/go/src/crypto/tls/conn.go:827
# 0x7df8d8 crypto/tls.(*Conn).readRecordOrCCS+0x258 /usr/local/go/src/crypto/tls/conn.go:625
# 0x7df5e4 crypto/tls.(*Conn).readRecord+0x24 /usr/local/go/src/crypto/tls/conn.go:587
# 0x7e6629 crypto/tls.(*Conn).Read+0x1e9 /usr/local/go/src/crypto/tls/conn.go:1369
# 0xfd158a github.com/jackc/pgx/v5/pgconn/internal/bgreader.(*BGReader).Read+0x22a /home/eigenlabs/go/pkg/mod/github.com/jackc/pgx/v5@v5.5.2/pgconn/internal/bgreader/bgreader.go:100
# 0x4eee67 io.ReadAtLeast+0x167 /usr/local/go/src/io/io.go:335
# 0xf978b8 github.com/jackc/pgx/v5/pgproto3.(*chunkReader).Next+0x6f8 /home/eigenlabs/go/pkg/mod/github.com/jackc/pgx/v5@v5.5.2/pgproto3/chunkreader.go:80
# 0xfa2f0a github.com/jackc/pgx/v5/pgproto3.(*Frontend).Receive+0x8a /home/eigenlabs/go/pkg/mod/github.com/jackc/pgx/v5@v5.5.2/pgproto3/frontend.go:220
# 0xfe8cf0 github.com/jackc/pgx/v5/pgconn.(*PgConn).peekMessage+0x2d0 /home/eigenlabs/go/pkg/mod/github.com/jackc/pgx/v5@v5.5.2/pgconn/pgconn.go:524
# 0xfe8f75 github.com/jackc/pgx/v5/pgconn.(*PgConn).receiveMessage+0x55 /home/eigenlabs/go/pkg/mod/github.com/jackc/pgx/v5@v5.5.2/pgconn/pgconn.go:544
# 0xff1319 github.com/jackc/pgx/v5/pgconn.(*MultiResultReader).receiveMessage+0x59 /home/eigenlabs/go/pkg/mod/github.com/jackc/pgx/v5@v5.5.2/pgconn/pgconn.go:1413
# 0xff1e06 github.com/jackc/pgx/v5/pgconn.(*MultiResultReader).Close+0x46 /home/eigenlabs/go/pkg/mod/github.com/jackc/pgx/v5@v5.5.2/pgconn/pgconn.go:1481
# 0xff4a6e github.com/jackc/pgx/v5/pgconn.(*PgConn).Ping+0x4e /home/eigenlabs/go/pkg/mod/github.com/jackc/pgx/v5@v5.5.2/pgconn/pgconn.go:1779
# 0x11b7a04 github.com/jackc/pgx/v5.(*Conn).Ping+0x44 /home/eigenlabs/go/pkg/mod/github.com/jackc/pgx/v5@v5.5.2/conn.go:421
# 0x11d6e2d github.com/jackc/pgx/v5/pgxpool.(*Pool).Acquire+0x18d /home/eigenlabs/go/pkg/mod/github.com/jackc/pgx/v5@v5.5.2/pgxpool/pool.go:511
# 0x11d7b3a github.com/jackc/pgx/v5/pgxpool.(*Pool).Query+0x9a /home/eigenlabs/go/pkg/mod/github.com/jackc/pgx/v5@v5.5.2/pgxpool/pool.go:603
# 0x1a47abd github.com/layr-labs/eigenlayer-backend/common/beacon.(*BeaconDataServiceZettablockImpl).GetEigenpodValidators+0x2fd /home/eigenlabs/repos/eigenlayer-backend/common/beacon/zettablock.go:308
# 0x1c06ed5 github.com/layr-labs/eigenlayer-backend/webserver/services.(*withdrawalProofServiceImpl).GetRestakeProofs+0x355 /home/eigenlabs/repos/eigenlayer-backend/webserver/services/withdrawal_proof_service.go:173
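For reference, the call at the bottom of the stack is an ordinary pool.Query. Below is a minimal sketch of the pattern (the SQL, table, and names are placeholders, not my actual code); as far as I can tell, nothing interrupts the blocked Read inside the Ping that Acquire runs unless the context passed to Query is cancelled or carries a deadline:

package beacon

import (
	"context"
	"fmt"
	"time"

	"github.com/jackc/pgx/v5/pgxpool"
)

// getEigenpodValidators is a simplified stand-in for the method in the stack
// trace; the SQL and column names are placeholders. The bounded context is
// the relevant part: with a deadline, a dead connection should turn the hang
// in Acquire/Ping into a timeout error instead of blocking forever.
func getEigenpodValidators(pool *pgxpool.Pool, pod string) ([]string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	rows, err := pool.Query(ctx, `SELECT pubkey FROM validators WHERE pod = $1`, pod)
	if err != nil {
		return nil, fmt.Errorf("query validators: %w", err)
	}
	defer rows.Close()

	var pubkeys []string
	for rows.Next() {
		var pk string
		if err := rows.Scan(&pk); err != nil {
			return nil, err
		}
		pubkeys = append(pubkeys, pk)
	}
	return pubkeys, rows.Err()
}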
When debugging this, I assumed there was some kind of connection leak that prevented a connection from being acquired, but that does not seem to be the case, since there are idle resources in the pool (see below). I also confirmed with the DB provider that the connection is in the idle state, according to pg_stat_activity.
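The stats below were captured with a debugger; the same counters can be read programmatically via pool.Stat(). A minimal sketch of logging them (the 30s interval and the log format are arbitrary choices of mine):

package beacon

import (
	"log"
	"time"

	"github.com/jackc/pgx/v5/pgxpool"
)

// logPoolStats periodically dumps the same counters shown in the stats below.
// The interval and the log format are arbitrary.
func logPoolStats(pool *pgxpool.Pool) {
	for range time.Tick(30 * time.Second) {
		s := pool.Stat()
		log.Printf("pgxpool: acquired=%d idle=%d constructing=%d total=%d max=%d acquires=%d emptyAcquires=%d canceledAcquires=%d",
			s.AcquiredConns(), s.IdleConns(), s.ConstructingConns(), s.TotalConns(),
			s.MaxConns(), s.AcquireCount(), s.EmptyAcquireCount(), s.CanceledAcquireCount())
	}
}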
Pool config and stats:
MaxConns = 20
MinConns = 10
&pgxpool.Stat{
    s: &puddle.Stat{
        constructingResources: 0,
        acquiredResources: 1,
        idleResources: 9,
        maxResources: 20,
        acquireCount: 12,
        acquireDuration: 1224273515,
        emptyAcquireCount: 1,
        canceledAcquireCount: 0,
    },
    newConnsCount: 31,
    lifetimeDestroyCount: 14,
    idleDestroyCount: 0,
}
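For completeness, the pool itself is built with pgxpool.ParseConfig and NewWithConfig. A reduced sketch with the limits above; the connection string and the idle/lifetime/health-check durations are placeholders, not my exact settings:

package beacon

import (
	"context"
	"time"

	"github.com/jackc/pgx/v5/pgxpool"
)

// newPool builds a pool with the limits reported above. The connection string
// and the idle/lifetime/health-check durations are placeholders.
func newPool(ctx context.Context, dsn string) (*pgxpool.Pool, error) {
	cfg, err := pgxpool.ParseConfig(dsn) // e.g. "postgres://user:pass@host:5432/db"
	if err != nil {
		return nil, err
	}
	cfg.MaxConns = 20
	cfg.MinConns = 10
	cfg.MaxConnIdleTime = 5 * time.Minute  // close connections idle longer than this
	cfg.MaxConnLifetime = 30 * time.Minute // hard cap on a connection's age
	cfg.HealthCheckPeriod = time.Minute    // background check that prunes broken/expired conns

	return pgxpool.NewWithConfig(ctx, cfg)
}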
General info:
pgx version 5.5.2, Go 1.21
I can reproduce this 100% of the time against the DB my company uses; I haven't tested against a local PostgreSQL instance yet. Could you help with this? Maybe it is in fact a bug. Thank you.
The DB provider pointed me to this AWS doc:
Idle client connection A connection to an Aurora PostgreSQL DB instance is in idle in transaction state and is waiting for a client to send more data or issue a command. This state can lead to an increase in Client:ClientRead events.
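Since Client:ClientRead means the server side is waiting on the client while pgx is blocked reading from the server, my working assumption is a silently dropped connection somewhere in between. Two things I'm looking at, neither confirmed as a fix: deadlines on the contexts passed to Query (as in the sketch above) and shorter TCP keepalives via a custom DialFunc. A sketch of the latter (the values are guesses):

package beacon

import (
	"net"
	"time"

	"github.com/jackc/pgx/v5/pgxpool"
)

// useShortTCPKeepalives swaps in a dialer with more aggressive keepalives so
// that a silently dropped connection is detected by the kernel rather than
// leaving a Read blocked indefinitely. The values are guesses, not a
// recommendation from the pgx docs.
func useShortTCPKeepalives(cfg *pgxpool.Config) {
	dialer := &net.Dialer{
		Timeout:   5 * time.Second,  // bound on establishing the TCP connection
		KeepAlive: 30 * time.Second, // interval for TCP keepalive probes
	}
	cfg.ConnConfig.DialFunc = dialer.DialContext
}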
Have you found a solution? We are experiencing a similar problem.