
pgx hangs when trying to acquire a connection from the pool

Open · mpjunior92 opened this issue 1 year ago • 3 comments

Hi,

I am using pgxpool to manage DB connections to a PostgreSQL instance and have observed very strange behavior after running a simple query on the DB: pgx is not able to acquire a connection from the pool; it gets stuck waiting for the DB to send data (I left it running for 2 hours and no connection was ever acquired). Here is the call stack collected using pprof:

#	0x477d5b	internal/poll.runtime_pollWait+0x3b											/usr/local/go/src/runtime/netpoll.go:343
#	0x53d27d	internal/poll.(*pollDesc).wait+0x7d											/usr/local/go/src/internal/poll/fd_poll_runtime.go:84
#	0x53d310	internal/poll.(*pollDesc).waitRead+0x30											/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
#	0x53edce	internal/poll.(*FD).Read+0x40e												/usr/local/go/src/internal/poll/fd_unix.go:164
#	0x72be72	net.(*netFD).Read+0x72													/usr/local/go/src/net/fd_posix.go:55
#	0x74f6fc	net.(*conn).Read+0x9c													/usr/local/go/src/net/net.go:179
#	0x7e1721	crypto/tls.(*atLeastReader).Read+0xc1											/usr/local/go/src/crypto/tls/conn.go:805
#	0x57a41b	bytes.(*Buffer).ReadFrom+0x13b												/usr/local/go/src/bytes/buffer.go:211
#	0x7e1a48	crypto/tls.(*Conn).readFromUntil+0x168											/usr/local/go/src/crypto/tls/conn.go:827
#	0x7df8d8	crypto/tls.(*Conn).readRecordOrCCS+0x258										/usr/local/go/src/crypto/tls/conn.go:625
#	0x7df5e4	crypto/tls.(*Conn).readRecord+0x24											/usr/local/go/src/crypto/tls/conn.go:587
#	0x7e6629	crypto/tls.(*Conn).Read+0x1e9												/usr/local/go/src/crypto/tls/conn.go:1369
#	0xfd158a	github.com/jackc/pgx/v5/pgconn/internal/bgreader.(*BGReader).Read+0x22a							/home/eigenlabs/go/pkg/mod/github.com/jackc/pgx/v5@v5.5.2/pgconn/internal/bgreader/bgreader.go:100
#	0x4eee67	io.ReadAtLeast+0x167													/usr/local/go/src/io/io.go:335
#	0xf978b8	github.com/jackc/pgx/v5/pgproto3.(*chunkReader).Next+0x6f8								/home/eigenlabs/go/pkg/mod/github.com/jackc/pgx/v5@v5.5.2/pgproto3/chunkreader.go:80
#	0xfa2f0a	github.com/jackc/pgx/v5/pgproto3.(*Frontend).Receive+0x8a								/home/eigenlabs/go/pkg/mod/github.com/jackc/pgx/v5@v5.5.2/pgproto3/frontend.go:220
#	0xfe8cf0	github.com/jackc/pgx/v5/pgconn.(*PgConn).peekMessage+0x2d0								/home/eigenlabs/go/pkg/mod/github.com/jackc/pgx/v5@v5.5.2/pgconn/pgconn.go:524
#	0xfe8f75	github.com/jackc/pgx/v5/pgconn.(*PgConn).receiveMessage+0x55								/home/eigenlabs/go/pkg/mod/github.com/jackc/pgx/v5@v5.5.2/pgconn/pgconn.go:544
#	0xff1319	github.com/jackc/pgx/v5/pgconn.(*MultiResultReader).receiveMessage+0x59							/home/eigenlabs/go/pkg/mod/github.com/jackc/pgx/v5@v5.5.2/pgconn/pgconn.go:1413
#	0xff1e06	github.com/jackc/pgx/v5/pgconn.(*MultiResultReader).Close+0x46								/home/eigenlabs/go/pkg/mod/github.com/jackc/pgx/v5@v5.5.2/pgconn/pgconn.go:1481
#	0xff4a6e	github.com/jackc/pgx/v5/pgconn.(*PgConn).Ping+0x4e									/home/eigenlabs/go/pkg/mod/github.com/jackc/pgx/v5@v5.5.2/pgconn/pgconn.go:1779
#	0x11b7a04	github.com/jackc/pgx/v5.(*Conn).Ping+0x44										/home/eigenlabs/go/pkg/mod/github.com/jackc/pgx/v5@v5.5.2/conn.go:421
#	0x11d6e2d	github.com/jackc/pgx/v5/pgxpool.(*Pool).Acquire+0x18d									/home/eigenlabs/go/pkg/mod/github.com/jackc/pgx/v5@v5.5.2/pgxpool/pool.go:511
#	0x11d7b3a	github.com/jackc/pgx/v5/pgxpool.(*Pool).Query+0x9a									/home/eigenlabs/go/pkg/mod/github.com/jackc/pgx/v5@v5.5.2/pgxpool/pool.go:603
#	0x1a47abd	github.com/layr-labs/eigenlayer-backend/common/beacon.(*BeaconDataServiceZettablockImpl).GetEigenpodValidators+0x2fd	/home/eigenlabs/repos/eigenlayer-backend/common/beacon/zettablock.go:308
#	0x1c06ed5	github.com/layr-labs/eigenlayer-backend/webserver/services.(*withdrawalProofServiceImpl).GetRestakeProofs+0x355		/home/eigenlabs/repos/eigenlayer-backend/webserver/services/withdrawal_proof_service.go:173
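
The frames above show Pool.Query acquiring a connection, the pool's health-check Ping, and a read on the TLS connection that never returns. For reference, a minimal sketch (placeholder DSN and query, not the reporter's actual code) of bounding that whole path with a context timeout, so a dead connection surfaces as an error instead of blocking indefinitely:

package main

import (
	"context"
	"log"
	"time"

	"github.com/jackc/pgx/v5/pgxpool"
)

func main() {
	// Placeholder connection string.
	pool, err := pgxpool.New(context.Background(), "postgres://user:pass@host:5432/db")
	if err != nil {
		log.Fatal(err)
	}
	defer pool.Close()

	// Bound the acquire, the health-check ping, and the query together.
	// If the underlying socket is silently dead, the blocked read is
	// interrupted and Query returns a context deadline error.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	rows, err := pool.Query(ctx, "SELECT 1") // placeholder query
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()
}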

When debugging this, I assumed there was some kind of leak that prevented a connection from being acquired, but that does not seem to be the case, since there are idle resources in the pool (see below). I also confirmed with the DB provider that the connection is in the idle state on the server side (information obtained from pg_stat_activity).

Pool config and stats:

MaxConns = 20
MinConns = 10

&pgxpool.Stat{
  s: &puddle.Stat{
    constructingResources: 0,
    acquiredResources:     1,
    idleResources:         9,
    maxResources:          20,
    acquireCount:          12,
    acquireDuration:       1224273515,
    emptyAcquireCount:     1,
    canceledAcquireCount:  0,
  },
  newConnsCount:        31,
  lifetimeDestroyCount: 14,
  idleDestroyCount:     0,
}
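
For context, a hedged sketch of how a pool with those limits is typically built in pgxpool v5; the DSN and the health-check period are illustrative assumptions, and only MaxConns and MinConns mirror the values quoted above:

package main

import (
	"context"
	"log"
	"time"

	"github.com/jackc/pgx/v5/pgxpool"
)

func main() {
	cfg, err := pgxpool.ParseConfig("postgres://user:pass@host:5432/db") // placeholder DSN
	if err != nil {
		log.Fatal(err)
	}
	cfg.MaxConns = 20 // matches the config quoted above
	cfg.MinConns = 10 // matches the config quoted above
	// Illustrative: background health check that prunes broken idle connections.
	cfg.HealthCheckPeriod = time.Minute

	pool, err := pgxpool.NewWithConfig(context.Background(), cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer pool.Close()
}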

General info:

  • pgx version 5.5.2
  • Go 1.21

I can reproduce this 100% of the time against the DB my company uses; I haven't tested against a local PostgreSQL yet. Could you help with this? Maybe it is in fact a bug. Thank you.

mpjunior92 · Jan 30 '24 23:01

The DB provider pointed me to this AWS doc:

Idle client connection: A connection to an Aurora PostgreSQL DB instance is in the idle in transaction state and is waiting for a client to send more data or issue a command. This state can lead to an increase in Client:ClientRead events.
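
If the root cause is a session left idle in transaction, or a connection silently dropped between client and server, one possible mitigation (an assumption based on the AWS note above, not a confirmed fix for this issue) is a server-side idle-in-transaction timeout plus a shorter TCP keepalive on the dialer:

package main

import (
	"context"
	"log"
	"net"
	"time"

	"github.com/jackc/pgx/v5/pgxpool"
)

func main() {
	cfg, err := pgxpool.ParseConfig("postgres://user:pass@host:5432/db") // placeholder DSN
	if err != nil {
		log.Fatal(err)
	}
	// Server side: terminate sessions that sit in "idle in transaction" for more than 60s.
	cfg.ConnConfig.RuntimeParams["idle_in_transaction_session_timeout"] = "60000"
	// Client side: shorter OS-level keepalive so a silently dropped connection
	// surfaces as a read error rather than an indefinite block.
	cfg.ConnConfig.DialFunc = (&net.Dialer{KeepAlive: 30 * time.Second}).DialContext

	pool, err := pgxpool.NewWithConfig(context.Background(), cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer pool.Close()
}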

mpjunior92 · Jan 31 '24 11:01

Have you found a solution? We are experiencing a similar problem.

davidspiess · May 07 '24 06:05

Have you found a solution? We are experiencing a similar problem.

xakepp35 · Jul 10 '24 10:07