pgx icon indicating copy to clipboard operation
pgx copied to clipboard

Crash with non-set pipeline.conn inside sendBatchExtendedWithDescription() when doing SendBatch()

Open johnsiilver opened this issue 2 years ago • 9 comments

Describe the bug

I have two parts of my codebase that crash when I use a transaction and call SendBatch(). The transaction comes from a pgxpool.Pool.

One is a query and another is a batch insert.

The basics are that I am doing a:

batch := &&pgx.Batch{}
// Pretend I added queries to the Batch here.

if batch.Len() > 0 {
  Tx.SendBatch(ctx, batch) // I know Tx is not nil and Tx.Conn() also returns non-nil
}

This causes a panic (in both versions 4 and 5):

runtime error: invalid memory address or nil pointer dereference.

github.com/jackc/pgx.(*Conn).sendBatchExtendedWithDescription.func1() github.com/jackc/pgx/conn.go:1067 +0x1f panic({0xa589c0, 0x10042e0}) /usr/local/go/src/runtime/panic.go:884 +0x213 github.com/jackc/pgx/pgconn.(*Pipeline).Sync(0xc0181a5900) github.com/jackc/pgx/pgconn/pgconn.go:1973 +0x1c github.com/jackc/pgx.(*Conn).sendBatchExtendedWithDescription(0xc0006d7200, {0xc05790?, 0xc00012c000}, 0xc0424e55d8, {0xc00d41e110, 0x1, 0x463685?}, {0xc07fb0, 0xc00b927bf0}) github.com/jackc/pgx/conn.go:1078 +0x6d5 github.com/jackc/pgx.(*Conn).sendBatchQueryExecModeCacheStatement(0xc0006d7200, {0xc05790?, 0xc00012c000}, 0xc0424e55d8) github.com/jackc/pgx/conn.go:1008 +0x42c github.com/jackc/pgx.(*Conn).SendBatch(0xc0006d7200, {0xc05790?, 0xc00012c000?}, 0xc0424e55d8) github.com/jackc/pgx/conn.go:913 +0x5d9 github.com/jackc/pgx.(*dbTx).SendBatch(0xc000134050?, {0xc05790?, 0xc00012c000?}, 0xc0003eed20?) github.com/jackc/pgx/tx.go:266 +0x76 github.com/jackc/pgx/pgxpool.(*Tx).SendBatch(0xa52c60?, {0xc05790?, 0xc00012c000?}, 0xb?) github.com/jackc/pgx/pgxpool/tx.go:50 +0x2b

Looking into that code, it says it is in the defer statement. But that's not really the case, that is a second panic caused by the first panic that gets hidden by the defer.

if len(distinctNewQueries) > 0 {
   for _, sd := range distinctNewQueries {
     pipeline.SendPrepare(sd.Name, sd.SQL, nil)  // Real panic is here
}

In the code above, SendPrepare() panics, because pipeline.conn is nil. Because the panic still causes defers to happen, the defer goes off, but pbr was never set with a return, so pbr.Err causes another nil pointer dereference inside the defer statement and hides the first panic.

I'm thinking the defer might change to the following to prevent the "panic hiding" that I am seeing:

defer func() {
  if pbr != nil && pbr.err != nil {
    pipeline.Close()
  }
}()

With that said, I know that Tx is not nil, but somehow, somewhere the pipeline.conn doesn't get set. I also called Tx.Conn() right before I do the .SendBatch(), and it is not nil either.

I've run with -race, and nothing weird there.

I'm hoping maybe someone knows what bone headed thing I'm doing that would cause that.

To Reproduce

I don't have a reproduction, the code base this is failing in is too big to put here. In essence, the code above is what I'm doing, which means I know I'm doing something terribly wrong, but just failing to see it, because that is the simplest of use cases.

Version

  • Go: go version go1.20 linux/amd64
  • PostgreSQL: 14.7
  • pgx: 5.4.2 , but also same problem with 4.x

Additional context I want to take the opportunity to say thank you so much for this project. The level of work that has gone into this is quite amazing and I just want to say thank you.

johnsiilver avatar Jul 15 '23 05:07 johnsiilver

The nil check in the defer is definitely needed in case of panic. I added that in ef9e26a5d5a6ae7f6f8dd8570e181c08f114de9c.

Beyond that, it's hard to say for sure what's going on, but I have a couple thoughts.

  1. The regular query path is less complicated than the batch path. So I would suggest investigating there first.
  2. My initial guess would be using some sort of resource after it's been released. e.g. a Tx after it's been committed or rolled back or a pool Conn after it's been released. Some places have checks to give a friendly error instead of a panic, but maybe not all.

jackc avatar Jul 15 '23 15:07 jackc

Thanks Jack.

I'll try converting to a query instead of a batch and see if it tells me anything. The #2, makes sense to me. I can't see where that would happen, but that doesn't mean it isn't. Is there anything that can be called to tell if Tx has been committed or a pool.Conn has been released, that way I can do a check before these batches to see what they say?

johnsiilver avatar Jul 15 '23 15:07 johnsiilver

Is there anything that can be called to tell if Tx has been committed or a pool.Conn has been released, that way I can do a check before these batches to see what they say?

Sorry, there is not. Though it looks like the main Tx implementation will return ErrTxClosed if you try to use it. So it's probably not the Tx. It doesn't look like there are checks on the pool.Conn through. If the error occurs consistently then stepping through with the debugger might be the simplest approach.

jackc avatar Jul 15 '23 16:07 jackc

I'll try converting to a query instead of a batch and see if it tells me anything.

Hello @johnsiilver, did the above solve your problem? I have kinda the same issue :smile:

tejo avatar Dec 11 '23 13:12 tejo

I've seen this crash on pgx v5.0.4 just once (-ish) (and we do a lot of batch queries), and it seemed to coincide with PostgreSQL connection failing. Not investigated in detail yet.

WGH- avatar Feb 27 '24 21:02 WGH-

This may be related to #1920 which should be fixed in an upcoming release.

jackc avatar Mar 02 '24 19:03 jackc

Hey folks, I believe I was able to replicate this as well but with a transaction which is using pgxpool. Stack trace:

"close of closed channel"
Stack:
	 3  0x0000000000aa10b8 in github.com/jackc/pgx/v5/pgconn.(*PgConn).receiveMessage at /root/go/pkg/mod/github.com/jackc/pgx/[email protected]/pgconn/pgconn.go:553
	 4  0x0000000000aa8305 in github.com/jackc/pgx/v5/pgconn.(*PgConn).CopyFrom at /root/go/pkg/mod/github.com/jackc/pgx/[email protected]/pgconn/pgconn.go:1336
	 5  0x0000000000c8adfb in github.com/jackc/pgx/v5.(*copyFrom).run at /root/go/pkg/mod/github.com/jackc/pgx/[email protected]/copy_from.go:202
	 6  0x0000000000c8c0e5 in github.com/jackc/pgx/v5.(*Conn).CopyFrom at /root/go/pkg/mod/github.com/jackc/pgx/[email protected]/copy_from.go:275
	 7  0x0000000000c9384f in github.com/jackc/pgx/v5.(*dbTx).CopyFrom at /root/go/pkg/mod/github.com/jackc/pgx/[email protected]/tx.go:256
	 8  0x0000000000ca0290 in github.com/jackc/pgx/v5/pgxpool.(*Tx).CopyFrom at /root/go/pkg/mod/github.com/jackc/pgx/[email protected]/pgxpool/tx.go:46
	 9  ...

The connection was via pgbouncer and the underlying master switched. So the following error was produced:

*pgproto3.ErrorResponse {Severity: "FATAL", SeverityUnlocalized: "", Code: "08P01", Message: "server conn crashed?", Detail: "", Hint: "", Position: 0, InternalPosition: 0, InternalQuery: "", Where: "", SchemaName: "", TableName: "", ColumnName: "", DataTypeName: "", ConstraintName: "", File: "", Line: 0, Routine: "", UnknownFields: map[uint8]string nil}

Which led to the above close of closes.

jaitaiwan avatar Apr 15 '24 02:04 jaitaiwan

@jaitaiwan You are on v5.5.1. If you have the ability to reproduce the error can you upgrade to v5.5.5? There is a change that should have fixed this issue.

jackc avatar May 15 '24 01:05 jackc