pgx
pgx copied to clipboard
unexpected EOF error
Hi 👋
First of all thank you for the amazing work here 🙌
Recently we migrate from pq driver to pgx and faced one issue that I would like to share with you
We started some random error in our logs that matchs with io.ErrUnexpectedEOF which we track down and it comes from here:
https://github.com/jackc/pgx/blob/e58381ac9409172a674f76c1801c3c606ac41b42/pgproto3/frontend.go#L210-L215
This previously was hidden from us because pq does this:
if v == io.EOF || v.Error() == "remote error: handshake failure" {
*err = driver.ErrBadConn
} else {
*err = v
}
That is retried by database/sql (getting/opening another connection):
for i := 0; i < maxBadConnRetries; i++ {
stmt, err = db.prepare(ctx, query, cachedOrNewConn)
isBadConn = errors.Is(err, driver.ErrBadConn)
if !isBadConn {
break
}
}
if isBadConn {
return db.prepare(ctx, query, alwaysNewConn)
}
Makes sense wrapping the io.EOF into a driver.ErrBadConn ?
This has the advantage that is out-of-the-box retried by the standard database/sql but maybe i'm missing another use case for that error
Another approach could be some migration guide (from pq to pgx) with a warning about this error
Makes sense wrapping the io.EOF into a driver.ErrBadConn ? This has the advantage that is out-of-the-box retried by the standard database/sql but maybe i'm missing another use case for that error
I think that's reasonable. It would need to be done in the stdlib package instead of pgproto3 to avoid adding the database/sql dependency to that package. That would be tedious though. That error could happen nearly anywhere, so every error handling path would need to add the error wrapping / translation.
Another approach could be some migration guide (from pq to pgx) with a warning about this error
I'm sure some people would find a pq to pgx migration guide useful, but it's not something I can personally do. I've never used pq in any significant way.
I think that's reasonable. It would need to be done in the stdlib package instead of pgproto3 to avoid adding the database/sql dependency to that package. That would be tedious though. That error could happen nearly anywhere, so every error handling path would need to add the error wrapping / translation.
Ok, I'm going to try this and make it as painless as possible 🙃
I'm sure some people would find a pq to pgx migration guide useful, but it's not something I can personally do. I've never used pq in any significant way.
Without this error, if you're using database/sql is pretty straightforward
@picollomartin please give a try to this (untested) patch
0001-proof-of-concept-for-1435.zip
ExecContext and QueryContext already have a SafeToRetry check, so I didn't add the error translation there
Hey @drakkan ! 👋 Sorry for the delay
Thank you for that patch 🙌
I tested with some local script but got the same error exactly in QueryContext method, SafeToRetry does not include this error because unexpected EOF does not implement the following interface:
// SafeToRetry checks if the err is guaranteed to have occurred before sending any data to the server.
func SafeToRetry(err error) bool {
if e, ok := err.(interface{ SafeToRetry() bool }); ok {
return e.SafeToRetry()
}
return false
}
Like @jackc mentioned earlier i think that this translation of error should be nearly everywhere including QueryContext/ExecContext
Hello @picollomartin
ok, please give a try to this variant
0001-proof-of-concept-for-1435.zip
if you confirm it works I'll open a proper PR, thanks
Not sure if I have to translate also this error:
https://github.com/jackc/pgx/blob/74f9b9f0a483f95513c621364f2c3912181ee360/stdlib/sql.go#L730
and maybe BeforeConnect
https://github.com/jackc/pgx/blob/74f9b9f0a483f95513c621364f2c3912181ee360/stdlib/sql.go#L200
and AfterConnect
https://github.com/jackc/pgx/blob/74f9b9f0a483f95513c621364f2c3912181ee360/stdlib/sql.go#L208
should not be translated (the current patch checks also these errors)
@jackc based on the lib/pq code here, it seems we have to store ErrBadConn and add a logic like this. What do you think about?
@drakkan Not really sure about the context. It'd be easier to review as a draft PR.
@drakkan Sorry, my original response was inaccurate / incomplete. Your mention of lib/pq just reminded me of why pgx is very careful about returning ErrBadConn.
From the database/sql/driver package docs:
ErrBadConn should be returned by a driver to signal to the sql package that a driver.Conn is in a bad state (such as the server having earlier closed the connection) and the sql package should retry on a new connection.
To prevent duplicate operations, ErrBadConn should NOT be returned if there's a possibility that the database server might have performed the operation. Even if the server sends back an error, you shouldn't return ErrBadConn.
lib/pq violates the spec and risks duplicate operations. See https://github.com/lib/pq/issues/939.
I think stdlib is already returning ErrBadConn where everywhere it can, but if there are any other sites you think should be then we need to be very sure that it only happens when there is no change the server performed the operation.
@drakkan your last patch works for me locally
Taking you last comment @jackc i saw the issue of pq and is a pretty risky approach, idk if io.ErrUnexpectedEOF is in the same group of possibly of duplicating operations if retried, if that's the case i agree in not to do any risky translation like pq does and maybe just clear that up in docs (network errors (like io.ErrUnexpectedEOF) are risky to retry because can lead to duplication of some operations, if you see this error frequently check for network issues between the driver and the database (e.g client timeout in a proxy))
That maybe helps to close other issues related to this error like #966, #1238
@picollomartin thanks for your tests.
@jackc you are right. my patch can trigger duplicate inserts, this test case
func TestNetworErrors(t *testing.T) {
testWithAllQueryExecModes(t, func(t *testing.T, db *sql.DB) {
_, err := db.Exec("drop table if exists t")
require.NoError(t, err)
_, err = db.Exec("create table t(id serial)")
require.NoError(t, err)
_, err = db.Exec(`
begin;
insert into t(id) values (default);
commit;
select pg_terminate_backend(pg_backend_pid()) where 1 = (select count(*) from t);
`)
require.Error(t, err)
var numInserts int
err = db.QueryRow("select count(*) from t").Scan(&numInserts)
require.NoError(t, err)
require.Equal(t, 1, numInserts)
})
}
works as expected but if I change transalteError to also catch SQLSTATE 57P01 I get a duplicate insert. Thanks for your work!
Can y'all please link the patch in this issue? I see it mentioned a few times, but no link to it so we can follow when this fix is or has been merged...