mysql: "packets.go:144: bad connection"
Is there a reason why Ping is called when starting a new transaction?
NewDatabaseTx clones db with checkConn = true
https://github.com/upper/db/blob/ca469a8217b7d6a4689fba26f395e22d910da653/mysql/database.go#L219-L220
So Ping is called before begin transaction:
https://github.com/upper/db/blob/ca469a8217b7d6a4689fba26f395e22d910da653/internal/sqladapter/database.go#L330-L334
The problem is that Ping is not retried if ping returns driver.BadConnection:
https://github.com/golang/go/blob/3bf9b77c0c8f123728fcfc802599232e9bd95476/src/database/sql/sql.go#L724-L741
Whereas begin transaction retries:
https://github.com/golang/go/blob/3bf9b77c0c8f123728fcfc802599232e9bd95476/src/database/sql/sql.go#L1619-L1632
We have a problem if a connection is idle for a long time and we start a transaction, it fails(because of a timeout). Whereas calling Query() or Exec() in the same situation works fine.
@xiam would you have a second to respond, please?
Great topic @iliacimpoes, I remember that driver.BadConnection was something that was difficult for the connection pool to recover from, the Ping here prevented driver.BadConnection errors.
We have a problem if a connection is idle for a long time and we start a transaction, it fails(because of a timeout). Whereas calling Query() or Exec() in the same situation works fine.
If that's your case I think we can simulate it locally and see if this is still required or not.
Would be great. I can share a stack trace starting from the moment mysql driver returns ErrBadConnection:
[mysql] 2018/10/18 05:30:25 packets.go:144: bad connection cause: write tcp xx.xx.xx.xx:xxxxx->xx.xx.xx.xx:xxxxx: write: connection timed out
Stack: goroutine 43902 [running]:
runtime/debug.Stack(0xc0000869d0, 0xc0014a4000, 0x5)
/usr/local/go/src/runtime/debug/stack.go:24 +0xa7
.../vendor/github.com/go-sql-driver/mysql.(*mysqlConn).writePacket(0xc0005712c0, 0xc0014a4000, 0x5, 0x1000, 0x8, 0xc00075e940)
.../go/src/.../vendor/github.com/go-sql-driver/mysql/packets.go:143 +0x332
.../vendor/github.com/go-sql-driver/mysql.(*mysqlConn).writeCommandPacket(0xc0005712c0, 0xd6b30e, 0xc0005712c0, 0x0)
.../go/src/.../vendor/github.com/go-sql-driver/mysql/packets.go:420 +0x6e
.../vendor/github.com/go-sql-driver/mysql.(*mysqlConn).Ping(0xc0005712c0, 0xe61520, 0xc000082010, 0x0, 0x0)
.../go/src/.../vendor/github.com/go-sql-driver/mysql/connection_go18.go:32 +0x1ab
database/sql.(*DB).pingDC.func1()
/usr/local/go/src/database/sql/sql.go:717 +0x4a
database/sql.withLock(0xe5bf20, 0xc0014abf80, 0xc000ee2130)
/usr/local/go/src/database/sql/sql.go:3074 +0x63
database/sql.(*DB).pingDC(0xc00010e0c0, 0xe61520, 0xc000082010, 0xc0014abf80, 0xc000ee21e0, 0x0, 0x0)
/usr/local/go/src/database/sql/sql.go:716 +0x11e
database/sql.(*DB).PingContext(0xc00010e0c0, 0xe61520, 0xc000082010, 0x951c17, 0xd2b360)
/usr/local/go/src/database/sql/sql.go:743 +0x17f
database/sql.(*DB).Ping(0xc00010e0c0, 0xc0007b3e60, 0xc0001424d0)
/usr/local/go/src/database/sql/sql.go:749 +0x43
.../vendor/upper.io/db.v3/internal/sqladapter.(*database).Ping(0xc0002f8f20, 0xc000069180, 0xe6c180)
.../go/src/.../vendor/upper.io/db.v3/internal/sqladapter/database.go:278 +0x34
.../vendor/upper.io/db.v3/internal/sqladapter.(*database).NewClone(0xc0001d5b80, 0xe6b4a0, 0xc000069180, 0x403501, 0xe50770, 0xe50788, 0x449768, 0x8f96c750d8d9)
.../go/src/.../vendor/upper.io/db.v3/internal/sqladapter/database.go:331 +0x11a
.../vendor/upper.io/db.v3/mysql.(*database).clone(0xc0000ab4c0, 0xe615a0, 0xc0007b3d10, 0x1503101, 0xc000142460, 0xc0007b3b00, 0xc000ee23f0)
.../go/src/.../vendor/upper.io/db.v3/mysql/database.go:144 +0xb0
.../vendor/upper.io/db.v3/mysql.(*database).NewDatabaseTx(0xc0000ab4c0, 0xe615a0, 0xc0007b3d10, 0x0, 0x0, 0x0, 0x0)
.../go/src/.../vendor/upper.io/db.v3/mysql/database.go:220 +0x58
.../vendor/upper.io/db.v3/mysql.(*database).NewTx(0xc0000ab4c0, 0xe615a0, 0xc0007b3d10, 0x70, 0xc000142460, 0x0, 0xc0002cc700)
.../go/src/.../vendor/upper.io/db.v3/mysql/database.go:86 +0x4c
.../vendor/upper.io/db.v3/internal/sqladapter.RunTx(0xe6ffa0, 0xc0000ab4c0, 0xe615a0, 0xc0007b3d10, 0xc000142460, 0x0, 0x0)
.../go/src/.../vendor/upper.io/db.v3/internal/sqladapter/tx.go:101 +0x5a
.../vendor/upper.io/db.v3/mysql.(*database).Tx(0xc0000ab4c0, 0xe615a0, 0xc0007b3d10, 0xc000142460, 0xe615a0, 0xc0007b3d10)
.../go/src/.../vendor/upper.io/db.v3/mysql/database.go:215 +0x55
The only way I have managed to reproduce this issue is by using this code:
func TestIssue469_BadConnection(t *testing.T) {
sess := mustOpen()
// Ask the MySQL server to disconnect sessions that remain inactive for more than 1 second.
_, err := sess.Exec(`SET SESSION wait_timeout=1`)
assert.NoError(t, err)
defer sess.Close()
err = sess.Tx(nil, func(sess sqlbuilder.Tx) error {
var err error
// Remain inactive for 2 seconds.
time.Sleep(time.Second * 2)
// Attempt to count elements from the "artist" collection.
_, err = sess.Collection("artist").Find().Count()
if err != nil {
return err
}
return nil
})
// driver: bad connection (which is OK)
assert.NoError(t, err)
}
See: https://github.com/upper/db/compare/issue-469_mysql_newtx_and_bad_connection#diff-fae74f0089799e579d35b5ab68f33c86R243
The case we see above is MySQL disconnecting a client in the middle of a transaction, which is something the client won't be able to recover from, since the transaction is already invalid by that point. Is your case different from the above example? Do you know how can we reproduce your case?
I added more test cases here: https://github.com/upper/db/pull/472 I hope one of them solves your problem. We won't remove the Ping() from NewClone yet, maybe in the future we'll use it only in adapters that fail to reestablish a connection by themselves.
@xiam Thank you for the test and for:
if err := nd.Ping(); err != nil {
// Retry once if ping fails.
return d.NewClone(p, false)
}
I was able to reproduce the error by slightly modifying your test case:
func TestIssue469_BadConnection(t *testing.T) {
var err error
sess := mustOpen()
defer sess.Close()
// Fill up connections pool to have some idle connections
wg := sync.WaitGroup{}
for i := 0; i < 30; i++ {
wg.Add(1)
go func() {
defer wg.Done()
_, err = sess.Exec(`SET SESSION wait_timeout=1`)
assert.NoError(t, err)
}()
}
wg.Wait()
time.Sleep(time.Second * 2)
_, err = sess.Exec("update artist set name = ?", "blah")
if err != nil {
fmt.Println(time.Now(), "Exec", err)
}
defer func() {
fmt.Println("Done", time.Now())
}()
sleepTime := 120 * time.Minute
for i := 0; i < 100; i++ {
time.Sleep(sleepTime)
// At this point the server should have disconnected us. Let's try to create
// a transaction anyway.
err = sess.Tx(nil, func(sess sqlbuilder.Tx) error {
fmt.Println(time.Now(), "transaction started")
_, err = sess.Collection("artist").Find().Count()
return err
})
if err == nil {
fmt.Println(time.Now(), "Transaction NO error")
break
}
fmt.Println(time.Now(), "Transaction error", err)
if err.Error() == "bad connection" {
sleepTime = 1 * time.Second
}
}
for i := 0; i < 100; i++ {
time.Sleep(1 * time.Second)
_, err = sess.Exec("update artist set name = ?", "blah")
if err != nil {
fmt.Println(time.Now(), "Exec", err)
} else {
fmt.Println(time.Now(), "Exec - no error")
return
}
}
}
- We need to fill pool of idle connections
- When a connection is closed by wait_timeout
mysql.ErrInvalidConnis returned, but after some time - about 2 hours I start gettingbad connectionerrors. Unfortunately I have no idea what those 2 hours are and how to reduce this time. - I also added some printlns for debugging
So I got these results:
$ make test-adapter-mysql
/Library/Developer/CommandLineTools/usr/bin/make -C mysql test || exit 1;
#go test -tags generated -v -race # race: limit on 8192 simultaneously alive goroutines is exceeded, dying
go test -timeout 120h -tags generated -v -run='TestIssue469'
=== RUN TestIssue469_BadConnection
[mysql] 2018/12/19 10:10:21 packets.go:41: unexpected EOF
2018-12-19 10:10:21.808442 +0200 EET m=+2.178825777 Exec invalid connection
[mysql] 2018/12/19 12:10:11 packets.go:41: unexpected EOF
2018-12-19 12:10:11.702285 +0200 EET m=+7202.180542781 ping invalid connection
[mysql] 2018/12/19 12:10:11 packets.go:41: unexpected EOF
2018-12-19 12:10:11.702589 +0200 EET m=+7202.180846408 BeginTx invalid connection
2018-12-19 12:10:11.702622 +0200 EET m=+7202.180879499 Transaction error invalid connection
2018-12-19 14:10:11.571663 +0200 EET m=+14402.187703314 ping bad connection
2018-12-19 14:10:11.573195 +0200 EET m=+14402.189235727 Transaction error bad connection
2018-12-19 14:10:12.575273 +0200 EET m=+14403.191313594 ping bad connection
2018-12-19 14:10:12.577907 +0200 EET m=+14403.193946834 transaction started
2018-12-19 14:10:12.583669 +0200 EET m=+14403.199708998 Transaction NO error
2018-12-19 14:10:13.587812 +0200 EET m=+14404.203801715 Exec - no error
Done 2018-12-19 14:10:13.58784 +0200 EET m=+14404.203828780
Then I changed:
if err := nd.Ping(); err != nil {
// Retry once if ping fails.
return d.NewClone(p, false)
}
back to:
if err := nd.Ping(); err != nil {
// Retry once if ping fails.
return nil, err
}
I got much worse results:
$ make test-adapter-mysql
/Library/Developer/CommandLineTools/usr/bin/make -C mysql test || exit 1;
#go test -tags generated -v -race # race: limit on 8192 simultaneously alive goroutines is exceeded, dying
go test -timeout 120h -tags generated -v -run='TestIssue469'
=== RUN TestIssue469_BadConnection
[mysql] 2018/12/19 17:16:22 packets.go:41: unexpected EOF
2018-12-19 17:16:22.206053 +0200 EET m=+2.202288638 Exec invalid connection
[mysql] 2018/12/19 19:16:21 packets.go:41: unexpected EOF
2018-12-19 19:16:21.405881 +0200 EET m=+7202.207819524 ping invalid connection
2018-12-19 19:16:21.405963 +0200 EET m=+7202.207901179 d.clone invalid connection
2018-12-19 19:16:21.405982 +0200 EET m=+7202.207920744 Transaction error invalid connection
2018-12-19 21:16:21.614088 +0200 EET m=+14402.210499100 ping bad connection
2018-12-19 21:16:21.614123 +0200 EET m=+14402.210533943 d.clone bad connection
2018-12-19 21:16:21.614136 +0200 EET m=+14402.210547179 Transaction error bad connection
2018-12-19 21:16:22.62191 +0200 EET m=+14403.218273233 ping bad connection
2018-12-19 21:16:22.621984 +0200 EET m=+14403.218347729 d.clone bad connection
2018-12-19 21:16:22.621994 +0200 EET m=+14403.218357019 Transaction error bad connection
2018-12-19 21:16:23.628822 +0200 EET m=+14404.225137068 ping bad connection
2018-12-19 21:16:23.628867 +0200 EET m=+14404.225182426 d.clone bad connection
2018-12-19 21:16:23.628878 +0200 EET m=+14404.225193410 Transaction error bad connection
2018-12-19 21:16:24.630636 +0200 EET m=+14405.226951666 ping bad connection
2018-12-19 21:16:24.630675 +0200 EET m=+14405.226990692 d.clone bad connection
2018-12-19 21:16:24.630686 +0200 EET m=+14405.227000991 Transaction error bad connection
2018-12-19 21:16:25.640003 +0200 EET m=+14406.236270846 ping bad connection
2018-12-19 21:16:25.640038 +0200 EET m=+14406.236305448 d.clone bad connection
2018-12-19 21:16:25.640049 +0200 EET m=+14406.236316529 Transaction error bad connection
2018-12-19 21:16:26.64742 +0200 EET m=+14407.243687404 ping bad connection
2018-12-19 21:16:26.647465 +0200 EET m=+14407.243732695 d.clone bad connection
2018-12-19 21:16:26.647475 +0200 EET m=+14407.243742499 Transaction error bad connection
2018-12-19 21:16:27.648469 +0200 EET m=+14408.244688532 ping bad connection
2018-12-19 21:16:27.648524 +0200 EET m=+14408.244743583 d.clone bad connection
2018-12-19 21:16:27.648532 +0200 EET m=+14408.244751233 Transaction error bad connection
2018-12-19 21:16:28.658848 +0200 EET m=+14409.255019566 ping bad connection
2018-12-19 21:16:28.65889 +0200 EET m=+14409.255061484 d.clone bad connection
2018-12-19 21:16:28.658899 +0200 EET m=+14409.255070778 Transaction error bad connection
2018-12-19 21:16:29.670281 +0200 EET m=+14410.266452400 Transaction error bad connection
2018-12-19 21:16:30.679979 +0200 EET m=+14411.276102088 ping bad connection
2018-12-19 21:16:30.680012 +0200 EET m=+14411.276135474 d.clone bad connection
2018-12-19 21:16:30.680024 +0200 EET m=+14411.276147535 Transaction error bad connection
2018-12-19 21:16:31.690187 +0200 EET m=+14412.286310003 ping bad connection
2018-12-19 21:16:31.690229 +0200 EET m=+14412.286352775 d.clone bad connection
2018-12-19 21:16:31.690243 +0200 EET m=+14412.286366043 Transaction error bad connection
2018-12-19 21:16:32.69659 +0200 EET m=+14413.292665402 ping bad connection
2018-12-19 21:16:32.696637 +0200 EET m=+14413.292712780 d.clone bad connection
2018-12-19 21:16:32.69665 +0200 EET m=+14413.292725414 Transaction error bad connection
2018-12-19 21:16:33.706352 +0200 EET m=+14414.302379212 ping bad connection
2018-12-19 21:16:33.706404 +0200 EET m=+14414.302431774 d.clone bad connection
2018-12-19 21:16:33.706417 +0200 EET m=+14414.302444713 Transaction error bad connection
2018-12-19 21:16:34.713427 +0200 EET m=+14415.309454954 ping bad connection
2018-12-19 21:16:34.713463 +0200 EET m=+14415.309490282 d.clone bad connection
2018-12-19 21:16:34.713475 +0200 EET m=+14415.309502756 Transaction error bad connection
2018-12-19 21:16:35.726006 +0200 EET m=+14416.321985206 Transaction error bad connection
2018-12-19 21:16:36.729113 +0200 EET m=+14417.325044431 ping bad connection
2018-12-19 21:16:36.729151 +0200 EET m=+14417.325082724 d.clone bad connection
2018-12-19 21:16:36.729161 +0200 EET m=+14417.325092725 Transaction error bad connection
2018-12-19 21:16:37.739508 +0200 EET m=+14418.335439380 ping bad connection
2018-12-19 21:16:37.739543 +0200 EET m=+14418.335474419 d.clone bad connection
2018-12-19 21:16:37.739551 +0200 EET m=+14418.335482718 Transaction error bad connection
2018-12-19 21:16:38.745282 +0200 EET m=+14419.341166008 ping bad connection
2018-12-19 21:16:38.745352 +0200 EET m=+14419.341235646 d.clone bad connection
2018-12-19 21:16:38.745373 +0200 EET m=+14419.341256354 Transaction error bad connection
2018-12-19 21:16:39.756186 +0200 EET m=+14420.352069603 Transaction error bad connection
2018-12-19 21:16:40.765518 +0200 EET m=+14421.361353887 ping bad connection
2018-12-19 21:16:40.765557 +0200 EET m=+14421.361392269 d.clone bad connection
2018-12-19 21:16:40.765566 +0200 EET m=+14421.361401091 Transaction error bad connection
2018-12-19 21:16:41.775789 +0200 EET m=+14422.371576736 transaction started
2018-12-19 21:16:41.782188 +0200 EET m=+14422.377975473 Transaction NO error
2018-12-19 21:16:42.790775 +0200 EET m=+14423.386562420 Exec - no error
Done 2018-12-19 21:16:42.790805 +0200 EET m=+14423.386592259
Same behavior on my machine:
[mysql] 2019/05/06 09:46:20 packets.go:36: unexpected EOF
I'm using upper.io with go-sql-driver/mysql, Go v1.11, RHEL.
Same behavior on my machine:
[mysql] 2019/05/06 09:46:20 packets.go:36: unexpected EOFI'm using upper.io with go-sql-driver/mysql, Go v1.11, RHEL.
any solution? same error
Sorry, that was a while ago. I did a fix in mysql driver for Ping. I probably should've linked it here: https://github.com/go-sql-driver/mysql/pull/875. But I believe it did not solve our issue.
So as a workaround we reduced ConnectionLifetime so that connections are not left in the pool for longer than mysql wait_timeout (8h by default) https://github.com/golang/go/blob/d7974c31d0eb0ef377a8681f6f7306d46854eb1c/src/database/sql/sql.go#L943-L948
Or you can set MaxIdleConnections to 0 so that there is no idle connections in the pool at all https://github.com/golang/go/blob/d7974c31d0eb0ef377a8681f6f7306d46854eb1c/src/database/sql/sql.go#L886-L896