db icon indicating copy to clipboard operation
db copied to clipboard

mysql: "packets.go:144: bad connection"

Open iliacimpoes opened this issue 7 years ago • 9 comments

Is there a reason why Ping is called when starting a new transaction?

NewDatabaseTx clones db with checkConn = true https://github.com/upper/db/blob/ca469a8217b7d6a4689fba26f395e22d910da653/mysql/database.go#L219-L220

So Ping is called before begin transaction: https://github.com/upper/db/blob/ca469a8217b7d6a4689fba26f395e22d910da653/internal/sqladapter/database.go#L330-L334

The problem is that Ping is not retried if ping returns driver.BadConnection: https://github.com/golang/go/blob/3bf9b77c0c8f123728fcfc802599232e9bd95476/src/database/sql/sql.go#L724-L741

Whereas begin transaction retries: https://github.com/golang/go/blob/3bf9b77c0c8f123728fcfc802599232e9bd95476/src/database/sql/sql.go#L1619-L1632

We have a problem if a connection is idle for a long time and we start a transaction, it fails(because of a timeout). Whereas calling Query() or Exec() in the same situation works fine.

iliacimpoes avatar Oct 22 '18 08:10 iliacimpoes

@xiam would you have a second to respond, please?

VojtechVitek avatar Oct 22 '18 14:10 VojtechVitek

Great topic @iliacimpoes, I remember that driver.BadConnection was something that was difficult for the connection pool to recover from, the Ping here prevented driver.BadConnection errors.

We have a problem if a connection is idle for a long time and we start a transaction, it fails(because of a timeout). Whereas calling Query() or Exec() in the same situation works fine.

If that's your case I think we can simulate it locally and see if this is still required or not.

xiam avatar Oct 23 '18 14:10 xiam

Would be great. I can share a stack trace starting from the moment mysql driver returns ErrBadConnection:

[mysql] 2018/10/18 05:30:25 packets.go:144: bad connection cause: write tcp xx.xx.xx.xx:xxxxx->xx.xx.xx.xx:xxxxx: write: connection timed out
Stack: goroutine 43902 [running]:
runtime/debug.Stack(0xc0000869d0, 0xc0014a4000, 0x5)
/usr/local/go/src/runtime/debug/stack.go:24 +0xa7
.../vendor/github.com/go-sql-driver/mysql.(*mysqlConn).writePacket(0xc0005712c0, 0xc0014a4000, 0x5, 0x1000, 0x8, 0xc00075e940)
.../go/src/.../vendor/github.com/go-sql-driver/mysql/packets.go:143 +0x332
.../vendor/github.com/go-sql-driver/mysql.(*mysqlConn).writeCommandPacket(0xc0005712c0, 0xd6b30e, 0xc0005712c0, 0x0)
.../go/src/.../vendor/github.com/go-sql-driver/mysql/packets.go:420 +0x6e
.../vendor/github.com/go-sql-driver/mysql.(*mysqlConn).Ping(0xc0005712c0, 0xe61520, 0xc000082010, 0x0, 0x0)
.../go/src/.../vendor/github.com/go-sql-driver/mysql/connection_go18.go:32 +0x1ab
database/sql.(*DB).pingDC.func1()
/usr/local/go/src/database/sql/sql.go:717 +0x4a
database/sql.withLock(0xe5bf20, 0xc0014abf80, 0xc000ee2130)
/usr/local/go/src/database/sql/sql.go:3074 +0x63
database/sql.(*DB).pingDC(0xc00010e0c0, 0xe61520, 0xc000082010, 0xc0014abf80, 0xc000ee21e0, 0x0, 0x0)
/usr/local/go/src/database/sql/sql.go:716 +0x11e
database/sql.(*DB).PingContext(0xc00010e0c0, 0xe61520, 0xc000082010, 0x951c17, 0xd2b360)
/usr/local/go/src/database/sql/sql.go:743 +0x17f
database/sql.(*DB).Ping(0xc00010e0c0, 0xc0007b3e60, 0xc0001424d0)
/usr/local/go/src/database/sql/sql.go:749 +0x43
.../vendor/upper.io/db.v3/internal/sqladapter.(*database).Ping(0xc0002f8f20, 0xc000069180, 0xe6c180)
.../go/src/.../vendor/upper.io/db.v3/internal/sqladapter/database.go:278 +0x34
.../vendor/upper.io/db.v3/internal/sqladapter.(*database).NewClone(0xc0001d5b80, 0xe6b4a0, 0xc000069180, 0x403501, 0xe50770, 0xe50788, 0x449768, 0x8f96c750d8d9)
.../go/src/.../vendor/upper.io/db.v3/internal/sqladapter/database.go:331 +0x11a
.../vendor/upper.io/db.v3/mysql.(*database).clone(0xc0000ab4c0, 0xe615a0, 0xc0007b3d10, 0x1503101, 0xc000142460, 0xc0007b3b00, 0xc000ee23f0)
.../go/src/.../vendor/upper.io/db.v3/mysql/database.go:144 +0xb0
.../vendor/upper.io/db.v3/mysql.(*database).NewDatabaseTx(0xc0000ab4c0, 0xe615a0, 0xc0007b3d10, 0x0, 0x0, 0x0, 0x0)
.../go/src/.../vendor/upper.io/db.v3/mysql/database.go:220 +0x58
.../vendor/upper.io/db.v3/mysql.(*database).NewTx(0xc0000ab4c0, 0xe615a0, 0xc0007b3d10, 0x70, 0xc000142460, 0x0, 0xc0002cc700)
.../go/src/.../vendor/upper.io/db.v3/mysql/database.go:86 +0x4c
.../vendor/upper.io/db.v3/internal/sqladapter.RunTx(0xe6ffa0, 0xc0000ab4c0, 0xe615a0, 0xc0007b3d10, 0xc000142460, 0x0, 0x0)
.../go/src/.../vendor/upper.io/db.v3/internal/sqladapter/tx.go:101 +0x5a
.../vendor/upper.io/db.v3/mysql.(*database).Tx(0xc0000ab4c0, 0xe615a0, 0xc0007b3d10, 0xc000142460, 0xe615a0, 0xc0007b3d10)
.../go/src/.../vendor/upper.io/db.v3/mysql/database.go:215 +0x55

iliacimpoes avatar Oct 23 '18 16:10 iliacimpoes

The only way I have managed to reproduce this issue is by using this code:

func TestIssue469_BadConnection(t *testing.T) {
	sess := mustOpen()

	// Ask the MySQL server to disconnect sessions that remain inactive for more than 1 second.
	_, err := sess.Exec(`SET SESSION wait_timeout=1`)
	assert.NoError(t, err)

	defer sess.Close()

	err = sess.Tx(nil, func(sess sqlbuilder.Tx) error {
		var err error

		// Remain inactive for 2 seconds.
		time.Sleep(time.Second * 2)

                // Attempt to count elements from the "artist" collection.
		_, err = sess.Collection("artist").Find().Count()
		if err != nil {
			return err
		}

		return nil
	})

        // driver: bad connection (which is OK)
	assert.NoError(t, err)
}

See: https://github.com/upper/db/compare/issue-469_mysql_newtx_and_bad_connection#diff-fae74f0089799e579d35b5ab68f33c86R243

The case we see above is MySQL disconnecting a client in the middle of a transaction, which is something the client won't be able to recover from, since the transaction is already invalid by that point. Is your case different from the above example? Do you know how can we reproduce your case?

xiam avatar Dec 08 '18 18:12 xiam

I added more test cases here: https://github.com/upper/db/pull/472 I hope one of them solves your problem. We won't remove the Ping() from NewClone yet, maybe in the future we'll use it only in adapters that fail to reestablish a connection by themselves.

xiam avatar Dec 09 '18 12:12 xiam

@xiam Thank you for the test and for:

		if err := nd.Ping(); err != nil {
			// Retry once if ping fails.
			return d.NewClone(p, false)
		}

I was able to reproduce the error by slightly modifying your test case:

func TestIssue469_BadConnection(t *testing.T) {
	var err error

	sess := mustOpen()
	defer sess.Close()

	// Fill up connections pool to have some idle connections
	wg := sync.WaitGroup{}
	for i := 0; i < 30; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, err = sess.Exec(`SET SESSION wait_timeout=1`)
			assert.NoError(t, err)
		}()
	}
	wg.Wait()

	time.Sleep(time.Second * 2)

	_, err = sess.Exec("update artist set name = ?", "blah")
	if err != nil {
		fmt.Println(time.Now(), "Exec", err)
	}

	defer func() {
		fmt.Println("Done", time.Now())
	}()

	sleepTime := 120 * time.Minute
	for i := 0; i < 100; i++ {
		time.Sleep(sleepTime)

		// At this point the server should have disconnected us. Let's try to create
		// a transaction anyway.
		err = sess.Tx(nil, func(sess sqlbuilder.Tx) error {
			fmt.Println(time.Now(), "transaction started")
			_, err = sess.Collection("artist").Find().Count()
			return err
		})
		if err == nil {
			fmt.Println(time.Now(), "Transaction NO error")
			break
		}
		fmt.Println(time.Now(), "Transaction error", err)
		if err.Error() == "bad connection" {
			sleepTime = 1 * time.Second
		}
	}

	for i := 0; i < 100; i++ {
		time.Sleep(1 * time.Second)
		_, err = sess.Exec("update artist set name = ?", "blah")
		if err != nil {
			fmt.Println(time.Now(), "Exec", err)
		} else {
			fmt.Println(time.Now(), "Exec - no error")
			return
		}
	}
}
  1. We need to fill pool of idle connections
  2. When a connection is closed by wait_timeout mysql.ErrInvalidConn is returned, but after some time - about 2 hours I start getting bad connection errors. Unfortunately I have no idea what those 2 hours are and how to reduce this time.
  3. I also added some printlns for debugging

So I got these results:

$ make test-adapter-mysql
/Library/Developer/CommandLineTools/usr/bin/make -C mysql test || exit 1;
#go test -tags generated -v -race # race: limit on 8192 simultaneously alive goroutines is exceeded, dying
go test -timeout 120h -tags generated -v  -run='TestIssue469'
=== RUN   TestIssue469_BadConnection
[mysql] 2018/12/19 10:10:21 packets.go:41: unexpected EOF
2018-12-19 10:10:21.808442 +0200 EET m=+2.178825777 Exec invalid connection
[mysql] 2018/12/19 12:10:11 packets.go:41: unexpected EOF
2018-12-19 12:10:11.702285 +0200 EET m=+7202.180542781 ping invalid connection
[mysql] 2018/12/19 12:10:11 packets.go:41: unexpected EOF
2018-12-19 12:10:11.702589 +0200 EET m=+7202.180846408 BeginTx invalid connection
2018-12-19 12:10:11.702622 +0200 EET m=+7202.180879499 Transaction error invalid connection
2018-12-19 14:10:11.571663 +0200 EET m=+14402.187703314 ping bad connection
2018-12-19 14:10:11.573195 +0200 EET m=+14402.189235727 Transaction error bad connection
2018-12-19 14:10:12.575273 +0200 EET m=+14403.191313594 ping bad connection
2018-12-19 14:10:12.577907 +0200 EET m=+14403.193946834 transaction started
2018-12-19 14:10:12.583669 +0200 EET m=+14403.199708998 Transaction NO error
2018-12-19 14:10:13.587812 +0200 EET m=+14404.203801715 Exec - no error
Done 2018-12-19 14:10:13.58784 +0200 EET m=+14404.203828780

Then I changed:

		if err := nd.Ping(); err != nil {
			// Retry once if ping fails.
			return d.NewClone(p, false)
		}

back to:

		if err := nd.Ping(); err != nil {
			// Retry once if ping fails.
			return nil, err
		}

I got much worse results:

$ make test-adapter-mysql
/Library/Developer/CommandLineTools/usr/bin/make -C mysql test || exit 1;
#go test -tags generated -v -race # race: limit on 8192 simultaneously alive goroutines is exceeded, dying
go test -timeout 120h -tags generated -v  -run='TestIssue469'
=== RUN   TestIssue469_BadConnection
[mysql] 2018/12/19 17:16:22 packets.go:41: unexpected EOF
2018-12-19 17:16:22.206053 +0200 EET m=+2.202288638 Exec invalid connection
[mysql] 2018/12/19 19:16:21 packets.go:41: unexpected EOF
2018-12-19 19:16:21.405881 +0200 EET m=+7202.207819524 ping invalid connection
2018-12-19 19:16:21.405963 +0200 EET m=+7202.207901179 d.clone invalid connection
2018-12-19 19:16:21.405982 +0200 EET m=+7202.207920744 Transaction error invalid connection
2018-12-19 21:16:21.614088 +0200 EET m=+14402.210499100 ping bad connection
2018-12-19 21:16:21.614123 +0200 EET m=+14402.210533943 d.clone bad connection
2018-12-19 21:16:21.614136 +0200 EET m=+14402.210547179 Transaction error bad connection
2018-12-19 21:16:22.62191 +0200 EET m=+14403.218273233 ping bad connection
2018-12-19 21:16:22.621984 +0200 EET m=+14403.218347729 d.clone bad connection
2018-12-19 21:16:22.621994 +0200 EET m=+14403.218357019 Transaction error bad connection
2018-12-19 21:16:23.628822 +0200 EET m=+14404.225137068 ping bad connection
2018-12-19 21:16:23.628867 +0200 EET m=+14404.225182426 d.clone bad connection
2018-12-19 21:16:23.628878 +0200 EET m=+14404.225193410 Transaction error bad connection
2018-12-19 21:16:24.630636 +0200 EET m=+14405.226951666 ping bad connection
2018-12-19 21:16:24.630675 +0200 EET m=+14405.226990692 d.clone bad connection
2018-12-19 21:16:24.630686 +0200 EET m=+14405.227000991 Transaction error bad connection
2018-12-19 21:16:25.640003 +0200 EET m=+14406.236270846 ping bad connection
2018-12-19 21:16:25.640038 +0200 EET m=+14406.236305448 d.clone bad connection
2018-12-19 21:16:25.640049 +0200 EET m=+14406.236316529 Transaction error bad connection
2018-12-19 21:16:26.64742 +0200 EET m=+14407.243687404 ping bad connection
2018-12-19 21:16:26.647465 +0200 EET m=+14407.243732695 d.clone bad connection
2018-12-19 21:16:26.647475 +0200 EET m=+14407.243742499 Transaction error bad connection
2018-12-19 21:16:27.648469 +0200 EET m=+14408.244688532 ping bad connection
2018-12-19 21:16:27.648524 +0200 EET m=+14408.244743583 d.clone bad connection
2018-12-19 21:16:27.648532 +0200 EET m=+14408.244751233 Transaction error bad connection
2018-12-19 21:16:28.658848 +0200 EET m=+14409.255019566 ping bad connection
2018-12-19 21:16:28.65889 +0200 EET m=+14409.255061484 d.clone bad connection
2018-12-19 21:16:28.658899 +0200 EET m=+14409.255070778 Transaction error bad connection
2018-12-19 21:16:29.670281 +0200 EET m=+14410.266452400 Transaction error bad connection
2018-12-19 21:16:30.679979 +0200 EET m=+14411.276102088 ping bad connection
2018-12-19 21:16:30.680012 +0200 EET m=+14411.276135474 d.clone bad connection
2018-12-19 21:16:30.680024 +0200 EET m=+14411.276147535 Transaction error bad connection
2018-12-19 21:16:31.690187 +0200 EET m=+14412.286310003 ping bad connection
2018-12-19 21:16:31.690229 +0200 EET m=+14412.286352775 d.clone bad connection
2018-12-19 21:16:31.690243 +0200 EET m=+14412.286366043 Transaction error bad connection
2018-12-19 21:16:32.69659 +0200 EET m=+14413.292665402 ping bad connection
2018-12-19 21:16:32.696637 +0200 EET m=+14413.292712780 d.clone bad connection
2018-12-19 21:16:32.69665 +0200 EET m=+14413.292725414 Transaction error bad connection
2018-12-19 21:16:33.706352 +0200 EET m=+14414.302379212 ping bad connection
2018-12-19 21:16:33.706404 +0200 EET m=+14414.302431774 d.clone bad connection
2018-12-19 21:16:33.706417 +0200 EET m=+14414.302444713 Transaction error bad connection
2018-12-19 21:16:34.713427 +0200 EET m=+14415.309454954 ping bad connection
2018-12-19 21:16:34.713463 +0200 EET m=+14415.309490282 d.clone bad connection
2018-12-19 21:16:34.713475 +0200 EET m=+14415.309502756 Transaction error bad connection
2018-12-19 21:16:35.726006 +0200 EET m=+14416.321985206 Transaction error bad connection
2018-12-19 21:16:36.729113 +0200 EET m=+14417.325044431 ping bad connection
2018-12-19 21:16:36.729151 +0200 EET m=+14417.325082724 d.clone bad connection
2018-12-19 21:16:36.729161 +0200 EET m=+14417.325092725 Transaction error bad connection
2018-12-19 21:16:37.739508 +0200 EET m=+14418.335439380 ping bad connection
2018-12-19 21:16:37.739543 +0200 EET m=+14418.335474419 d.clone bad connection
2018-12-19 21:16:37.739551 +0200 EET m=+14418.335482718 Transaction error bad connection
2018-12-19 21:16:38.745282 +0200 EET m=+14419.341166008 ping bad connection
2018-12-19 21:16:38.745352 +0200 EET m=+14419.341235646 d.clone bad connection
2018-12-19 21:16:38.745373 +0200 EET m=+14419.341256354 Transaction error bad connection
2018-12-19 21:16:39.756186 +0200 EET m=+14420.352069603 Transaction error bad connection
2018-12-19 21:16:40.765518 +0200 EET m=+14421.361353887 ping bad connection
2018-12-19 21:16:40.765557 +0200 EET m=+14421.361392269 d.clone bad connection
2018-12-19 21:16:40.765566 +0200 EET m=+14421.361401091 Transaction error bad connection
2018-12-19 21:16:41.775789 +0200 EET m=+14422.371576736 transaction started
2018-12-19 21:16:41.782188 +0200 EET m=+14422.377975473 Transaction NO error
2018-12-19 21:16:42.790775 +0200 EET m=+14423.386562420 Exec - no error
Done 2018-12-19 21:16:42.790805 +0200 EET m=+14423.386592259

iliacimpoes avatar Dec 20 '18 08:12 iliacimpoes

Same behavior on my machine:

[mysql] 2019/05/06 09:46:20 packets.go:36: unexpected EOF

I'm using upper.io with go-sql-driver/mysql, Go v1.11, RHEL.

mcandre avatar May 06 '19 21:05 mcandre

Same behavior on my machine:

[mysql] 2019/05/06 09:46:20 packets.go:36: unexpected EOF

I'm using upper.io with go-sql-driver/mysql, Go v1.11, RHEL.

any solution? same error

hcardozo avatar Nov 12 '20 12:11 hcardozo

Sorry, that was a while ago. I did a fix in mysql driver for Ping. I probably should've linked it here: https://github.com/go-sql-driver/mysql/pull/875. But I believe it did not solve our issue.

So as a workaround we reduced ConnectionLifetime so that connections are not left in the pool for longer than mysql wait_timeout (8h by default) https://github.com/golang/go/blob/d7974c31d0eb0ef377a8681f6f7306d46854eb1c/src/database/sql/sql.go#L943-L948

Or you can set MaxIdleConnections to 0 so that there is no idle connections in the pool at all https://github.com/golang/go/blob/d7974c31d0eb0ef377a8681f6f7306d46854eb1c/src/database/sql/sql.go#L886-L896

iliacimpoes avatar Nov 12 '20 13:11 iliacimpoes