mysql icon indicating copy to clipboard operation
mysql copied to clipboard

"commands out of sync" errors on MySQL 8.0.24

Open owbone opened this issue 2 years ago • 8 comments

Issue description

This is similar to https://github.com/go-sql-driver/mysql/issues/1392 but much harder to reproduce and causes real failures.

https://github.com/go-sql-driver/mysql/issues/1392 describes the situation where the "new" ER_CLIENT_INTERACTION_TIMEOUT error packet is received on an idle connection. However, if you're really unlucky then it's also possible for this packet to be received just as a query is being sent. This results in an ErrPktSync ("commands out of sync. You can't run this command now") error, as the sequence ID of the error packet doesn't correspond to the packet that was simultaneously written. We have seen this in at least two very busy services, and only when there are many connections hitting the wait_timeout.

What's more, this doesn't mark the connection as bad and leaves it in a "busy buffer" state, which likely causes at least one further query to fail.

I don't have any great suggestions for fixing this. Perhaps the driver could recognise that it's an error packet with a zero sequence ID, and return ErrBadConn in that case. At the very least, it should be able to expose the error rather than giving this "commands out of sync" error.

For any readers, we fixed this by just decreasing the application level connection lifetime to be much less than wait_timeout.

Example code

Use https://github.com/lukaszlach/docker-tc to apply a network delay to the MySQL container:

docker run -d \
    --name docker-tc \
    --network host \
    --cap-add NET_ADMIN \
    --restart always \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /var/docker-tc:/var/docker-tc \
    lukaszlach/docker-tc

Start the MySQL container in a network, with a 1 second network delay applied:

docker network create test-net
docker run \
    -e MYSQL_ROOT_PASSWORD=password \
    --label "com.docker-tc.enabled=1" \
    --label "com.docker-tc.delay=1000ms" \
    -p 3306:3306 \
    --net test-net \
    mysql:8.0.24

Then this will reproduce it fairly consistently:

package main

import (
	"database/sql"
	"fmt"
	"time"

	"github.com/go-sql-driver/mysql"
)

func main() {
	mysql.SetLogger(logger{})

	cfg := mysql.NewConfig()
	cfg.User = "root"
	cfg.Passwd = "password"
	cfg.Net = "tcp"
	cfg.Addr = "127.0.0.1"
	cfg.DBName = "mysql"

	db, err := sql.Open("mysql", cfg.FormatDSN())
	if err != nil {
		panic(err)
	}
	defer db.Close()

	db.SetMaxOpenConns(1)

	if _, err := db.Exec("SET GLOBAL wait_timeout = 2"); err != nil {
		panic(err)
	}

	time.Sleep(1500 * time.Millisecond)

	rows, err := db.Query("SHOW TABLES")
	if err != nil {
		fmt.Println("query failed:", err)
	} else {
		rows.Close()
	}
}

type logger struct{}

func (logger) Print(v ...interface{}) {
	fmt.Println(append([]interface{}{"mysql: "}, v...)...)
}

Error log

mysql:  busy buffer
panic: commands out of sync. You can't run this command now

Configuration

Driver version (or git SHA): 1.7.0

Go version: 1.19.5

Server version: MySQL 8.0.24

Server OS: E.g. Debian 8.1 (Jessie), Windows 10

owbone avatar Feb 10 '23 16:02 owbone

We have seen this in at least two very busy services, and only when there are many connections hitting the wait_timeout.

See this topic.

This is an issue that should be fixed. But it doesn't happen on healthy configuration.

methane avatar Apr 11 '23 08:04 methane

This can be caused by any error packet from the server, not just ER_CLIENT_INTERACTION_TIMEOUT. We've also seen it happen with ER_DISCONNECTING_REMAINING_CLIENTS, albeit very rarely and only when the server is in trouble. The latter (any possibly other errors) can't be fixed with any client-side configuration, and can still trigger https://github.com/go-sql-driver/mysql/issues/1394.

Setting SetConnMaxIdleTime() to lower than the wait_timeout does mitigate the problem though by removing the ER_CLIENT_INTERACTION_TIMEOUT errors. This issue is mainly about unobscuring the problem for anyone trying to debug the same thing in the future.

owbone avatar Apr 11 '23 09:04 owbone

It is server side bug. You should update server.

methane avatar Apr 11 '23 09:04 methane

Which part is the server side bug?

owbone avatar Apr 11 '23 09:04 owbone

Server should use sequence number 0.

methane avatar Apr 11 '23 09:04 methane

The server does use sequence number 0. The problem is when the client receives this after having incremented the sequence number.

owbone avatar Apr 11 '23 10:04 owbone

OK. It would be fixed in future. But it has low priority.

methane avatar Apr 11 '23 12:04 methane

At the very least, it should be able to expose the error rather than giving this "commands out of sync" error.

For the record, official MySQL client does not handle ER_CLIENT_INTERACTION_TIMEOUT case. It just return an error if EXTRA_DEBUG is defined. If EXTRA_DEBUG is not defined, client doesn't check the packet sequence number at all.

https://github.com/mysql/mysql-server/blob/1bfe02bdad6604d54913c62614bde57a055c8332/sql-common/net_serv.cc#L1506-L1516

methane avatar Apr 11 '23 12:04 methane