
Bug Report: Panic when accessing table from denied_tables

Open wiebeytec opened this issue 1 year ago • 2 comments

Overview of the Issue

When I perform operations on tables that are in the denied_tables (because of MoveTables) on the shard tablet control, I get a panic, and the local client reports "Lost connection to MySQL server during query".

This is very confusing to programmers. Even though they're not supposed to use that table, they don't see what they're doing wrong.

Expected result: a query error is returned stating that the table is marked as denied. I think this used to be the case before; I'm not sure when it changed.

Reproduction Steps

Put a table in the denied_tables:

./vtctldclient SetShardTabletControl --denied-tables "widgets" legacy/0 primary
./vtctldclient RefreshStateByShard legacy/0

Then when you select from it:

mysql> select * from legacy.widgets;
ERROR 2013 (HY000): Lost connection to MySQL server during query
No connection. Trying to reconnect...
Connection id:    198
Current database: legacy

ERROR 2013 (HY000): Lost connection to MySQL server during query
No connection. Trying to reconnect...
Connection id:    199
Current database: legacy

ERROR 2013 (HY000): Lost connection to MySQL server during query

Binary Version

vtgate version Version: 20.0.2 (Git revision 2592c5932b3036647868299b6df76f8ef28dfbc8 branch 'HEAD') built on Wed Sep 11 08:15:20 UTC 2024 by runner@fv-az1152-369 using go1.22.7 linux/amd64

But I've been seeing this behavior for a while, so probably other versions too.

Operating System and Environment details

# cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.5 LTS"


### Log Fragments

```sh
sep 27 11:00:26 vitess-unittest start_vtgate[3174554]: E0927 11:00:26.673458 3174554 server.go:373] mysql_server caught panic:
sep 27 11:00:26 vitess-unittest start_vtgate[3174554]: runtime error: invalid memory address or nil pointer dereference
sep 27 11:00:26 vitess-unittest start_vtgate[3174554]: runtime/panic.go:261 (0x4574d7)
sep 27 11:00:26 vitess-unittest start_vtgate[3174554]: runtime/signal_unix.go:881 (0x4574a5)
sep 27 11:00:26 vitess-unittest start_vtgate[3174554]: vitess.io/vitess/go/vt/vtgate/buffer/buffer.go:168 (0x1638152)
sep 27 11:00:26 vitess-unittest start_vtgate[3174554]: vitess.io/vitess/go/vt/vtgate/plan_execute.go:107 (0x163813b)
sep 27 11:00:26 vitess-unittest start_vtgate[3174554]: vitess.io/vitess/go/vt/vtgate/executor.go:432 (0x162cf8f)
sep 27 11:00:26 vitess-unittest start_vtgate[3174554]: vitess.io/vitess/go/vt/vtgate/executor.go:228 (0x162b4b1)
sep 27 11:00:26 vitess-unittest start_vtgate[3174554]: vitess.io/vitess/go/vt/vtgate/vtgate.go:462 (0x1669fb7)
sep 27 11:00:26 vitess-unittest start_vtgate[3174554]: vitess.io/vitess/go/vt/vtgate/plugin_mysql_server.go:259 (0x163b95e)
sep 27 11:00:26 vitess-unittest start_vtgate[3174554]: vitess.io/vitess/go/mysql/conn.go:1400 (0x109261d)
sep 27 11:00:26 vitess-unittest start_vtgate[3174554]: vitess.io/vitess/go/mysql/conn.go:1385 (0x109230a)
sep 27 11:00:26 vitess-unittest start_vtgate[3174554]: vitess.io/vitess/go/mysql/conn.go:951 (0x108e8e4)
sep 27 11:00:26 vitess-unittest start_vtgate[3174554]: vitess.io/vitess/go/mysql/server.go:552 (0x10ad2af)
sep 27 11:00:26 vitess-unittest start_vtgate[3174554]: vitess.io/vitess/go/mysql/server.go:356 (0x10abeeb)
sep 27 11:00:26 vitess-unittest start_vtgate[3174554]: runtime/asm_amd64.s:1695 (0x479a20)
```

wiebeytec avatar Sep 27 '24 09:09 wiebeytec

cc @vitessio/query-serving

mattlord avatar Sep 27 '24 16:09 mattlord

This could be happening because buffering is disabled by default and we are accessing the buffer in the code without checking for that.
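To illustrate the kind of failure described here, below is a minimal Go sketch of an optional component that is left nil when its feature is disabled. All names (`Buffer`, `Executor`, `retryUnguarded`, `retryGuarded`) are hypothetical and are not taken from the Vitess code base; the sketch only shows how a missing nil check produces the "invalid memory address or nil pointer dereference" panic seen in the log, and how a guard avoids it.

```go
package main

import "fmt"

// Buffer is a hypothetical stand-in for a component that only exists
// when buffering is enabled. It is NOT the Vitess buffer type.
type Buffer struct {
	enabled bool
}

// Active reads a field through the receiver, so calling it on a nil
// *Buffer panics with a nil pointer dereference.
func (b *Buffer) Active() bool {
	return b.enabled
}

// Executor is a hypothetical caller whose buf stays nil when buffering
// is disabled.
type Executor struct {
	buf *Buffer
}

// retryUnguarded uses the buffer without checking for nil.
func (e *Executor) retryUnguarded() bool {
	return e.buf.Active()
}

// retryGuarded checks for nil first and reports "no buffering" instead
// of panicking.
func (e *Executor) retryGuarded() bool {
	return e.buf != nil && e.buf.Active()
}

// panics runs f and reports whether it panicked.
func panics(f func()) (p bool) {
	defer func() {
		if r := recover(); r != nil {
			p = true
		}
	}()
	f()
	return false
}

func main() {
	e := &Executor{} // buffering disabled: buf is nil
	fmt.Println("unguarded panics:", panics(func() { e.retryUnguarded() }))
	fmt.Println("guarded result:", e.retryGuarded())
}
```

Whether the actual fix is a nil check like this or something else is for the linked code to confirm; the sketch only demonstrates the failure mode.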

harshit-gangal avatar Oct 09 '24 06:10 harshit-gangal

@wiebeytec I think this was fixed via https://github.com/vitessio/vitess/pull/16922. Can this issue be closed?

arthurschreiber avatar Oct 22 '24 17:10 arthurschreiber

I tested with 21.0.0-rc2 (Git revision 54fa8d887fb0c154dae99b1668e4748a8f40fe42) and the current behavior seems incorrect.

There are these cases:

Case 1

  • No default DB set
  • Selecting a table that is in the deny list because it's been moved to another keyspace

vexplain queries select * from legacy.sites limit 1;
-- 30 seconds pass
-- vexplain output, saying the query went to another keyspace, a sharded one.

This takes exactly 30 seconds to return, and the output shows the query being served from sites2024 rather than from legacy. Presumably the retry doesn't use the db prefix and ends up in 'global routing' mode. I suspect this because case 2 below behaves differently.

Case 2

The same as case 1, but with a default DB set:

use legacy;
vexplain queries select * from legacy.sites limit 1;
-- 60 seconds pass (indeed, 60 seconds)
ERROR 1105 (HY000): query vexplain queries select * from legacy.sites limit 1 failed after retries: <nil>

Case 3

  • No default DB set
  • Selecting from an unsharded table that I put in the deny list

select * from legacy.widgets limit 1;
-- 60 seconds pass
ERROR 1105 (HY000): query select * from legacy.widgets limit 1 failed after retries: <nil>
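As an aside on the `<nil>` at the end of that message: in Go, formatting a nil error with `%v` prints `<nil>`. The sketch below shows one way such a message could arise, namely a retry loop that gives up without ever having recorded an error. The function `runWithRetries` and its parameters are hypothetical; I have not confirmed that vtgate's retry code looks like this.

```go
package main

import "fmt"

// runWithRetries is a hypothetical retry loop. Each attempt reports
// whether it wants to be retried and, optionally, an error.
func runWithRetries(query string, maxAttempts int, try func() (retry bool, err error)) error {
	var lastErr error
	for i := 0; i < maxAttempts; i++ {
		retry, err := try()
		lastErr = err
		if !retry {
			return err
		}
	}
	// If every attempt asked for a retry without setting an error,
	// lastErr is still nil here and %v renders it as "<nil>".
	return fmt.Errorf("query %s failed after retries: %v", query, lastErr)
}

func main() {
	// Assumption for illustration: the attempt always requests a retry
	// but never reports why.
	alwaysRetry := func() (bool, error) { return true, nil }
	err := runWithRetries("select * from legacy.widgets limit 1", 3, alwaysRetry)
	fmt.Println(err)
}
```

If something like this is what happens, the underlying reason (the table being denied) is lost before the error reaches the client.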

Expected behavior

Instant result saying that access to the table is denied because it's in the deny list on this shard.
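Roughly what I have in mind, as a minimal Go sketch: check the deny list up front and fail immediately with a descriptive error instead of retrying. The `DeniedTables` type, the `Check` method and the error wording are all made up for illustration and do not correspond to existing Vitess APIs or messages.

```go
package main

import "fmt"

// DeniedTables is a hypothetical per-shard set of table names that must
// not be served from that shard.
type DeniedTables map[string]struct{}

// Check returns an error right away if the table is denied on the given
// shard, and nil otherwise.
func (d DeniedTables) Check(keyspace, shard, table string) error {
	if _, denied := d[table]; denied {
		return fmt.Errorf("table %s.%s is in denied_tables on shard %s/%s",
			keyspace, table, keyspace, shard)
	}
	return nil
}

func main() {
	denied := DeniedTables{"widgets": {}}
	if err := denied.Check("legacy", "0", "widgets"); err != nil {
		// The client would see this instead of waiting 30 or 60 seconds.
		fmt.Println("ERROR:", err)
	}
}
```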

wiebeytec avatar Oct 23 '24 08:10 wiebeytec