VTGate Has Blocking Topo Query
Overview of the Issue
The original design, AFAIUI, was that for reads the topo server is always used as a backing persistent store and a local cache is used whenever the data is available. See: https://vitess.io/docs/concepts/topology-service/#local-topology
"The local topology service must be available for Vitess to discover tablets and adjust routing as tablets come and go. However, no calls to the topology service are made in the critical path of serving a query at steady state. That means queries are still served during temporary unavailability of topology."
In certain cases, however, VTGate seems to block on a read from the topo server, and query serving is paused until a response is seen; only once we get that response and see that it's a non-topo-specific error (e.g. a network error) do we return the locally cached value. This can cause query pauses of up to 10 seconds when there's a Consul leader election.
We seem to end up blocking here: https://github.com/vitessio/vitess/blob/release-12.0/go/vt/srvtopo/query.go#L151-L177
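The shape of that code path, as far as I can tell, is roughly the following. This is a simplified sketch with made-up names (cachedEntry, get) and no locking or single-flight logic, not the actual Vitess implementation:

package srvtopo

import (
	"context"
	"time"
)

// cachedEntry is an illustrative stand-in for the per-key state kept by
// the resilient server; the real types live in go/vt/srvtopo.
type cachedEntry struct {
	value     interface{}
	lastQuery time.Time // time of the last successful topo read
}

// get returns the cached value while it is within the TTL. Once the TTL
// has expired it blocks on a fresh topo read, and only after that read
// fails with a non-topo error (e.g. a network error) does it fall back
// to the stale cached value.
func (e *cachedEntry) get(ctx context.Context, ttl time.Duration,
	query func(context.Context) (interface{}, error)) (interface{}, error) {

	if e.value != nil && time.Since(e.lastQuery) < ttl {
		return e.value, nil // cache is valid: serve it, no topo call
	}

	// Blocking topo read: with Consul this can hang for ~10s during a
	// leader election before it returns an error.
	v, err := query(ctx)
	if err == nil {
		e.value, e.lastQuery = v, time.Now()
		return v, nil
	}

	// Only now, after the blocking call has failed, do we fall back to
	// the stale cached value.
	if e.value != nil {
		return e.value, nil
	}
	return nil, err
}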
Is this perhaps intentional/expected and unavoidable when using the default vtgate topo cache flags:
-srv_topo_cache_refresh: how frequently to refresh the topology for cached entries (default 1s)
-srv_topo_cache_ttl: how long to use cached entries for topology (default 1s)
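Both appear to be ordinary Go duration flags, so they accept anything time.ParseDuration understands -- which is what makes the 168h workaround shown later possible. Roughly how such flags are declared (illustrative only, not copied from the Vitess source):

import (
	"flag"
	"time"
)

// Illustrative declarations; the real ones live under go/vt/srvtopo.
var (
	srvTopoCacheTTL     = flag.Duration("srv_topo_cache_ttl", 1*time.Second, "how long to use cached entries for topology")
	srvTopoCacheRefresh = flag.Duration("srv_topo_cache_refresh", 1*time.Second, "how frequently to refresh the topology for cached entries")
)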
ℹ️ NOTE: after discussion, this was intentional in the past but should no longer apply now that we have the tablet healthchecks and always use the local cache whenever possible. So the plan is to deprecate/remove the cache TTL.
Reproduction Steps
$ git clone git@github.com:eminugurkenar/vitess-consul-poc.git && cd ./vitess-consul-poc
$ cat ~/consul-test.sh
#!/bin/bash
set -euo pipefail
minikube delete
minikube start
minikube kubectl -- create ns cloud
minikube kubectl -- apply -f consul.yaml
sleep 10
minikube service consul-ui -n cloud
sleep 10
minikube kubectl -- apply -f vtctld.yaml
sleep 15
VTCTLD_POD=$(minikube kubectl -- -n cloud get pods | awk '$1 ~ "vtctld.*" {print $1; exit}')
sleep 15
minikube kubectl -- -n cloud exec -it ${VTCTLD_POD} -c vtctld -- /vt/bin/vtctlclient -server localhost:15999 AddCellInfo -server_address localhost:8500 -root vitess/us_east_1 us_east_1
sleep 10
minikube kubectl -- apply -f mysql.yaml
sleep 10
minikube kubectl -- apply -f vtgate.yaml
sleep 10
minikube kubectl -- -n cloud exec -it ${VTCTLD_POD} -c vtctld -- /vt/bin/vtctlclient -server localhost:15999 InitShardMaster -force peak-test/0 us_east_1-1126369102
VTGATE_POD=$(minikube kubectl -- -n cloud get pods | awk '$1 ~ "vtgate.*" {print $1; exit}')
minikube kubectl -- port-forward pod/${VTGATE_POD} 3306:3306 -n cloud > /dev/null
exit 0
$ ~/consul-test.sh &
Now run the insert test (mine slightly modified from the repo):
$ cat test.sh
#!/bin/bash
mysql -h127.0.0.1 -uroot -ppassw0rd \
-e 'CREATE TABLE IF NOT EXISTS `peak-test`.`users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`username` varchar(25) COLLATE utf8_unicode_ci NOT NULL,
`password` varchar(30) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
);'
count=0
while true; do
  echo "$(date) :: $count"
  time mysql -h127.0.0.1 -uroot -ppassw0rd \
    -e 'insert into users(username,password) values("user1","21321");'
  count=$((count+1))
  echo
  sleep 1
done
$ bash test.sh &> /tmp/out
And while that's running, we'll bring down Consul:
$ minikube kubectl -- -n cloud scale statefulset consul-server --replicas=0
And in the log you will see at least one query that takes ~10s to complete:
$ grep real /tmp/out | grep -v 0m0
real 0m9.874s
real 0m9.702s
real 0m10.000s
In this test case, what happens is that vtgate's request to the topo server lands on a Consul follower and waits there, because the follower forwards all requests to the leader and a new leader has yet to be elected. The vtgate API call only returns after Consul's default 10s session timeout, and only then do we return the cached value. During this ~10s period queries are not being served.
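As long as the blocking read remains, one way to cap the pause would be to give the topo read its own shorter deadline so the fallback to the cache kicks in after, say, 500ms instead of Consul's full session timeout. A hedged sketch, assuming the call path threads a context through to the topo client (boundedTopoRead is a hypothetical helper, not a Vitess API):

import (
	"context"
	"time"
)

// boundedTopoRead caps how long a single topo read may block. If the
// deadline fires, the caller sees a plain (non-topo) error, and the
// existing code path falls back to the cached value -- exactly as it
// does today, just after maxWait rather than after Consul's ~10s timeout.
func boundedTopoRead(ctx context.Context, maxWait time.Duration,
	read func(context.Context) (interface{}, error)) (interface{}, error) {

	ctx, cancel := context.WithTimeout(ctx, maxWait)
	defer cancel()
	return read(ctx)
}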
But if we could make that non-blocking, that would be even better.
Binary version
12.0.0
Log Fragments
From the vtgate:
E1104 17:51:37.943673 1 watch.go:207] ResilientWatch stream failed for us_east_1.peak-test: Unexpected response code: 500
E1104 17:51:37.944607 1 watch.go:207] ResilientWatch stream failed for us_east_1: Unexpected response code: 500
I1104 17:51:37.944627 1 vschema_manager.go:102] Received vschema update
E1104 17:51:37.944632 1 vschema_manager.go:111] SrvVschema watch error: ResilientWatch stream failed for us_east_1: Unexpected response code: 500
I1104 17:51:48.947257 1 vschema_manager.go:102] Received vschema update
E1104 17:51:48.947265 1 vschema_manager.go:111] SrvVschema watch error: Unexpected response code: 500
I1104 17:51:59.649251 1 vschema_manager.go:102] Received vschema update
...
E1104 17:52:21.674652 1 vschema_manager.go:111] SrvVschema watch error: Unexpected response code: 500
I1104 17:52:22.676054 1 vschema_manager.go:102] Received vschema update
E1104 17:52:22.676063 1 vschema_manager.go:111] SrvVschema watch error: Unexpected response code: 500
E1104 17:52:23.090086 1 legacy_topology_watcher.go:162] cannot get tablets for cell: us_east_1: Unexpected response code: 500
I1104 17:52:23.677633 1 vschema_manager.go:102] Received vschema update
...
Suggestion from @sougou is that we should always use the cache -- deprecating/removing cacheValid and cache_ttl -- and refresh the topo in a separate goroutine.
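A minimal sketch of that shape, reusing the hypothetical cachedEntry from the earlier sketch (plus the errors package): serve whatever is cached, unconditionally, and refresh it from a dedicated goroutine. Again, made-up names to illustrate the suggestion, not a proposed patch:

// refreshLoop keeps the cache warm from a background goroutine so the
// serving path never blocks on the topo server. Failed refreshes are
// simply skipped: we keep serving the last known value, and there is
// no TTL after which the entry expires.
func (e *cachedEntry) refreshLoop(ctx context.Context, interval time.Duration,
	query func(context.Context) (interface{}, error)) {

	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if v, err := query(ctx); err == nil {
				e.value, e.lastQuery = v, time.Now()
			}
		}
	}
}

// getCached is the serving path: it never issues a topo call.
func (e *cachedEntry) getCached() (interface{}, error) {
	if e.value == nil {
		return nil, errors.New("topo entry not loaded yet")
	}
	return e.value, nil
}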
I've confirmed that I no longer see the traffic pause(s) in the test when adding this to the vtgate config:
diff --git a/vtgate.yaml b/vtgate.yaml
index 636a634..0312aad 100644
--- a/vtgate.yaml
+++ b/vtgate.yaml
@@ -67,6 +67,8 @@ spec:
- MASTER
- -grpc_max_message_size
- "37754432"
+ - -srv_topo_cache_ttl
+ - 168h
image: vitess/lite:v12.0.0-rc1
imagePullPolicy: IfNotPresent
name: vtgate
I will note, however, that on subsequently scaling the consul-server StatefulSet from 0 to 3 pods and from 3 to 0 pods, I do see some cases of queries pausing...
I could not repeat the issue after many rounds of scaling the consul-server StatefulSet from 0 to 3 and back to 0 pods when using a vtgate image that was built without the cache TTL: https://hub.docker.com/repository/docker/mattalord/vitess_local
diff --git a/vtgate.yaml b/vtgate.yaml
index 636a634..fdc4b67 100644
--- a/vtgate.yaml
+++ b/vtgate.yaml
@@ -67,7 +67,7 @@ spec:
- MASTER
- -grpc_max_message_size
- "37754432"
- image: vitess/lite:v12.0.0-rc1
+ image: mattalord/vitess_local:latest
imagePullPolicy: IfNotPresent
name: vtgate
ports: