VTGate Has Blocking Topo Query
Overview of the Issue
The original design, AFAIUI, was that for reads the topo server is always used as a backing persistent store and a local cache is used whenever the data is available. See: https://vitess.io/docs/concepts/topology-service/#local-topology
"The local topology service must be available for Vitess to discover tablets and adjust routing as tablets come and go. However, no calls to the topology service are made in the critical path of serving a query at steady state. That means queries are still served during temporary unavailability of topology."
In certain cases, however, VTGate seems to block on a read from the topo server, and query serving is paused until a response is seen; only once we get that response and see that it's a non-topo-specific error (e.g. a network error) do we return the locally cached value. This can cause query pauses of up to 10 seconds when there's a Consul leader election.
We seem to end up blocking here: https://github.com/vitessio/vitess/blob/release-12.0/go/vt/srvtopo/query.go#L151-L177
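The shape of that code path, as far as I can tell, is roughly the following. This is a simplified sketch with made-up names (cachedEntry, get) and no locking or single-flight logic, not the actual Vitess implementation:

package srvtopo

import (
	"context"
	"time"
)

// cachedEntry is an illustrative stand-in for the per-key state kept by
// the resilient server; the real types live in go/vt/srvtopo.
type cachedEntry struct {
	value     interface{}
	lastQuery time.Time // time of the last successful topo read
}

// get returns the cached value while it is within the TTL. Once the TTL
// has expired it blocks on a fresh topo read, and only after that read
// fails with a non-topo error (e.g. a network error) does it fall back
// to the stale cached value.
func (e *cachedEntry) get(ctx context.Context, ttl time.Duration,
	query func(context.Context) (interface{}, error)) (interface{}, error) {

	if e.value != nil && time.Since(e.lastQuery) < ttl {
		return e.value, nil // cache is valid: serve it, no topo call
	}

	// Blocking topo read: with Consul this can hang for ~10s during a
	// leader election before it returns an error.
	v, err := query(ctx)
	if err == nil {
		e.value, e.lastQuery = v, time.Now()
		return v, nil
	}

	// Only now, after the blocking call has failed, do we fall back to
	// the stale cached value.
	if e.value != nil {
		return e.value, nil
	}
	return nil, err
}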
Is this perhaps intentional/expected and unavoidable when using the default vtgate topo cache flags:
-srv_topo_cache_refresh: how frequently to refresh the topology for cached entries (default 1s)
-srv_topo_cache_ttl: how long to use cached entries for topology (default 1s)
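Both appear to be ordinary Go duration flags, so they accept anything time.ParseDuration understands -- which is what makes the 168h workaround shown later possible. Roughly how such flags are declared (illustrative only, not copied from the Vitess source):

import (
	"flag"
	"time"
)

// Illustrative declarations; the real ones live under go/vt/srvtopo.
var (
	srvTopoCacheTTL     = flag.Duration("srv_topo_cache_ttl", 1*time.Second, "how long to use cached entries for topology")
	srvTopoCacheRefresh = flag.Duration("srv_topo_cache_refresh", 1*time.Second, "how frequently to refresh the topology for cached entries")
)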
ℹ️ NOTE: after discussion, this was intentional in the past but should no longer apply now that we have the tablet healthchecks and always use the local cache whenever possible. So the plan is to deprecate/remove the cache TTL.
Reproduction Steps
$ git clone git@github.com:eminugurkenar/vitess-consul-poc.git && cd ./vitess-consul-poc
$ cat ~/consul-test.sh
#!/bin/bash
set -euo pipefail
minikube delete
minikube start
minikube kubectl -- create ns cloud
minikube kubectl -- apply -f consul.yaml
sleep 10
minikube service consul-ui -n cloud
sleep 10
minikube kubectl -- apply -f vtctld.yaml
sleep 15
VTCTLD_POD=$(minikube kubectl -- -n cloud get pods | awk '$1 ~ "vtctld.*" {print $1; exit}')
sleep 15
minikube kubectl -- -n cloud exec -it ${VTCTLD_POD} -c vtctld -- /vt/bin/vtctlclient -server localhost:15999 AddCellInfo -server_address localhost:8500 -root vitess/us_east_1 us_east_1
sleep 10
minikube kubectl -- apply -f mysql.yaml
sleep 10
minikube kubectl -- apply -f vtgate.yaml
sleep 10
minikube kubectl -- -n cloud exec -it ${VTCTLD_POD} -c vtctld -- /vt/bin/vtctlclient -server localhost:15999 InitShardMaster -force peak-test/0 us_east_1-1126369102
VTGATE_POD=$(minikube kubectl -- -n cloud get pods | awk '$1 ~ "vtgate.*" {print $1; exit}')
minikube kubectl -- port-forward pod/${VTGATE_POD} 3306:3306 -n cloud > /dev/null
exit 0
$ ~/consul-test.sh &
Now run the insert test (mine slightly modified from the repo):
$ cat test.sh
#!/bin/bash
mysql -h127.0.0.1 -uroot -ppassw0rd \
-e 'CREATE TABLE IF NOT EXISTS `peak-test`.`users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`username` varchar(25) COLLATE utf8_unicode_ci NOT NULL,
`password` varchar(30) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
);'
count=0
while true; do
  echo "$(date) :: $count"
  time mysql -h127.0.0.1 -uroot -ppassw0rd \
    -e 'insert into users(username,password) values("user1","21321");'
  count=$((count+1))
  echo
  sleep 1
done
$ bash test.sh &> /tmp/out
And while that's running, we'll bring down Consul:
$ minikube kubectl -- -n cloud scale statefulset consul-server --replicas=0
And in the log you will see at least one query that takes ~10s to complete:
$ grep real /tmp/out | grep -v 0m0
real 0m9.874s
real 0m9.702s
real 0m10.000s
In this test case, what happens is that vtgate's request to the topo server lands on a Consul follower and waits there, because the follower forwards all requests to the leader and a new leader has yet to be elected. The vtgate API call only returns after Consul's default 10s session timeout, and only then do we return the cached value. During this ~10s period queries are not being served.
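As long as the blocking read remains, one way to cap the pause would be to give the topo read its own shorter deadline so the fallback to the cache kicks in after, say, 500ms instead of Consul's full session timeout. A hedged sketch, assuming the call path threads a context through to the topo client (boundedTopoRead is a hypothetical helper, not a Vitess API):

import (
	"context"
	"time"
)

// boundedTopoRead caps how long a single topo read may block. If the
// deadline fires, the caller sees a plain (non-topo) error, and the
// existing code path falls back to the cached value -- exactly as it
// does today, just after maxWait rather than after Consul's ~10s timeout.
func boundedTopoRead(ctx context.Context, maxWait time.Duration,
	read func(context.Context) (interface{}, error)) (interface{}, error) {

	ctx, cancel := context.WithTimeout(ctx, maxWait)
	defer cancel()
	return read(ctx)
}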
But if we could make that non-blocking, that would be even better.
Binary version
12.0.0
Log Fragments
From the vtgate:
E1104 17:51:37.943673 1 watch.go:207] ResilientWatch stream failed for us_east_1.peak-test: Unexpected response code: 500
E1104 17:51:37.944607 1 watch.go:207] ResilientWatch stream failed for us_east_1: Unexpected response code: 500
I1104 17:51:37.944627 1 vschema_manager.go:102] Received vschema update
E1104 17:51:37.944632 1 vschema_manager.go:111] SrvVschema watch error: ResilientWatch stream failed for us_east_1: Unexpected response code: 500
I1104 17:51:48.947257 1 vschema_manager.go:102] Received vschema update
E1104 17:51:48.947265 1 vschema_manager.go:111] SrvVschema watch error: Unexpected response code: 500
I1104 17:51:59.649251 1 vschema_manager.go:102] Received vschema update
...
E1104 17:52:21.674652 1 vschema_manager.go:111] SrvVschema watch error: Unexpected response code: 500
I1104 17:52:22.676054 1 vschema_manager.go:102] Received vschema update
E1104 17:52:22.676063 1 vschema_manager.go:111] SrvVschema watch error: Unexpected response code: 500
E1104 17:52:23.090086 1 legacy_topology_watcher.go:162] cannot get tablets for cell: us_east_1: Unexpected response code: 500
I1104 17:52:23.677633 1 vschema_manager.go:102] Received vschema update
...
Suggestion from @sougou is that we should always use the cache -- deprecating/removing cacheValid and cache_ttl -- and refresh the topo in a separate goroutine.
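A minimal sketch of that shape, reusing the hypothetical cachedEntry from the earlier sketch (plus the errors package): serve whatever is cached, unconditionally, and refresh it from a dedicated goroutine. Again, made-up names to illustrate the suggestion, not a proposed patch:

// refreshLoop keeps the cache warm from a background goroutine so the
// serving path never blocks on the topo server. Failed refreshes are
// simply skipped: we keep serving the last known value, and there is
// no TTL after which the entry expires.
func (e *cachedEntry) refreshLoop(ctx context.Context, interval time.Duration,
	query func(context.Context) (interface{}, error)) {

	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if v, err := query(ctx); err == nil {
				e.value, e.lastQuery = v, time.Now()
			}
		}
	}
}

// getCached is the serving path: it never issues a topo call.
func (e *cachedEntry) getCached() (interface{}, error) {
	if e.value == nil {
		return nil, errors.New("topo entry not loaded yet")
	}
	return e.value, nil
}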
I've confirmed that I no longer see the traffic pause(s) in the test when adding this to the vtgate config:
diff --git a/vtgate.yaml b/vtgate.yaml
index 636a634..0312aad 100644
--- a/vtgate.yaml
+++ b/vtgate.yaml
@@ -67,6 +67,8 @@ spec:
- MASTER
- -grpc_max_message_size
- "37754432"
+ - -srv_topo_cache_ttl
+ - 168h
image: vitess/lite:v12.0.0-rc1
imagePullPolicy: IfNotPresent
name: vtgate
I will note, however, that on subsequently scaling the consul-server StatefulSet from 0 to 3 pods and from 3 to 0 pods, I do see some cases of queries pausing...
I could not repeat the issue after many rounds of scaling the consul-server StatefulSet from 0 to 3 and back to 0 pods when using a vtgate image that was built without the cache TTL: https://hub.docker.com/repository/docker/mattalord/vitess_local
diff --git a/vtgate.yaml b/vtgate.yaml
index 636a634..fdc4b67 100644
--- a/vtgate.yaml
+++ b/vtgate.yaml
@@ -67,7 +67,7 @@ spec:
- MASTER
- -grpc_max_message_size
- "37754432"
- image: vitess/lite:v12.0.0-rc1
+ image: mattalord/vitess_local:latest
imagePullPolicy: IfNotPresent
name: vtgate
ports: