yugabyte-db icon indicating copy to clipboard operation
yugabyte-db copied to clipboard

[YSQL] CREATE DATABASE command timeouts in geo distributed/geo partitioned cluster

Open shamanthchandra-yb opened this issue 2 years ago • 4 comments

Jira Link: DB-3072

Description

Steps:

Create a geo distributed database for geo partition purpose with RF=3 in 3 nodes distributed over AP, EU, US.

Internally, tried on Platform Version: 2.15.1.0-b160

yugabyte=> CREATE DATABASE yb_test1; ERROR: Timed out: Timed out waiting for Create Database

CC: @m-iancu

Here are the logs:

postgres log I0729 04:34:01.758509 6795 pg_client.cc:125] Using TServer host_port: 172.165.20.193:9100 I0729 04:34:01.759888 6795 pg_client.cc:132] S 86: Session id acquired I0729 04:34:01.766023 6799 poller.cc:66] Poll stopped: Service unavailable (yb/rpc/scheduler.cc:80): Scheduler is shutting down (system error 108) 2022-07-29 04:34:01.769 UTC [20207] LOG: unable to find pgstat entry for abnormally terminated PID 6795 I0729 04:34:01.786801 6811 thread_pool.cc:170] Starting thread pool { name: pggate_ybclient queue_limit: 10000 max_workers: 1024 } I0729 04:34:01.787276 6811 pggate.cc:138] InitTServerSharedObject: 0, 0, 4 I0729 04:34:01.787454 6811 pg_client.cc:125] Using TServer host_port: 172.165.20.193:9100 I0729 04:34:01.788761 6811 pg_client.cc:132] S 87: Session id acquired I0729 04:34:02.375572 6814 poller.cc:66] Poll stopped: Service unavailable (yb/rpc/scheduler.cc:80): Scheduler is shutting down (system error 108) 2022-07-29 04:34:02.378 UTC [20207] LOG: unable to find pgstat entry for abnormally terminated PID 6811

tserver warning W0729 04:34:57.835811 20385 client_master_rpc.cc:112] IsCreateNamespaceDone: Leader Master (172.165.20.193:7100) timed out, 0.000s left, re-trying... W0729 04:34:57.836045 20237 client_master_rpc.cc:67] Failed to determine new Master: Timed out (yb/rpc/outbound_call.cc:544): Call timed out before sending W0729 04:34:57.837214 20389 yb_rpc.cc:392] Call yb.tserver.PgClientService.CreateDatabase 172.165.20.193:37276 => 172.165.20.193:9100 (request call id 502) took 60002ms (client timeout 60000ms).

tserver.INFO I0729 04:36:05.785667 20191 client_master_rpc.cc:77] 0x0000000007cfb6a0 -> IsCreateNamespaceDone: Failed: Timed out (yb/client/client_master_rpc.cc:42): Request IsCreateNamespaceDone timed out after deadline expired I0729 04:36:05.785810 7637 client-internal.cc:2296] Reinitialize master addresses from file: /home/yugabyte/tserver/conf/server.conf W0729 04:36:05.785822 20191 yb_rpc.cc:392] Call yb.tserver.PgClientService.CreateDatabase 172.165.20.193:37276 => 172.165.20.193:9100 (request call id 516) took 60001ms (client timeout 60000ms). I0729 04:36:05.786243 7637 client-internal.cc:2325] New master addresses: [172.151.59.191:7100,172.158.42.147:7100,172.165.20.193:7100] I0729 04:36:51.553977 20183 reactor.cc:466] TabletServer_R013: DEBUG: Closing idle connection: Connection (0x00000000073fd578) server 172.165.20.193:43067 => 172.165.20.193:9100 - it has been idle for 65.0992s

shamanthchandra-yb avatar Jul 28 '22 12:07 shamanthchandra-yb

Noticed that the CREATE DATABASE command in such a cluster takes about 78 seconds. The flag batch_ysql_system_tables_metadata= true also did not get the operation to fall under the expected deadline of 60sec. In this cluster, the ping latencies are as follows: US West<->AP : 215ms US West<->EU : 142ms EU<->AP: 114ms

So each RTT between the masters is >100ms

Notes from @sanketkedia each pg system catalog table is doing 2 replicated writes and each one is taking 500 ms for creation Thus for 128 tables it takes 500 * 128 = 64 seconds. Copying contents of these tables from template1 = 14 batches of 1s latency = 14s In total: 78s which is higher than 60s

One optimization suggested by @sanketkedia is to batch all the Upserts performed for creating all these 128 system tables.

deeps1991 avatar Jul 29 '22 02:07 deeps1991

@shamanthchandra-yb @sanketkedia Is there a significant difference in timing if we run the CREATE DATABASE command on the node/region where the master leader is vs on a node/region where the leader is far away?

i.e. To estimate what the time ratio is between proxy-node and yb-master vs between yb-master-leader and yb_master-followers.

m-iancu avatar Aug 03 '22 18:08 m-iancu

@m-iancu Below is the observation when executed CREATE DATABASE command over node which is master leader and other which is not a master leader. From LEADER: 107652.373 ms (01:47.652) (AP south) From non-Leader: 113098.722 ms (01:53.099) (EU)

shamanthchandra-yb avatar Aug 11 '22 09:08 shamanthchandra-yb

Thanks @shamanthchandra-yb -- since those times are quite similar, It seems most of the time are due to yb-master-leader and yb_master-followers latency. The difference between calling it on LEADER and non-LEADER node (i.e. proxy being local vs remote) is comparatively small so the proxy-node and yb-master is probably not that significant at the moment.

m-iancu avatar Aug 11 '22 16:08 m-iancu

@m-iancu I observed this issue even only with AP(Asia Pacific) region with Singapore and Sydney zones.

Arjun-yb avatar Sep 07 '22 14:09 Arjun-yb

The batching the upserts should improve the create database duration even for the non cross-region case as well. @Simonliuwaterloo work with @sanketkedia on this issue.

lingamsandeep avatar Sep 09 '22 22:09 lingamsandeep

From @sanketkedia: There are two parts to this problem -

  • There was a change in the default timeout for create database operation which has led to regression recently. Tracked by this: https://github.com/yugabyte/yugabyte-db/issues/14155
  • Improving the create database time by batching - tracked by this issue.

Simonliuwaterloo avatar Sep 30 '22 17:09 Simonliuwaterloo