yugabyte-db icon indicating copy to clipboard operation
yugabyte-db copied to clipboard

YSQL DDL Timed outs increase after enabling 2DC

Open verzlo opened this issue 5 years ago • 5 comments

Jira Link: DB-1752 Hello, "coincidentally" after enabling authentication we start seeing DDL commands timing out: ERROR: Timed out: CreateNamespace timed out after deadline expired. Time elapsed: 60.000s, allowed: 60.000s: CreateNamespace RPC to 172.26.65.197:7100 timed out after 60.000s

IPs do change once in a while, in master log: while in the master logs we see: W1021 00:45:32.314263 19556 scoped_leader_shared_lock.cc:113] Long lock of catalog manager: 0.102s @ 0x7fc8172139b5 yb::master::ScopedLeaderSharedLock::Unlock() @ 0x7fc81720ce66 yb::master::CatalogManagerBgTasks::Run() @ 0x7fc80e321a19 yb::Thread::SuperviseThread() @ 0x7fc809d25694 start_thread @ 0x7fc80946241d __clone @ (nil) (unknown) and on the IP host which reports which taking more then 60 sec: W1021 00:45:55.857110 11237 yb_rpc.cc:359] Call yb.master.MasterService.CreateNamespace 172.26.65.195:59600 => 172.26.65.197:7100 (request call id 85) took 75524ms (client timeout 60000ms). W1021 00:45:55.876921 11237 scoped_leader_shared_lock.cc:113] Long lock of catalog manager: 75.524s @ 0x7f5d0cadc9b5 yb::master::ScopedLeaderSharedLock::Unlock() @ 0x7f5d0cb0d785 yb::master::MasterServiceImpl::CreateNamespace() @ 0x7f5d0742b079 yb::master::MasterServiceIf::Handle() @ 0x7f5d055d70d1 yb::rpc::ServicePoolImpl::Handle() @ 0x7f5d05583764 yb::rpc::InboundCall::InboundCallTask::Run() @ 0x7f5d055e2c58 yb::rpc::(anonymous namespace)::Worker::Execute() @ 0x7f5d03beaa19 yb::Thread::SuperviseThread() @ 0x7f5cff5ee694 start_thread @ 0x7f5cfed2b41d __clone @ (nil) (unknown) I was able to overcome this for CQLSH by adding --request-timeout=3600 parm to CLI and it at least completing Cassandra DDL, but for YSQLSH I don;t know if there is similar functionality. I checked INFO and tserver logs and I do not see anything else suspicious there and otherwise cluster look OK. I also tried to set PG timeouts - but to no effect: yugabyte=# SET statement_timeout = 0; SET yugabyte=# SET lock_timeout = 0; SET yugabyte=# CREATE DATABASE "CMS"; ERROR: Timed out: CreateNamespace timed out after deadline expired. Time elapsed: 60.000s, allowed: 60.000s: CreateNamespace RPC to 172.26.65.197:7100 timed out after 60.000s

Pls advise..

verzlo avatar Oct 21 '19 13:10 verzlo

Is there a way to set higher client timeout for YSQL to at least allow this to complete while this work is in progress (in my case based on master log it took over 70 sec for create statement to complete while timeout is 60 sec)? Also for YCQL - it was actually on creating tables - here is the error: [root@klxcasst4 ~]# $YUGABYTE_HOME/bin/cqlsh $(hostname) -u cassandra -p cassandra Connected to local cluster at klxcasst4.kiev.cqg:9042. [cqlsh 5.0.1 | Cassandra 3.9-SNAPSHOT | CQL spec 3.4.2 | Native protocol v4] Use HELP for help. cassandra@cqlsh> CREATE ROLE cmsadmin WITH PASSWORD = 'cmsadmin' AND LOGIN = true ; cassandra@cqlsh> CREATE KEYSPACE cms WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': '3'} AND DURABLE_WRITES = true; cassandra@cqlsh> grant all on keyspace cms to cmsadmin; cassandra@cqlsh> cassandra@cqlsh> use cms; cassandra@cqlsh:cms> CREATE TABLE Audit( ... InstanceID text, -- CMS instance id ... EventTime timestamp, -- Operation time ) with transactions = { 'enabled' : true }; ... SeqNum int, -- Sequence number of the operation within last millisenconds ... User_name text , -- User initiated the operation ... Operation text, -- Operation name ... Message text, -- Operation message ... PRIMARY KEY(InstanceID, User_name) ... ) with transactions = { 'enabled' : true }; OperationTimedOut: errors={'172.26.65.195': 'Client request timeout. See Session.execute_async'}, last_host=172.26.65.195

ONe other comment - I have 2 pretty identical 3 node clusters, on one I did not had any issues creating DB/tables but second one keep timing out all the time and as you can see from master log - not sure if there is anything else to check.

verzlo avatar Oct 21 '19 19:10 verzlo

Update - it appears that the difference in significant increase of DDL locks was related to some CDC -2DC replication issue which I troublesdhooted/worked around here - https://github.com/yugabyte/yugabyte-db/issues/2675

as after I removed/reset replicaiton - DDL on both nodes start working much faster and I was able to create Postgres DB and create full schema on second Universe without any problems.

as a take away from this I would recommend to a) look into your code to see if that DDL locking login has any dependencies on CDC polling b) port/add increase/disable timeout for YSQLSH CLI ( statement_timeout/lock_timeout ) as an options for operational flexibility.

verzlo avatar Oct 24 '19 18:10 verzlo

This is good information @verzlo , thanks! We'll look into CDC impacting DDL timing.

ndeodhar avatar Oct 24 '19 18:10 ndeodhar

I have same problem too I exactly followed quickstart at docs on Ubuntu 19.04 and got this error:

yugabyte=# CREATE DATABASE yb_demo;
CREATE DATABASE yb_demo;
ERROR:  Timed out: CreateNamespace timed out after deadline expired. Time elapsed: 60.000s, allowed: 60.000s: CreateNamespace RPC to 127.0.0.2:7100 timed out after 60.000s

i also tried to run 3 node cluster on docker and again faced with same problem

zeinali0 avatar Oct 24 '19 19:10 zeinali0

@zeinali0 I created a 3 replica factor using the cli tool provided for Docker and getting similar errors when creating databases... I was playing with TablePlus and got a strange error, but didn't copy it but now it says the table exists already. While in PGAdmin running CREATE DATABASE foo.

All on the same machine too, so not sure if it's related to 2DC since that seems more aimed at cross datacenter use.

ERROR:  Timed out: CreateNamespace timed out after deadline expired. Time elapsed: 60.121s, allowed: 60.000s: CreateNamespace RPC (request call id 275) to 172.20.0.2:7100 timed out after 60.000s
SQL state: XX000

keverw avatar Jan 27 '20 08:01 keverw