yugabyte-db icon indicating copy to clipboard operation
yugabyte-db copied to clipboard

[flaky test] org.yb.cql.TestPartialIndex

Open m-iancu opened this issue 2 years ago • 4 comments

Jira Link: DB-3390

Description

Looks like a recent regression increased the failure rate significantly. The commit range seems to be: https://github.com/yugabyte/yugabyte-db/compare/5c3f73189ab142269d876ff3e78bbf7d5ad5bbd3~...4a38579b23b473dbf7b8384317382505f14e6a46

Only YCQL commit there is: https://github.com/yugabyte/yugabyte-db/commit/5c3f73189ab142269d876ff3e78bbf7d5ad5bbd3

This includes similar-looking failures in multiple subtests:

org.yb.cql.TestPartialIndex.testUniquePartialIndexPart2_4	
org.yb.cql.TestPartialIndex.testUniquePartialIndexPart2_3
org.yb.cql.TestPartialIndex.testPartialIndexPart2_4
...

m-iancu avatar Sep 06 '22 17:09 m-iancu

The testUniquePartialIndexPart* tests fails about 5-10% of the time on my local machine. It fails 80%+ of the time with the same signature when I test with my fix for #13308. The increase in failure rate is related to the initial leader selected for a tablet group at table creation time. Currently, they are selected at random, but my patch picks them in a way that balances them among the tservers. I don't have an RCA right now.

These tests fail because of a deadlock in the CatalogManager::DeleteTable() path that is invoked at the end of this test. Here is the full trace of the hung write lock:

m1|pid12851|:19216|http://127.0.0.222:17644 F0919 10:12:53.539841 1888350208 rwc_lock.cc:153] Too long write lock wait, last writer TID: 44524912, last writer stack:     @        0x1181f6aab  yb::RWCLock::WriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039444cb  yb::CowObject<>::StartMutation()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x10394449f  yb::CowWriteLock<>::CowWriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103944463  yb::CowWriteLock<>::CowWriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x10391585f  yb::master::MetadataCowWrapper<>::LockForWrite()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039c598b  yb::master::CatalogManager::DeleteTableInMemory()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039c2c33  yb::master::CatalogManager::DeleteTableInternal()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039c2937  yb::master::CatalogManager::DeleteTable()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103ca1273  _ZZN2yb6master17MasterServiceBase8HandleInINS0_14CatalogManagerENS0_20DeleteTableRequestPBENS0_21DeleteTableResponsePBEEEvPKT0_PT1_PNS_3rpc10RpcContextEMT_FNS_6StatusES8_SA_SD_EPKciSJ_NS_17StronglyTypedBoolINS0_19HoldCatalogLock_TagEEEENKUlvE_clEv
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103ca1133  _ZN2yb6master17MasterServiceBase14HandleOnLeaderINS0_20DeleteTableRequestPBENS0_21DeleteTableResponsePBEZNS1_8HandleInINS0_14CatalogManagerES3_S4_EEvPKT0_PT1_PNS_3rpc10RpcContextEMT_FNS_6StatusES9_SB_SE_EPKciSK_NS_17StronglyTypedBoolINS0_19HoldCatalogLock_TagEEEEUlvE_EEvPKSF_PS7_SE_SA_SK_iSK_SN_
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103ca0ff3  yb::master::MasterServiceBase::HandleIn<>()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103c9b5ab  yb::master::(anonymous namespace)::MasterDdlServiceImpl::DeleteTable()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a7690b  _ZZZN2yb6master11MasterDdlIf11InitMethodsERK13scoped_refptrINS_12MetricEntityEEENK4$_11clENSt3__110shared_ptrINS_3rpc11InboundCallEEEENKUlPKNS0_20DeleteTableRequestPBEPNS0_21DeleteTableResponsePBENSA_10RpcContextEE_clESF_SH_SI_
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a76683  _ZN2yb3rpc10HandleCallINS0_19RpcCallPBParamsImplINS_6master20DeleteTableRequestPBENS3_21DeleteTableResponsePBEEEZZNS3_11MasterDdlIf11InitMethodsERK13scoped_refptrINS_12MetricEntityEEENK4$_11clENSt3__110shared_ptrINS0_11InboundCallEEEEUlPKS4_PS5_NS0_10RpcContextEE_EEDaSH_T0_
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a7637b  yb::master::MasterDdlIf::InitMethods()::$_11::operator()()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a762b7  _ZNSt3__1L8__invokeIRZN2yb6master11MasterDdlIf11InitMethodsERK13scoped_refptrINS1_12MetricEntityEEE4$_11JNS_10shared_ptrINS1_3rpc11InboundCallEEEEEEDTclscT_fp_spscT0_fp0_EEOSF_DpOSG_
m1|pid12851|:19216|http://127.0.0.222:17644 
m1|pid12851|:19216|http://127.0.0.222:17644 
m1|pid12851|:19216|http://127.0.0.222:17644 last writer current stack:     @        0x1935d74a3  _sigtramp
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1935bfedf  pthread_kill
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x11820a283  yb::ThreadStacks()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x118209e1b  yb::ThreadStack()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x11820afb3  yb::DumpThreadStack()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1181f677b  yb::RWCLock::WriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039444cb  yb::CowObject<>::StartMutation()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x10394449f  yb::CowWriteLock<>::CowWriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103944463  yb::CowWriteLock<>::CowWriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x10391585f  yb::master::MetadataCowWrapper<>::LockForWrite()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039231af  yb::master::BackfillTable::MarkIndexesAsDesired()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103921c97  yb::master::BackfillTable::MarkIndexesAsFailed()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039238cb  yb::master::BackfillTable::MarkAllIndexesAsFailed()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x10391e997  yb::master::BackfillTable::Abort()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x10391ff77  yb::master::BackfillTable::UpdateSafeTime()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x10392b153  yb::master::GetSafeTimeForTablet::UnregisterAsyncTaskCallback()
m1|pid12851|:19216|http://127.0.0.222:17644 
m1|pid12851|:19216|http://127.0.0.222:17644 
m1|pid12851|:19216|http://127.0.0.222:17644 current thread stack:     @        0x1039444cc  yb::CowObject<>::StartMutation()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039444a0  yb::CowWriteLock<>::CowWriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103944464  yb::CowWriteLock<>::CowWriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103915860  yb::master::MetadataCowWrapper<>::LockForWrite()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039231b0  yb::master::BackfillTable::MarkIndexesAsDesired()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103921c98  yb::master::BackfillTable::MarkIndexesAsFailed()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039238cc  yb::master::BackfillTable::MarkAllIndexesAsFailed()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x10391e998  yb::master::BackfillTable::Abort()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x10391ff78  yb::master::BackfillTable::UpdateSafeTime()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x10392b154  yb::master::GetSafeTimeForTablet::UnregisterAsyncTaskCallback()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1038d659c  yb::master::RetryingTSRpcTask::UnregisterAsyncTask()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1038d792c  yb::master::RetryingTSRpcTask::AbortAndReturnPrevState()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103b46dbc  yb::master::TableInfo::AbortTasksAndCloseIfRequested()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103b46940  yb::master::TableInfo::AbortTasks()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039c6a48  yb::master::CatalogManager::DeleteTableInMemory()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039c2c34  yb::master::CatalogManager::DeleteTableInternal()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039c2938  yb::master::CatalogManager::DeleteTable()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103ca1274  _ZZN2yb6master17MasterServiceBase8HandleInINS0_14CatalogManagerENS0_20DeleteTableRequestPBENS0_21DeleteTableResponsePBEEEvPKT0_PT1_PNS_3rpc10RpcContextEMT_FNS_6StatusES8_SA_SD_EPKciSJ_NS_17StronglyTypedBoolINS0_19HoldCatalogLock_TagEEEENKUlvE_clEv
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103ca1134  _ZN2yb6master17MasterServiceBase14HandleOnLeaderINS0_20DeleteTableRequestPBENS0_21DeleteTableResponsePBEZNS1_8HandleInINS0_14CatalogManagerES3_S4_EEvPKT0_PT1_PNS_3rpc10RpcContextEMT_FNS_6StatusES9_SB_SE_EPKciSK_NS_17StronglyTypedBoolINS0_19HoldCatalogLock_TagEEEEUlvE_EEvPKSF_PS7_SE_SA_SK_iSK_SN_
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103ca0ff4  yb::master::MasterServiceBase::HandleIn<>()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103c9b5ac  yb::master::(anonymous namespace)::MasterDdlServiceImpl::DeleteTable()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a7690c  _ZZZN2yb6master11MasterDdlIf11InitMethodsERK13scoped_refptrINS_12MetricEntityEEENK4$_11clENSt3__110shared_ptrINS_3rpc11InboundCallEEEENKUlPKNS0_20DeleteTableRequestPBEPNS0_21DeleteTableResponsePBENSA_10RpcContextEE_clESF_SH_SI_
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a76684  _ZN2yb3rpc10HandleCallINS0_19RpcCallPBParamsImplINS_6master20DeleteTableRequestPBENS3_21DeleteTableResponsePBEEEZZNS3_11MasterDdlIf11InitMethodsERK13scoped_refptrINS_12MetricEntityEEENK4$_11clENSt3__110shared_ptrINS0_11InboundCallEEEEUlPKS4_PS5_NS0_10RpcContextEE_EEDaSH_T0_
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a7637c  yb::master::MasterDdlIf::InitMethods()::$_11::operator()()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a762b8  _ZNSt3__1L8__invokeIRZN2yb6master11MasterDdlIf11InitMethodsERK13scoped_refptrINS1_12MetricEntityEEE4$_11JNS_10shared_ptrINS1_3rpc11InboundCallEEEEEEDTclscT_fp_spscT0_fp0_EEOSF_DpOSG_
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a76250  std::__1::__invoke_void_return_wrapper<>::__call<>()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a76210  std::__1::__function::__alloc_func<>::operator()()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a74da8  std::__1::__function::__func<>::operator()()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1033af058  std::__1::__function::__value_func<>::operator()()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1033a84c8  std::__1::function<>::operator()()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a50a50  yb::master::MasterDdlIf::Handle()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x113807040  yb::rpc::ServicePoolImpl::Handle()
m1|pid12851|:19216|http://127.0.0.222:17644 Fatal failure details written to /Users/jmaley/logs/repeat_unit_test/yb-cql/org.yb.cql.TestPartialIndex#testUniquePartialIndexPart1_4/2022-09-19T10_09_03/4.fatal_failure_details.m-1.127.0.0.222-port19216.2022-09-19T10_12_53.pid12851.txt
m1|pid12851|:19216|http://127.0.0.222:17644 F20220919 10:12:53 ../../src/yb/util/rwc_lock.cc:153] Too long write lock wait, last writer TID: 44524912, last writer stack:     @        0x1181f6aab  yb::RWCLock::WriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039444cb  yb::CowObject<>::StartMutation()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x10394449f  yb::CowWriteLock<>::CowWriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103944463  yb::CowWriteLock<>::CowWriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x10391585f  yb::master::MetadataCowWrapper<>::LockForWrite()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039c598b  yb::master::CatalogManager::DeleteTableInMemory()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039c2c33  yb::master::CatalogManager::DeleteTableInternal()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039c2937  yb::master::CatalogManager::DeleteTable()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103ca1273  _ZZN2yb6master17MasterServiceBase8HandleInINS0_14CatalogManagerENS0_20DeleteTableRequestPBENS0_21DeleteTableResponsePBEEEvPKT0_PT1_PNS_3rpc10RpcContextEMT_FNS_6StatusES8_SA_SD_EPKciSJ_NS_17StronglyTypedBoolINS0_19HoldCatalogLock_TagEEEENKUlvE_clEv
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103ca1133  _ZN2yb6master17MasterServiceBase14HandleOnLeaderINS0_20DeleteTableRequestPBENS0_21DeleteTableResponsePBEZNS1_8HandleInINS0_14CatalogManagerES3_S4_EEvPKT0_PT1_PNS_3rpc10RpcContextEMT_FNS_6StatusES9_SB_SE_EPKciSK_NS_17StronglyTypedBoolINS0_19HoldCatalogLock_TagEEEEUlvE_EEvPKSF_PS7_SE_SA_SK_iSK_SN_
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103ca0ff3  yb::master::MasterServiceBase::HandleIn<>()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103c9b5ab  yb::master::(anonymous namespace)::MasterDdlServiceImpl::DeleteTable()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a7690b  _ZZZN2yb6master11MasterDdlIf11InitMethodsERK13scoped_refptrINS_12MetricEntityEEENK4$_11clENSt3__110shared_ptrINS_3rpc11InboundCallEEEENKUlPKNS0_20DeleteTableRequestPBEPNS0_21DeleteTableResponsePBENSA_10RpcContextEE_clESF_SH_SI_
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a76683  _ZN2yb3rpc10HandleCallINS0_19RpcCallPBParamsImplINS_6master20DeleteTableRequestPBENS3_21DeleteTableResponsePBEEEZZNS3_11MasterDdlIf11InitMethodsERK13scoped_refptrINS_12MetricEntityEEENK4$_11clENSt3__110shared_ptrINS0_11InboundCallEEEEUlPKS4_PS5_NS0_10RpcContextEE_EEDaSH_T0_
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a7637b  yb::master::MasterDdlIf::InitMethods()::$_11::operator()()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a762b7  _ZNSt3__1L8__invokeIRZN2yb6master11MasterDdlIf11InitMethodsERK13scoped_refptrINS1_12MetricEntityEEE4$_11JNS_10shared_ptrINS1_3rpc11InboundCallEEEEEEDTclscT_fp_spscT0_fp0_EEOSF_DpOSG_
m1|pid12851|:19216|http://127.0.0.222:17644 
m1|pid12851|:19216|http://127.0.0.222:17644 
m1|pid12851|:19216|http://127.0.0.222:17644 last writer current stack:     @        0x1935d74a3  _sigtramp
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1935bfedf  pthread_kill
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x11820a283  yb::ThreadStacks()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x118209e1b  yb::ThreadStack()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x11820afb3  yb::DumpThreadStack()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1181f677b  yb::RWCLock::WriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039444cb  yb::CowObject<>::StartMutation()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x10394449f  yb::CowWriteLock<>::CowWriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103944463  yb::CowWriteLock<>::CowWriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x10391585f  yb::master::MetadataCowWrapper<>::LockForWrite()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039231af  yb::master::BackfillTable::MarkIndexesAsDesired()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103921c97  yb::master::BackfillTable::MarkIndexesAsFailed()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039238cb  yb::master::BackfillTable::MarkAllIndexesAsFailed()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x10391e997  yb::master::BackfillTable::Abort()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x10391ff77  yb::master::BackfillTable::UpdateSafeTime()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x10392b153  yb::master::GetSafeTimeForTablet::UnregisterAsyncTaskCallback()
m1|pid12851|:19216|http://127.0.0.222:17644 
m1|pid12851|:19216|http://127.0.0.222:17644 
m1|pid12851|:19216|http://127.0.0.222:17644 current thread stack:     @        0x1039444cc  yb::CowObject<>::StartMutation()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039444a0  yb::CowWriteLock<>::CowWriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103944464  yb::CowWriteLock<>::CowWriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103915860  yb::master::MetadataCowWrapper<>::LockForWrite()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039231b0  yb::master::BackfillTable::MarkIndexesAsDesired()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103921c98  yb::master::BackfillTable::MarkIndexesAsFailed()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039238cc  yb::master::BackfillTable::MarkAllIndexesAsFailed()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x10391e998  yb::master::BackfillTable::Abort()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x10391ff78  yb::master::BackfillTable::UpdateSafeTime()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x10392b154  yb::master::GetSafeTimeForTablet::UnregisterAsyncTaskCallback()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1038d659c  yb::master::RetryingTSRpcTask::UnregisterAsyncTask()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1038d792c  yb::master::RetryingTSRpcTask::AbortAndReturnPrevState()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103b46dbc  yb::master::TableInfo::AbortTasksAndCloseIfRequested()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103b46940  yb::master::TableInfo::AbortTasks()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039c6a48  yb::master::CatalogManager::DeleteTableInMemory()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039c2c34  yb::master::CatalogManager::DeleteTableInternal()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039c2938  yb::master::CatalogManager::DeleteTable()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103ca1274  _ZZN2yb6master17MasterServiceBase8HandleInINS0_14CatalogManagerENS0_20DeleteTableRequestPBENS0_21DeleteTableResponsePBEEEvPKT0_PT1_PNS_3rpc10RpcContextEMT_FNS_6StatusES8_SA_SD_EPKciSJ_NS_17StronglyTypedBoolINS0_19HoldCatalogLock_TagEEEENKUlvE_clEv
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103ca1134  _ZN2yb6master17MasterServiceBase14HandleOnLeaderINS0_20DeleteTableRequestPBENS0_21DeleteTableResponsePBEZNS1_8HandleInINS0_14CatalogManagerES3_S4_EEvPKT0_PT1_PNS_3rpc10RpcContextEMT_FNS_6StatusES9_SB_SE_EPKciSK_NS_17StronglyTypedBoolINS0_19HoldCatalogLock_TagEEEEUlvE_EEvPKSF_PS7_SE_SA_SK_iSK_SN_
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103ca0ff4  yb::master::MasterServiceBase::HandleIn<>()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103c9b5ac  yb::master::(anonymous namespace)::MasterDdlServiceImpl::DeleteTable()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a7690c  _ZZZN2yb6master11MasterDdlIf11InitMethodsERK13scoped_refptrINS_12MetricEntityEEENK4$_11clENSt3__110shared_ptrINS_3rpc11InboundCallEEEENKUlPKNS0_20DeleteTableRequestPBEPNS0_21DeleteTableResponsePBENSA_10RpcContextEE_clESF_SH_SI_
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a76684  _ZN2yb3rpc10HandleCallINS0_19RpcCallPBParamsImplINS_6master20DeleteTableRequestPBENS3_21DeleteTableResponsePBEEEZZNS3_11MasterDdlIf11InitMethodsERK13scoped_refptrINS_12MetricEntityEEENK4$_11clENSt3__110shared_ptrINS0_11InboundCallEEEEUlPKS4_PS5_NS0_10RpcContextEE_EEDaSH_T0_
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a7637c  yb::master::MasterDdlIf::InitMethods()::$_11::operator()()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a762b8  _ZNSt3__1L8__invokeIRZN2yb6master11MasterDdlIf11InitMethodsERK13scoped_refptrINS1_12MetricEntityEEE4$_11JNS_10shared_ptrINS1_3rpc11InboundCallEEEEEEDTclscT_fp_spscT0_fp0_EEOSF_DpOSG_
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a76250  std::__1::__invoke_void_return_wrapper<>::__call<>()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a76210  std::__1::__function::__alloc_func<>::operator()()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a74da8  std::__1::__function::__func<>::operator()()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1033af058  std::__1::__function::__value_func<>::operator()()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1033a84c8  std::__1::function<>::operator()()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a50a50  yb::master::MasterDdlIf::Handle()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x113807040  yb::rpc::ServicePoolImpl::Handle()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x10384a2e8  google::LogDestination::LogToSinks()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103849448  google::LogMessage::SendToLog()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103849dec  google::LogMessage::Flush()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x10384dde0  google::LogMessageFatal::~LogMessageFatal()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x10384ad28  google::LogMessageFatal::~LogMessageFatal()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1181f69b4  yb::RWCLock::WriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039444cc  yb::CowObject<>::StartMutation()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039444a0  yb::CowWriteLock<>::CowWriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103944464  yb::CowWriteLock<>::CowWriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103915860  yb::master::MetadataCowWrapper<>::LockForWrite()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039231b0  yb::master::BackfillTable::MarkIndexesAsDesired()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103921c98  yb::master::BackfillTable::MarkIndexesAsFailed()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039238cc  yb::master::BackfillTable::MarkAllIndexesAsFailed()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x10391e998  yb::master::BackfillTable::Abort()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x10391ff78  yb::master::BackfillTable::UpdateSafeTime()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x10392b154  yb::master::GetSafeTimeForTablet::UnregisterAsyncTaskCallback()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1038d659c  yb::master::RetryingTSRpcTask::UnregisterAsyncTask()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1038d792c  yb::master::RetryingTSRpcTask::AbortAndReturnPrevState()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103b46dbc  yb::master::TableInfo::AbortTasksAndCloseIfRequested()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103b46940  yb::master::TableInfo::AbortTasks()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039c6a48  yb::master::CatalogManager::DeleteTableInMemory()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039c2c34  yb::master::CatalogManager::DeleteTableInternal()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039c2938  yb::master::CatalogManager::DeleteTable()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103ca1274  _ZZN2yb6master17MasterServiceBase8HandleInINS0_14CatalogManagerENS0_20DeleteTableRequestPBENS0_21DeleteTableResponsePBEEEvPKT0_PT1_PNS_3rpc10RpcContextEMT_FNS_6StatusES8_SA_SD_EPKciSJ_NS_17StronglyTypedBoolINS0_19HoldCatalogLock_TagEEEENKUlvE_clEv
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103ca1134  _ZN2yb6master17MasterServiceBase14HandleOnLeaderINS0_20DeleteTableRequestPBENS0_21DeleteTableResponsePBEZNS1_8HandleInINS0_14CatalogManagerES3_S4_EEvPKT0_PT1_PNS_3rpc10RpcContextEMT_FNS_6StatusES9_SB_SE_EPKciSK_NS_17StronglyTypedBoolINS0_19HoldCatalogLock_TagEEEEUlvE_EEvPKSF_PS7_SE_SA_SK_iSK_SN_
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103ca0ff4  yb::master::MasterServiceBase::HandleIn<>()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103c9b5ac  yb::master::(anonymous namespace)::MasterDdlServiceImpl::DeleteTable()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a7690c  _ZZZN2yb6master11MasterDdlIf11InitMethodsERK13scoped_refptrINS_12MetricEntityEEENK4$_11clENSt3__110shared_ptrINS_3rpc11InboundCallEEEENKUlPKNS0_20DeleteTableRequestPBEPNS0_21DeleteTableResponsePBENSA_10RpcContextEE_clESF_SH_SI_
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a76684  _ZN2yb3rpc10HandleCallINS0_19RpcCallPBParamsImplINS_6master20DeleteTableRequestPBENS3_21DeleteTableResponsePBEEEZZNS3_11MasterDdlIf11InitMethodsERK13scoped_refptrINS_12MetricEntityEEENK4$_11clENSt3__110shared_ptrINS0_11InboundCallEEEEUlPKS4_PS5_NS0_10RpcContextEE_EEDaSH_T0_
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a7637c  yb::master::MasterDdlIf::InitMethods()::$_11::operator()()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a762b8  _ZNSt3__1L8__invokeIRZN2yb6master11MasterDdlIf11InitMethodsERK13scoped_refptrINS1_12MetricEntityEEE4$_11JNS_10shared_ptrINS1_3rpc11InboundCallEEEEEEDTclscT_fp_spscT0_fp0_EEOSF_DpOSG_
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a76250  std::__1::__invoke_void_return_wrapper<>::__call<>()
m1|pid12851|:19216|http://127.0.0.222:17644 
m1|pid12851|:19216|http://127.0.0.222:17644 *** Check failure stack trace: ***
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103849604  google::LogMessage::SendToLog()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103849dec  google::LogMessage::Flush()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x10384dde0  google::LogMessageFatal::~LogMessageFatal()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x10384ad28  google::LogMessageFatal::~LogMessageFatal()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1181f69b4  yb::RWCLock::WriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039444cc  yb::CowObject<>::StartMutation()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039444a0  yb::CowWriteLock<>::CowWriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103944464  yb::CowWriteLock<>::CowWriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103915860  yb::master::MetadataCowWrapper<>::LockForWrite()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039231b0  yb::master::BackfillTable::MarkIndexesAsDesired()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103921c98  yb::master::BackfillTable::MarkIndexesAsFailed()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039238cc  yb::master::BackfillTable::MarkAllIndexesAsFailed()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x10391e998  yb::master::BackfillTable::Abort()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x10391ff78  yb::master::BackfillTable::UpdateSafeTime()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x10392b154  yb::master::GetSafeTimeForTablet::UnregisterAsyncTaskCallback()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1038d659c  yb::master::RetryingTSRpcTask::UnregisterAsyncTask()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1038d792c  yb::master::RetryingTSRpcTask::AbortAndReturnPrevState()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103b46dbc  yb::master::TableInfo::AbortTasksAndCloseIfRequested()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103b46940  yb::master::TableInfo::AbortTasks()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039c6a48  yb::master::CatalogManager::DeleteTableInMemory()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039c2c34  yb::master::CatalogManager::DeleteTableInternal()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x1039c2938  yb::master::CatalogManager::DeleteTable()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103ca1274  _ZZN2yb6master17MasterServiceBase8HandleInINS0_14CatalogManagerENS0_20DeleteTableRequestPBENS0_21DeleteTableResponsePBEEEvPKT0_PT1_PNS_3rpc10RpcContextEMT_FNS_6StatusES8_SA_SD_EPKciSJ_NS_17StronglyTypedBoolINS0_19HoldCatalogLock_TagEEEENKUlvE_clEv
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103ca1134  _ZN2yb6master17MasterServiceBase14HandleOnLeaderINS0_20DeleteTableRequestPBENS0_21DeleteTableResponsePBEZNS1_8HandleInINS0_14CatalogManagerES3_S4_EEvPKT0_PT1_PNS_3rpc10RpcContextEMT_FNS_6StatusES9_SB_SE_EPKciSK_NS_17StronglyTypedBoolINS0_19HoldCatalogLock_TagEEEEUlvE_EEvPKSF_PS7_SE_SA_SK_iSK_SN_
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103ca0ff4  yb::master::MasterServiceBase::HandleIn<>()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x103c9b5ac  yb::master::(anonymous namespace)::MasterDdlServiceImpl::DeleteTable()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a7690c  _ZZZN2yb6master11MasterDdlIf11InitMethodsERK13scoped_refptrINS_12MetricEntityEEENK4$_11clENSt3__110shared_ptrINS_3rpc11InboundCallEEEENKUlPKNS0_20DeleteTableRequestPBEPNS0_21DeleteTableResponsePBENSA_10RpcContextEE_clESF_SH_SI_
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a76684  _ZN2yb3rpc10HandleCallINS0_19RpcCallPBParamsImplINS_6master20DeleteTableRequestPBENS3_21DeleteTableResponsePBEEEZZNS3_11MasterDdlIf11InitMethodsERK13scoped_refptrINS_12MetricEntityEEENK4$_11clENSt3__110shared_ptrINS0_11InboundCallEEEEUlPKS4_PS5_NS0_10RpcContextEE_EEDaSH_T0_
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a7637c  yb::master::MasterDdlIf::InitMethods()::$_11::operator()()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a762b8  _ZNSt3__1L8__invokeIRZN2yb6master11MasterDdlIf11InitMethodsERK13scoped_refptrINS1_12MetricEntityEEE4$_11JNS_10shared_ptrINS1_3rpc11InboundCallEEEEEEDTclscT_fp_spscT0_fp0_EEOSF_DpOSG_
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a76250  std::__1::__invoke_void_return_wrapper<>::__call<>()
m1|pid12851|:19216|http://127.0.0.222:17644     @        0x114a76210  std::__1::__function::__alloc_func<>::operator()()
ts1|pid12895|:26019|http://127.0.1.235:17430 W0919 10:12:53.568887 1901146112 client_master_rpc.cc:128] DeleteTable: Encountered a network error from the Master(127.0.0.222:19216): Network error (yb/util/net/socket.cc:534): recvmsg got EOF from remote (system error 58), retrying...

joe-maley avatar Sep 19 '22 14:09 joe-maley

RCA: The deadlock occurs when deleting a table that has an outstanding GetSafeTimeForTablet RPC on pending_tasks_.

Here are the important parts of the path:

  1. The table delete requests enters CatalogManager::DeleteTableInMemory(). This takes a write lock on the primary (aka "indexed") table.
  2. This invokes AbortTasks on the primary table. When 'GetSafeTimeForTablet' is on the pending tasks, it is aborted and enters GetSafeTimeForTablet::UnregisterAsyncTaskCallback. From here, it will path to take a write lock on the primary table, which it is already holding:
yb::master::MetadataCowWrapper<>::LockForWrite()
yb::master::BackfillTable::MarkIndexesAsDesired()
yb::master::BackfillTable::MarkIndexesAsFailed()
yb::master::BackfillTable::MarkAllIndexesAsFailed()
yb::master::BackfillTable::Abort()
yb::master::BackfillTable::UpdateSafeTime()
yb::master::GetSafeTimeForTablet::UnregisterAsyncTaskCallback()

I don't immediately know what the fix is. I'm going to trace the git history and talk to the authors to see what they think.

joe-maley avatar Sep 19 '22 15:09 joe-maley

^^^ cc @amitanandaiyer

bmatican avatar Sep 19 '22 16:09 bmatican

Change 1a030ba fixed the above issue. I'm unassigning myself from this issue. I still see some other failure signatures with this test that may be worth investigating.

joe-maley avatar Sep 23 '22 18:09 joe-maley

Log:

m1|pid540522|:22101|http://127.189.147.195:25657 I1005 22:50:06.541280 541191 catalog_manager.cc:5472] Servicing DeleteTable request from 127.56.226.116:34509: table { table_name: "test" namespace { name: "cql_test_keyspace" database_type: YQL_DATABASE_CQL } } is_index_table: false
m1|pid540522|:22101|http://127.189.147.195:25657 I1005 22:50:06.649655 541191 catalog_manager.cc:5662] Successfully initiated deletion of table with table_name: "test"
m1|pid540522|:22101|http://127.189.147.195:25657 W1005 22:50:06.650669 544152 catalog_manager.cc:5929] Servicing IsDeleteTableDone request for table id 3121e4c94afc4cb5ae3fde2ea1e77b26: NOT deleted

ts1|pid541637|:22086|http://127.56.226.116:23033 I1005 22:50:06.698710 542536 client-internal.cc:657] Deleted table cql_test_keyspace.test

22:50:06.710 (Time-limited test) [INFO - org.yb.cql.BaseCQLTest.dropKeyspaces(BaseCQLTest.java:458)] Dropping keyspace cql_test_keyspace
22:50:06.710 (Time-limited test) [INFO - org.yb.cql.BaseCQLTest.execute(BaseCQLTest.java:713)] EXEC CQL: DROP KEYSPACE "cql_test_keyspace";

ts1|pid541637|:22086|http://127.56.226.116:23033 E1005 22:50:06.711386 542532 audit_logger.cc:539] AUDIT: user:anonymous|host:127.56.226.116:9042|source:127.0.0.1|port:44064|timestamp:1664999406711|type:DROP_KEYSPACE|category:DDL|ks:cql_test_keyspace|operation:DROP KEYSPACE "cql_test_keyspace";
m1|pid540522|:22101|http://127.189.147.195:25657 I1005 22:50:06.711961 544164 catalog_manager.cc:8114] Servicing DeleteNamespace request from 127.56.226.116:59457: namespace { name: "cql_test_keyspace" }
ts1|pid541637|:22086|http://127.56.226.116:23033 I1005 22:50:06.713174 544134 client_master_rpc.cc:75] 0x00000000040136a0 -> DeleteNamespace: Failed, got resp error: Invalid argument (yb/master/catalog_manager.cc:8164): Cannot delete keyspace which has table: test [id=3121e4c94afc4cb5ae3fde2ea1e77b26], request: namespace { name: "cql_test_keyspace" }: NAMESPACE_IS_NOT_EMPTY (master error 14)
ts1|pid541637|:22086|http://127.56.226.116:23033 W1005 22:50:06.713304 542532 process_context.cc:185] SQL Error: Server Error. Cannot delete keyspace which has table: test [id=3121e4c94afc4cb5ae3fde2ea1e77b26], request: namespace { name: "cql_test_keyspace" }: NAMESPACE_IS_NOT_EMPTY (master error 14)
ts1|pid541637|:22086|http://127.56.226.116:23033 DROP KEYSPACE "cql_test_keyspace";
ts1|pid541637|:22086|http://127.56.226.116:23033               ^^^^^^^^^^^^^^^^^^^

Reproducer: ybd --java-test org.yb.cql.TestPartialIndex#testMoreThanOneIndex -n 20
(Failed: 1/20)

OlegLoginov avatar Oct 05 '22 20:10 OlegLoginov

This looks like an error in the yb::client:

m1|W1005 22:50:06.650669 catalog_manager.cc:5929] Servicing IsDeleteTableDone request for table id 3121e4c94afc4cb5ae3fde2ea1e77b26: NOT deleted

ts1|1005 22:50:06.698710 client-internal.cc:657] Deleted table cql_test_keyspace.test

OlegLoginov avatar Oct 05 '22 20:10 OlegLoginov

m3|I1006 00:06:19.553485 604117 catalog_manager.cc:5472] Servicing DeleteTable request from 127.98.124.24:39399: table { table_name: "test" namespace { name: "cql_test_keyspace" database_type: YQL_DATABASE_CQL } } is_index_table: false

ts1|W1006 00:06:19.771134 604759 client-internal.cc:629] !! Got response deleted_table_ids: "0955955562cb4daaa84abb6fb9730136" deleted_table_ids: "4275a3ccb7894f07a5d1a578acd0c962" deleted_table_ids: "ea2798ca43444918a41be17d8952bd2b"

m3|W1006 00:06:19.772028 603552 catalog_manager.cc:5929] Servicing IsDeleteTableDone request for table id 0955955562cb4daaa84abb6fb9730136: NOT deleted

ts1|I1006 00:06:19.774981 604269 client_master_rpc.cc:75] 0x0000000002f7a4a0 -> IsDeleteTableDone: Failed, got resp error: Illegal state (yb/master/catalog_manager.cc:5931): The object was NOT deleted: Current schema version=8 (master error 3)
ts1|W1006 00:06:19.775149 604759 client-internal.cc:635] !! Waited for table to be deleted 0955955562cb4daaa84abb6fb9730136
Status YBClient::Data::DeleteTable(YBClient* client,
  if (wait) {
    // Wait for the deleted tables to be gone.
    if (resp.deleted_table_ids_size() > 0) {
      for (const auto& table_id : resp.deleted_table_ids()) {
        RETURN_NOT_OK(WaitForDeleteTableToFinish(client, table_id, deadline));
        LOG(WARNING) << "!! Waited for table to be deleted " << table_id;

The waiting code WaitForDeleteTableToFinish was interrupted due to the error: IsDeleteTableDone: Failed, got resp error: Illegal state (yb/master/catalog_manager.cc:5931): The object was NOT deleted: Current schema version=8 (master error 3)

OlegLoginov avatar Oct 05 '22 21:10 OlegLoginov

The code is wrong when the Table is in ALTERING/RUNNING state (due to a related index deleting):

  if (!l->started_deleting() && !l->started_hiding()) {
    LOG(WARNING) << "!! Servicing IsDeleteTableDone request for table id "
                 << req->table_id() << ": NOT deleted: state=" << l->pb.state();
    Status s = STATUS(IllegalState, "The object was NOT deleted", l->pb.state_msg());
    return SetupError(resp->mutable_error(), MasterErrorPB::OBJECT_NOT_FOUND, s);
    RUNNING = 2;
    ALTERING = 3;

Servicing IsDeleteTableDone request for table id 9b99ebd7178e40418ac883065c1a5470: NOT deleted: state=2

OlegLoginov avatar Oct 05 '22 21:10 OlegLoginov

Or:

m3|W1006 01:41:16.202991 748223 catalog_manager.cc:5931] !! Servicing IsDeleteTableDone request for table id e4f49779024e474c93edd29dbd8e0d2e: NOT deleted: 3

ts3|1006 01:41:16.203764 747055 client_master_rpc.cc:75] 0x00000000035424a0 -> IsDeleteTableDone: Failed, got resp error: Illegal state (yb/master/catalog_manager.cc:5933): The object was NOT deleted: Update index permission version=7 ts=2022-10-06 01:41:16 +03 (master error 3)

OlegLoginov avatar Oct 05 '22 22:10 OlegLoginov

Confirmed: race between DeleteTable & BackfillTable:

m3|I1006 02:01:51.210574 793433 catalog_manager.cc:5474] Servicing DeleteTable request from 127.75.169.193:43303: table { table_name: "test" namespace { name: "cql_test_keyspace" database_type: YQL_DATABASE_CQL } } is_index_table: false
m3|W1006 02:01:52.661628 795857 backfill_index.cc:343]  !!!  UpdateIndexPermission set state = ALTERING

Possible solutions:

  1. Fix IsDeleteTableInProgress - handle case when the table is in RUNNING /ALTERING state.
  2. Fix BackfillTable - synchronously completely stop all parallel backfill-related background activities for the deleted table.

Op-1 looks much more simpler. Note that the op-2 means waiting for another threads (sync-op). So it potentially can slow down the Master RPC DeleteTable.

OlegLoginov avatar Oct 05 '22 23:10 OlegLoginov

Main reproducer: ybd --java-test org.yb.cql.TestPartialIndex#testMoreThanOneIndex -n 50

OlegLoginov avatar Oct 06 '22 00:10 OlegLoginov

Issues:

  1. MAIN (confirmed): DELETING -> ALTERING/RUNNING state change by BackFilling
  2. OBJECT_NOT_FOUND - hides the wrong case (see thread)
  3. (Minor) We don't have INFO logs in MarkIndexInfoFromTableForDeletion and DeleteIndexInfoFromTable (but we have the LOG in AddIndexInfoToTable). May be useful for debugging. E.g.:
  LOG(INFO) << "DeleteIndexInfoFromTable table " << indexed_table_id << " index " << index_table_id;
  1. (just to recheck) DeleteTable/DeleteTableInMemory should not initiate state change from DELETING into RUNNING/ALTERING (via e.g. MarkIndexInfoFromTableForDeletion or SendAlterTable RPC callback)

OlegLoginov avatar Oct 06 '22 17:10 OlegLoginov

ybd --java-test org.yb.cql.TestMasterLatency#testSlowCreateDropIndex

ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 59.628 s <<< FAILURE! - in org.yb.cql.TestMasterLatency
[ERROR] testSlowCreateDropIndex(org.yb.cql.TestMasterLatency)  Time elapsed: 59.381 s  <<< ERROR!
com.datastax.driver.core.exceptions.InvalidQueryException: 
Object Not Found. Table cql_test_keyspace.test_drop not found: OBJECT_NOT_FOUND (master error 3)
DROP TABLE cql_test_keyspace.test_drop;
                             ^^^^^^^^^
 (ql error -301)

It looks like the same issue:

m1|W1011 19:05:52.078438 1680536 catalog_manager.cc:5929] Servicing IsDeleteTableDone request for table id 6351bdc11e074e919fb801b7c6f53718: NOT deleted

Possible root cause - issue [2] above: OBJECT_NOT_FOUND - hides the wrong case (see thread)

OlegLoginov avatar Oct 11 '22 16:10 OlegLoginov

In 2.17.2.0-b39 this also occurred in org.yb.pgsql.TestPgRegressIndex.testPgRegressIndex:

pg_regress exited with error code: 1, failed tests: [yb_create_index]

F20230120 02:27:35 ../../src/yb/util/rwc_lock.cc:150] Too long write lock wait, last writer TID: 409433, last writer stack: ../../src/yb/util/rwc_lock.cc:163:                                                                      @     0x7f918f41c1d8  yb::RWCLock::WriteLock()
../../src/yb/util/cow_object.h:79:                                                                      @     0x7f91929d090a  yb::CowObject<yb::master::PersistentTableInfo>::StartMutation()
../../src/yb/util/cow_object.h:229:                                                                     @     0x7f91929841f8  yb::CowWriteLock<yb::master::PersistentTableInfo>::CowWriteLock(yb::CowObject<yb::master::PersistentTableInfo>*)
../../ent/src/yb/master/../../../../src/yb/master/catalog_entity_info.h:151:                            @     0x7f91929841f8  yb::master::MetadataCowWrapper<yb::master::PersistentTableInfo>::LockForWrite()
../../src/yb/master/catalog_manager.cc:5921:                                                            @     0x7f91929841f8  yb::master::CatalogManager::DeleteTableInMemory(yb::master::TableIdentifierPB const&, bool, bool, std::unordered_map<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag>, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, boost::hash<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> >, std::equal_to<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> >, std::allocator<std::pair<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> const, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > const&, std::vector<yb::master::CatalogManager::DeletingTableData, std::allocator<yb::master::CatalogManager::DeletingTableData> >*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*)
../../src/yb/master/catalog_manager.cc:5747:                                                            @     0x7f919298525a  yb::master::CatalogManager::DeleteTableInternal(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*)
../../src/yb/master/catalog_manager.cc:5725:                                                            @     0x7f91929871fe  yb::master::CatalogManager::DeleteTable(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*)
../../src/yb/master/catalog_manager.cc:4252:                                                            @     0x7f9192987b9b  operator()
../../src/yb/util/threadpool.h:78:                                                                      @     0x7f9192987b9b  Run
../../src/yb/util/threadpool.cc:612:                                                                    @     0x7f918f447d50  yb::ThreadPool::DispatchThread(bool)
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:74:                                        @     0x7f918f44aedc  void std::__invoke_impl<void, void (yb::ThreadPool::*&)(bool), yb::ThreadPool*&, bool&>(std::__invoke_memfun_deref, void (yb::ThreadPool::*&)(bool), yb::ThreadPool*&, bool&)
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:96:                                        @     0x7f918f44aedc  std::__invoke_result<void (yb::ThreadPool::*&)(bool), yb::ThreadPool*&, bool&>::type std::__invoke<void (yb::ThreadPool::*&)(bool), yb::ThreadPool*&, bool&>(void (yb::ThreadPool::*&)(bool), yb::ThreadPool*&, bool&)
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/functional:420:                                          @     0x7f918f44aedc  void std::_Bind<void (yb::ThreadPool::*(yb::ThreadPool*, bool))(bool)>::__call<void, , 0ul, 1ul>(tuple<>&&, std::_Index_tuple<0ul, 1ul>)
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/functional:503:                                          @     0x7f918f44aedc  void std::_Bind<void (yb::ThreadPool::*(yb::ThreadPool*, bool))(bool)>::operator()<, void>()
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:61:                                        @     0x7f918f44aedc  void std::__invoke_impl<void, std::_Bind<void (yb::ThreadPool::*(yb::ThreadPool*, bool))(bool)>&>(std::__invoke_other, std::_Bind<void (yb::ThreadPool::*(yb::ThreadPool*, bool))(bool)>&)
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:111:                                       @     0x7f918f44aedc  std::enable_if<is_invocable_r_v<void, std::_Bind<void (yb::ThreadPool::*(yb::ThreadPool*, bool))(bool)>&>, void>::type std::__invoke_r<void, std::_Bind<void (yb::ThreadPool::*(yb::ThreadPool*, bool))(bool)>&>(std::_Bind<void (yb::ThreadPool::*(yb::ThreadPool*, bool))(bool)>&)
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/std_function.h:290:                                 @     0x7f918f44aedc  std::_Function_handler<void (), std::_Bind<void (yb::ThreadPool::*(yb::ThreadPool*, bool))(bool)> >::_M_invoke(std::_Any_data const&)
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/std_function.h:590:                                 @     0x7f918f441ad1  std::function<void ()>::operator()() const
../../src/yb/util/thread.cc:800:                                                                        @     0x7f918f441ad1  yb::Thread::SuperviseThread(void*)
    @     0x7f918d5351ce 
    @     0x7f918d1a0e72 


last writer current stack:     @     0x7f918d1b5b7f 
    @     0x7f918d53b847 
../../src/yb/util/condition_variable.cc:136:                                                            @     0x7f918f316fc7  yb::ConditionVariable::TimedWait(yb::MonoDelta const&) const
../../src/yb/util/rwc_lock.cc:142:                                                                      @     0x7f918f41bb6e  yb::RWCLock::WriteLock()
../../src/yb/util/cow_object.h:79:                                                                      @     0x7f9192940af8  yb::CowObject<yb::master::PersistentTableInfo>::StartMutation()
../../src/yb/util/cow_object.h:229:                                                                     @     0x7f9192940af8  yb::CowWriteLock<yb::master::PersistentTableInfo>::CowWriteLock(yb::CowObject<yb::master::PersistentTableInfo>*)
../../ent/src/yb/master/../../../../src/yb/master/catalog_entity_info.h:151:                            @     0x7f9192940af8  yb::master::MetadataCowWrapper<yb::master::PersistentTableInfo>::LockForWrite()
../../src/yb/master/catalog_manager.cc:5578:                                                            @     0x7f9192940af8  yb::master::CatalogManager::DeleteIndexInfoFromTable(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
../../src/yb/master/catalog_manager.cc:5560:                                                            @     0x7f9192982dc0  yb::master::CatalogManager::MarkIndexInfoFromTableForDeletion(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, yb::master::DeleteTableResponsePB*)
../../src/yb/master/catalog_manager.cc:5995:                                                            @     0x7f9192984ca2  yb::master::CatalogManager::DeleteTableInMemory(yb::master::TableIdentifierPB const&, bool, bool, std::unordered_map<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag>, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, boost::hash<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> >, std::equal_to<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> >, std::allocator<std::pair<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> const, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > const&, std::vector<yb::master::CatalogManager::DeletingTableData, std::allocator<yb::master::CatalogManager::DeletingTableData> >*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*)
../../src/yb/master/catalog_manager.cc:5747:                                                            @     0x7f919298525a  yb::master::CatalogManager::DeleteTableInternal(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*)
../../src/yb/master/catalog_manager.cc:5725:                                                            @     0x7f91929871fe  yb::master::CatalogManager::DeleteTable(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*)
../../src/yb/master/catalog_manager.cc:4252:                                                            @     0x7f9192987b9b  operator()
../../src/yb/util/threadpool.h:78:                                                                      @     0x7f9192987b9b  Run
../../src/yb/util/threadpool.cc:612:                                                                    @     0x7f918f447d50  yb::ThreadPool::DispatchThread(bool)
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:74:                                        @     0x7f918f44aedc  void std::__invoke_impl<void, void (yb::ThreadPool::*&)(bool), yb::ThreadPool*&, bool&>(std::__invoke_memfun_deref, void (yb::ThreadPool::*&)(bool), yb::ThreadPool*&, bool&)
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:96:                                        @     0x7f918f44aedc  std::__invoke_result<void (yb::ThreadPool::*&)(bool), yb::ThreadPool*&, bool&>::type std::__invoke<void (yb::ThreadPool::*&)(bool), yb::ThreadPool*&, bool&>(void (yb::ThreadPool::*&)(bool), yb::ThreadPool*&, bool&)
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/functional:420:                                          @     0x7f918f44aedc  void std::_Bind<void (yb::ThreadPool::*(yb::ThreadPool*, bool))(bool)>::__call<void, , 0ul, 1ul>(tuple<>&&, std::_Index_tuple<0ul, 1ul>)
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/functional:503:                                          @     0x7f918f44aedc  void std::_Bind<void (yb::ThreadPool::*(yb::ThreadPool*, bool))(bool)>::operator()<, void>()
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:61:                                        @     0x7f918f44aedc  void std::__invoke_impl<void, std::_Bind<void (yb::ThreadPool::*(yb::ThreadPool*, bool))(bool)>&>(std::__invoke_other, std::_Bind<void (yb::ThreadPool::*(yb::ThreadPool*, bool))(bool)>&)
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:111:                                       @     0x7f918f44aedc  std::enable_if<is_invocable_r_v<void, std::_Bind<void (yb::ThreadPool::*(yb::ThreadPool*, bool))(bool)>&>, void>::type std::__invoke_r<void, std::_Bind<void (yb::ThreadPool::*(yb::ThreadPool*, bool))(bool)>&>(std::_Bind<void (yb::ThreadPool::*(yb::ThreadPool*, bool))(bool)>&)
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/std_function.h:290:                                 @     0x7f918f44aedc  std::_Function_handler<void (), std::_Bind<void (yb::ThreadPool::*(yb::ThreadPool*, bool))(bool)> >::_M_invoke(std::_Any_data const&)
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/std_function.h:590:                                 @     0x7f918f441ad1  std::function<void ()>::operator()() const
../../src/yb/util/thread.cc:800:                                                                        @     0x7f918f441ad1  yb::Thread::SuperviseThread(void*)
    @     0x7f918d5351ce 
    @     0x7f918d1a0e72 


current thread stack: ../../src/yb/util/rwc_lock.cc:148:                                                                      @     0x7f918f41beed  yb::RWCLock::WriteLock()
../../src/yb/util/cow_object.h:79:                                                                      @     0x7f91929d090a  yb::CowObject<yb::master::PersistentTableInfo>::StartMutation()
../../src/yb/util/cow_object.h:229:                                                                     @     0x7f91929841f8  yb::CowWriteLock<yb::master::PersistentTableInfo>::CowWriteLock(yb::CowObject<yb::master::PersistentTableInfo>*)
../../ent/src/yb/master/../../../../src/yb/master/catalog_entity_info.h:151:                            @     0x7f91929841f8  yb::master::MetadataCowWrapper<yb::master::PersistentTableInfo>::LockForWrite()
../../src/yb/master/catalog_manager.cc:5921:                                                            @     0x7f91929841f8  yb::master::CatalogManager::DeleteTableInMemory(yb::master::TableIdentifierPB const&, bool, bool, std::unordered_map<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag>, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, boost::hash<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> >, std::equal_to<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> >, std::allocator<std::pair<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> const, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > const&, std::vector<yb::master::CatalogManager::DeletingTableData, std::allocator<yb::master::CatalogManager::DeletingTableData> >*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*)
../../src/yb/master/catalog_manager.cc:5990:                                                            @     0x7f9192984bcf  yb::master::CatalogManager::DeleteTableInMemory(yb::master::TableIdentifierPB const&, bool, bool, std::unordered_map<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag>, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, boost::hash<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> >, std::equal_to<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> >, std::allocator<std::pair<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> const, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > const&, std::vector<yb::master::CatalogManager::DeletingTableData, std::allocator<yb::master::CatalogManager::DeletingTableData> >*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*)
../../src/yb/master/catalog_manager.cc:5747:                                                            @     0x7f919298525a  yb::master::CatalogManager::DeleteTableInternal(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*)
../../src/yb/master/catalog_manager.cc:5725:                                                            @     0x7f91929871fe  yb::master::CatalogManager::DeleteTable(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*)
../../src/yb/master/master_service_base-internal.h:145:                                                 @     0x7f9192a55b2c  yb::master::MasterServiceBase::HandleIn<yb::master::CatalogManager, yb::master::DeleteTableRequestPB, yb::master::DeleteTableResponsePB>(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*, yb::Status (yb::master::CatalogManager::*)(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*), char const*, int, char const*, yb::StronglyTypedBool<yb::master::HoldCatalogLock_Tag>)::{lambda()#1}::operator()() const
../../src/yb/master/master_service_base-internal.h:83:                                                  @     0x7f9192a55b2c  void yb::master::MasterServiceBase::HandleOnLeader<yb::master::DeleteTableRequestPB, yb::master::DeleteTableResponsePB, yb::master::MasterServiceBase::HandleIn<yb::master::CatalogManager, yb::master::DeleteTableRequestPB, yb::master::DeleteTableResponsePB>(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*, yb::Status (yb::master::CatalogManager::*)(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*), char const*, int, char const*, yb::StronglyTypedBool<yb::master::HoldCatalogLock_Tag>)::{lambda()#1}>(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*, yb::master::MasterServiceBase::HandleIn<yb::master::CatalogManager, yb::master::DeleteTableRequestPB, yb::master::DeleteTableResponsePB>(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*, yb::Status (yb::master::CatalogManager::*)(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*), char const*, int, char const*, yb::StronglyTypedBool<yb::master::HoldCatalogLock_Tag>)::{lambda()#1}, char const*, int, char const*, yb::StronglyTypedBool<yb::master::HoldCatalogLock_Tag>)
../../src/yb/master/master_service_base-internal.h:144:                                                 @     0x7f9192a52a6f  void yb::master::MasterServiceBase::HandleIn<yb::master::CatalogManager, yb::master::DeleteTableRequestPB, yb::master::DeleteTableResponsePB>(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*, yb::Status (yb::master::CatalogManager::*)(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*), char const*, int, char const*, yb::StronglyTypedBool<yb::master::HoldCatalogLock_Tag>)
../../src/yb/master/master_ddl_service.cc:29:                                                           @     0x7f9192a52a6f  DeleteTable
src/yb/master/master_ddl.service.cc:813:                                                                @     0x7f91902942da  operator()
../../src/yb/rpc/local_call.h:115:                                                                      @     0x7f91902942da  HandleCall<yb::rpc::RpcCallPBParamsImpl<yb::master::DeleteTableRequestPB, yb::master::DeleteTableResponsePB>, yb::master::MasterDdlIf::InitMethods(const scoped_refptr<yb::MetricEntity>&)::<lambda(yb::rpc::InboundCallPtr)>::<lambda(const yb::master::DeleteTableRequestPB*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext)> >
src/yb/master/master_ddl.service.cc:811:                                                                @     0x7f91902944b0  operator()
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:61:                                        @     0x7f91902944b0  __invoke_impl<void, yb::master::MasterDdlIf::InitMethods(const scoped_refptr<yb::MetricEntity>&)::<lambda(yb::rpc::InboundCallPtr)>&, shared_ptr<yb::rpc::InboundCall> >
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:111:                                       @     0x7f91902944b0  __invoke_r<void, yb::master::MasterDdlIf::InitMethods(const scoped_refptr<yb::MetricEntity>&)::<lambda(yb::rpc::InboundCallPtr)>&, shared_ptr<yb::rpc::InboundCall> >
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/std_function.h:290:                                 @     0x7f91902944b0  _M_invoke
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/std_function.h:590:                                 @     0x7f9190288335  std::function<void (shared_ptr<yb::rpc::InboundCall>)>::operator()(shared_ptr<yb::rpc::InboundCall>) const
src/yb/master/master_ddl.service.cc:631:                                                                @     0x7f9190288335  yb::master::MasterDdlIf::Handle(shared_ptr<yb::rpc::InboundCall>)
../../src/yb/rpc/service_pool.cc:263:                                                                   @     0x7f918f64ed26  yb::rpc::ServicePoolImpl::Handle(shared_ptr<yb::rpc::InboundCall>)
../../src/yb/rpc/inbound_call.cc:236:                                                                   @     0x7f918f5f1f28  yb::rpc::InboundCall::InboundCallTask::Run()
../../src/yb/rpc/thread_pool.cc:104:                                                                    @     0x7f918f661d03  Execute
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:74:                                        @     0x7f918f65e560  __invoke_impl<void, void (yb::rpc::(anonymous namespace)::Worker::*&)(), yb::rpc::(anonymous namespace)::Worker*&>
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:96:                                        @     0x7f918f65e560  __invoke<void (yb::rpc::(anonymous namespace)::Worker::*&)(), yb::rpc::(anonymous namespace)::Worker*&>
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/functional:420:                                          @     0x7f918f65e560  __call<void, 0>
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/functional:503:                                          @     0x7f918f65e560  operator()<>
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:61:                                        @     0x7f918f65e560  __invoke_impl<void, std::_Bind<void (yb::rpc::(anonymous namespace)::Worker::*(yb::rpc::(anonymous namespace)::Worker*))()>&>
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:111:                                       @     0x7f918f65e560  __invoke_r<void, std::_Bind<void (yb::rpc::(anonymous namespace)::Worker::*(yb::rpc::(anonymous namespace)::Worker*))()>&>
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/std_function.h:290:                                 @     0x7f918f65e560  _M_invoke
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/std_function.h:590:                                 @     0x7f918f441ad1  std::function<void ()>::operator()() const
../../src/yb/util/thread.cc:800:                                                                        @     0x7f918f441ad1  yb::Thread::SuperviseThread(void*)
    @     0x7f918d5351ce 
    @     0x7f918d1a0e72 
    @ 0xffffffffffffffff 
../../src/yb/util/logging.cc:458:                                                                       @     0x7f918f3a07b1  yb::LogFatalHandlerSink::send(int, char const*, char const*, int, tm const*, char const*, unsigned long)
    @     0x7f918f164f26 
    @     0x7f918f1600db 
    @     0x7f918f161738 
../../src/yb/util/rwc_lock.cc:150:                                                                      @     0x7f918f41c142  yb::RWCLock::WriteLock()
../../src/yb/util/cow_object.h:79:                                                                      @     0x7f91929d090a  yb::CowObject<yb::master::PersistentTableInfo>::StartMutation()
../../src/yb/util/cow_object.h:229:                                                                     @     0x7f91929841f8  yb::CowWriteLock<yb::master::PersistentTableInfo>::CowWriteLock(yb::CowObject<yb::master::PersistentTableInfo>*)
../../ent/src/yb/master/../../../../src/yb/master/catalog_entity_info.h:151:                            @     0x7f91929841f8  yb::master::MetadataCowWrapper<yb::master::PersistentTableInfo>::LockForWrite()
../../src/yb/master/catalog_manager.cc:5921:                                                            @     0x7f91929841f8  yb::master::CatalogManager::DeleteTableInMemory(yb::master::TableIdentifierPB const&, bool, bool, std::unordered_map<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag>, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, boost::hash<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> >, std::equal_to<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> >, std::allocator<std::pair<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> const, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > const&, std::vector<yb::master::CatalogManager::DeletingTableData, std::allocator<yb::master::CatalogManager::DeletingTableData> >*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*)
../../src/yb/master/catalog_manager.cc:5990:                                                            @     0x7f9192984bcf  yb::master::CatalogManager::DeleteTableInMemory(yb::master::TableIdentifierPB const&, bool, bool, std::unordered_map<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag>, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, boost::hash<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> >, std::equal_to<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> >, std::allocator<std::pair<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> const, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > const&, std::vector<yb::master::CatalogManager::DeletingTableData, std::allocator<yb::master::CatalogManager::DeletingTableData> >*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*)
../../src/yb/master/catalog_manager.cc:5747:                                                            @     0x7f919298525a  yb::master::CatalogManager::DeleteTableInternal(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*)
../../src/yb/master/catalog_manager.cc:5725:                                                            @     0x7f91929871fe  yb::master::CatalogManager::DeleteTable(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*)
../../src/yb/master/master_service_base-internal.h:145:                                                 @     0x7f9192a55b2c  yb::master::MasterServiceBase::HandleIn<yb::master::CatalogManager, yb::master::DeleteTableRequestPB, yb::master::DeleteTableResponsePB>(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*, yb::Status (yb::master::CatalogManager::*)(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*), char const*, int, char const*, yb::StronglyTypedBool<yb::master::HoldCatalogLock_Tag>)::{lambda()#1}::operator()() const
../../src/yb/master/master_service_base-internal.h:83:                                                  @     0x7f9192a55b2c  void yb::master::MasterServiceBase::HandleOnLeader<yb::master::DeleteTableRequestPB, yb::master::DeleteTableResponsePB, yb::master::MasterServiceBase::HandleIn<yb::master::CatalogManager, yb::master::DeleteTableRequestPB, yb::master::DeleteTableResponsePB>(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*, yb::Status (yb::master::CatalogManager::*)(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*), char const*, int, char const*, yb::StronglyTypedBool<yb::master::HoldCatalogLock_Tag>)::{lambda()#1}>(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*, yb::master::MasterServiceBase::HandleIn<yb::master::CatalogManager, yb::master::DeleteTableRequestPB, yb::master::DeleteTableResponsePB>(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*, yb::Status (yb::master::CatalogManager::*)(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*), char const*, int, char const*, yb::StronglyTypedBool<yb::master::HoldCatalogLock_Tag>)::{lambda()#1}, char const*, int, char const*, yb::StronglyTypedBool<yb::master::HoldCatalogLock_Tag>)
../../src/yb/master/master_service_base-internal.h:144:                                                 @     0x7f9192a52a6f  void yb::master::MasterServiceBase::HandleIn<yb::master::CatalogManager, yb::master::DeleteTableRequestPB, yb::master::DeleteTableResponsePB>(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*, yb::Status (yb::master::CatalogManager::*)(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*), char const*, int, char const*, yb::StronglyTypedBool<yb::master::HoldCatalogLock_Tag>)
../../src/yb/master/master_ddl_service.cc:29:                                                           @     0x7f9192a52a6f  DeleteTable
src/yb/master/master_ddl.service.cc:813:                                                                @     0x7f91902942da  operator()
../../src/yb/rpc/local_call.h:115:                                                                      @     0x7f91902942da  HandleCall<yb::rpc::RpcCallPBParamsImpl<yb::master::DeleteTableRequestPB, yb::master::DeleteTableResponsePB>, yb::master::MasterDdlIf::InitMethods(const scoped_refptr<yb::MetricEntity>&)::<lambda(yb::rpc::InboundCallPtr)>::<lambda(const yb::master::DeleteTableRequestPB*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext)> >
src/yb/master/master_ddl.service.cc:811:                                                                @     0x7f91902944b0  operator()
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:61:                                        @     0x7f91902944b0  __invoke_impl<void, yb::master::MasterDdlIf::InitMethods(const scoped_refptr<yb::MetricEntity>&)::<lambda(yb::rpc::InboundCallPtr)>&, shared_ptr<yb::rpc::InboundCall> >
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:111:                                       @     0x7f91902944b0  __invoke_r<void, yb::master::MasterDdlIf::InitMethods(const scoped_refptr<yb::MetricEntity>&)::<lambda(yb::rpc::InboundCallPtr)>&, shared_ptr<yb::rpc::InboundCall> >
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/std_function.h:290:                                 @     0x7f91902944b0  _M_invoke
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/std_function.h:590:                                 @     0x7f9190288335  std::function<void (shared_ptr<yb::rpc::InboundCall>)>::operator()(shared_ptr<yb::rpc::InboundCall>) const
src/yb/master/master_ddl.service.cc:631:                                                                @     0x7f9190288335  yb::master::MasterDdlIf::Handle(shared_ptr<yb::rpc::InboundCall>)
../../src/yb/rpc/service_pool.cc:263:                                                                   @     0x7f918f64ed26  yb::rpc::ServicePoolImpl::Handle(shared_ptr<yb::rpc::InboundCall>)
../../src/yb/rpc/inbound_call.cc:236:                                                                   @     0x7f918f5f1f28  yb::rpc::InboundCall::InboundCallTask::Run()
../../src/yb/rpc/thread_pool.cc:104:                                                                    @     0x7f918f661d03  Execute
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:74:                                        @     0x7f918f65e560  __invoke_impl<void, void (yb::rpc::(anonymous namespace)::Worker::*&)(), yb::rpc::(anonymous namespace)::Worker*&>
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:96:                                        @     0x7f918f65e560  __invoke<void (yb::rpc::(anonymous namespace)::Worker::*&)(), yb::rpc::(anonymous namespace)::Worker*&>
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/functional:420:                                          @     0x7f918f65e560  __call<void, 0>
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/functional:503:                                          @     0x7f918f65e560  operator()<>
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:61:                                        @     0x7f918f65e560  __invoke_impl<void, std::_Bind<void (yb::rpc::(anonymous namespace)::Worker::*(yb::rpc::(anonymous namespace)::Worker*))()>&>
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:111:                                       @     0x7f918f65e560  __invoke_r<void, std::_Bind<void (yb::rpc::(anonymous namespace)::Worker::*(yb::rpc::(anonymous namespace)::Worker*))()>&>
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/std_function.h:290:                                 @     0x7f918f65e560  _M_invoke
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/std_function.h:590:                                 @     0x7f918f441ad1  std::function<void ()>::operator()() const
../../src/yb/util/thread.cc:800:                                                                        @     0x7f918f441ad1  yb::Thread::SuperviseThread(void*)
    @     0x7f918d5351ce 
    @     0x7f918d1a0e72 
    @ 0xffffffffffffffff 

def- avatar Jan 20 '23 10:01 def-

For issue [1] - confirmed the race:

m2|pid2761212|:19314|http://127.156.165.251:27226 I0120 23:24:31.507997 2761309 catalog_manager.cc:5480] Servicing DeleteTable request from 127.149.238.212:50251: table { table_name: "test" namespace { name: "cql_test_keyspace" database_type: YQL_DATABASE_CQL } } is_index_table: false

m2|pid2761212|:19314|http://127.156.165.251:27226 W0120 23:24:31.539260 2762648 backfill_index.cc:301] !! The table state is DELETING
m2|pid2761212|:19314|http://127.156.165.251:27226 W0120 23:24:31.539317 2762648 backfill_index.cc:344] !!! UpdateIndexPermission:  set_state ALTERING: Update index permission version=6 ts=2023-01-20 23:24:31 +03

Reproduced with YB HEAD = 3024f4390f5428bab11f4d4d9a14fdc63b767bcf

OlegLoginov avatar Jan 20 '23 20:01 OlegLoginov

For the issues:

  1. MAIN (confirmed): DELETING -> ALTERING/RUNNING state change by BackFilling
  2. OBJECT_NOT_FOUND - hides the wrong case (see thread)
  3. (Minor) We don't have INFO logs in MarkIndexInfoFromTableForDeletion and DeleteIndexInfoFromTable (but we have the LOG in AddIndexInfoToTable). May be useful for debugging. E.g.:
  LOG(INFO) << "DeleteIndexInfoFromTable table " << indexed_table_id << " index " << index_table_id;
  1. DeleteTable/DeleteTableInMemory must initiate state change into DELETED or DELETING.

[1] & [3] - fixed by https://github.com/yugabyte/yugabyte-db/commit/276babb3bd8021b8d8608706b92876afe0c4c93b

[2] - Not fixed yet (WIP diff: https://phabricator.dev.yugabyte.com/D22198) GHI: https://github.com/yugabyte/yugabyte-db/issues/16163

[4] GHI: https://github.com/yugabyte/yugabyte-db/issues/15883

TODO: To check org.yb.cql.TestPartialIndex.testPartialIndexSelectionPolicy (see failure below)

OlegLoginov avatar Jan 30 '23 18:01 OlegLoginov

org.yb.cql.TestPartialIndex.testPartialIndexSelectionPolicy also is flaky. It was not muted, so adding it here to mute.

The release error is OBJECT_NOT_FOUND:

12:16:57.977 (main) [ERROR - org.yb.BaseYBTest$1$2.logEventDetails(BaseYBTest.java:243)] YB Java test failed: class="org.yb.cql.TestPartialIndex", method="testPartialIndexSelectionPolicy"
com.datastax.driver.core.exceptions.InvalidQueryException: Object Not Found. Table cql_test_keyspace.idx not found: OBJECT_NOT_FOUND (master error 3)
drop index idx
           ^^^

Debug builds error is

17:28:31.600 (main) [ERROR - org.yb.BaseYBTest$1$2.logEventDetails(BaseYBTest.java:243)] YB Java test failed: class="org.yb.cql.TestPartialIndex", method="testPartialIndexSelectionPolicy"
com.datastax.driver.core.exceptions.InvalidQueryException: Wrong Metadata Version. schema version mismatch for table 934f1058f7d34160b095d563c41340b8: expected 63, got 64 (compt with prev: 0)
select h1,h2,r1,r2,v1,v2 from test_partial_index where h1 = 1 and v1 = NULL
^^^^^^

NatashaSerebryanaya avatar Feb 02 '23 18:02 NatashaSerebryanaya

For the test failure: ybd --java-test org.yb.cql.TestPartialIndex#testPartialIndexSelectionPolicy -n 20

  1. The log like this SQL Error: Wrong Metadata Version. schema version mismatch for table ed5bb2116fc14ba6812a3f90b7b86986: expected 4, got 0 (compt with prev: 0) is expected in several cases:
  • If we have ALTER TABLE
  • If we have an Index + backfilling is enabled So, it's not an issue.
  1. The log is bad, but it's the issue: https://github.com/yugabyte/yugabyte-db/issues/15883
m2|pid3422495|:17255|http://127.44.235.243:25540 I0217 21:46:15.020802 3422579 catalog_manager.cc:5790] Servicing DeleteTable request from 127.82.118.73:33115: table { table_name: "idx" namespace { name: "cql_test_keyspace" database_type: YQL_DATABASE_CQL } } is_index_table: true
m2|pid3422495|:17255|http://127.44.235.243:25540 I0217 21:46:15.021855 3422579 catalog_manager.cc:5631] MarkIndexInfoFromTableForDeletion table ed5bb2116fc14ba6812a3f90b7b86986 index b11cd96992e74ae7a8cd41b4fef255fe multi_stage=1
m2|pid3422495|:17255|http://127.44.235.243:25540 W0217 21:46:15.033756 3423383 catalog_manager.cc:6321] Servicing IsDeleteTableDone request for table id b11cd96992e74ae7a8cd41b4fef255fe: NOT deleted in state RUNNING
ts3|pid3423192|:14019|http://127.82.118.73:19051 I0217 21:46:15.034438 3423591 client_master_rpc.cc:76] 0x0000561ed7842f20 -> IsDeleteTableDone: Failed, got resp error: Illegal state (yb/master/catalog_manager.cc:6323): The object was NOT deleted (master error 3)

CC: @NatashaSerebryanaya

OlegLoginov avatar Feb 20 '23 19:02 OlegLoginov

I'm closing this issue because:

  1. MAIN (confirmed): DELETING -> ALTERING/RUNNING state change by BackFilling Fixed by https://github.com/yugabyte/yugabyte-db/commit/276babb3bd8021b8d8608706b92876afe0c4c93b

  2. Bad Index state after DeleteTable() call: Separate GHI: https://github.com/yugabyte/yugabyte-db/issues/15883

  3. OBJECT_NOT_FOUND - hides the wrong case (see thread for details) Separate GHI: https://github.com/yugabyte/yugabyte-db/issues/16163 (WIP diff: https://phabricator.dev.yugabyte.com/D22198)

  4. The failed test org.yb.cql.TestPartialIndex.testPartialIndexSelectionPolicy - is a reproducer for https://github.com/yugabyte/yugabyte-db/issues/15883 or for https://github.com/yugabyte/yugabyte-db/issues/16163 Or even for both.

OlegLoginov avatar Feb 20 '23 19:02 OlegLoginov