yugabyte-db
yugabyte-db copied to clipboard
[flaky test] org.yb.cql.TestPartialIndex
Jira Link: DB-3390
Description
Looks like a recent regression increased the failure rate significantly. The commit range seems to be: https://github.com/yugabyte/yugabyte-db/compare/5c3f73189ab142269d876ff3e78bbf7d5ad5bbd3~...4a38579b23b473dbf7b8384317382505f14e6a46
Only YCQL commit there is: https://github.com/yugabyte/yugabyte-db/commit/5c3f73189ab142269d876ff3e78bbf7d5ad5bbd3
This includes similar-looking failures in multiple subtests:
org.yb.cql.TestPartialIndex.testUniquePartialIndexPart2_4
org.yb.cql.TestPartialIndex.testUniquePartialIndexPart2_3
org.yb.cql.TestPartialIndex.testPartialIndexPart2_4
...
The testUniquePartialIndexPart*
tests fails about 5-10% of the time on my local machine. It fails 80%+ of the time with the same signature when I test with my fix for #13308. The increase in failure rate is related to the initial leader selected for a tablet group at table creation time. Currently, they are selected at random, but my patch picks them in a way that balances them among the tservers. I don't have an RCA right now.
These tests fail because of a deadlock in the CatalogManager::DeleteTable()
path that is invoked at the end of this test. Here is the full trace of the hung write lock:
m1|pid12851|:19216|http://127.0.0.222:17644 F0919 10:12:53.539841 1888350208 rwc_lock.cc:153] Too long write lock wait, last writer TID: 44524912, last writer stack: @ 0x1181f6aab yb::RWCLock::WriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039444cb yb::CowObject<>::StartMutation()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x10394449f yb::CowWriteLock<>::CowWriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103944463 yb::CowWriteLock<>::CowWriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x10391585f yb::master::MetadataCowWrapper<>::LockForWrite()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039c598b yb::master::CatalogManager::DeleteTableInMemory()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039c2c33 yb::master::CatalogManager::DeleteTableInternal()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039c2937 yb::master::CatalogManager::DeleteTable()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103ca1273 _ZZN2yb6master17MasterServiceBase8HandleInINS0_14CatalogManagerENS0_20DeleteTableRequestPBENS0_21DeleteTableResponsePBEEEvPKT0_PT1_PNS_3rpc10RpcContextEMT_FNS_6StatusES8_SA_SD_EPKciSJ_NS_17StronglyTypedBoolINS0_19HoldCatalogLock_TagEEEENKUlvE_clEv
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103ca1133 _ZN2yb6master17MasterServiceBase14HandleOnLeaderINS0_20DeleteTableRequestPBENS0_21DeleteTableResponsePBEZNS1_8HandleInINS0_14CatalogManagerES3_S4_EEvPKT0_PT1_PNS_3rpc10RpcContextEMT_FNS_6StatusES9_SB_SE_EPKciSK_NS_17StronglyTypedBoolINS0_19HoldCatalogLock_TagEEEEUlvE_EEvPKSF_PS7_SE_SA_SK_iSK_SN_
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103ca0ff3 yb::master::MasterServiceBase::HandleIn<>()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103c9b5ab yb::master::(anonymous namespace)::MasterDdlServiceImpl::DeleteTable()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a7690b _ZZZN2yb6master11MasterDdlIf11InitMethodsERK13scoped_refptrINS_12MetricEntityEEENK4$_11clENSt3__110shared_ptrINS_3rpc11InboundCallEEEENKUlPKNS0_20DeleteTableRequestPBEPNS0_21DeleteTableResponsePBENSA_10RpcContextEE_clESF_SH_SI_
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a76683 _ZN2yb3rpc10HandleCallINS0_19RpcCallPBParamsImplINS_6master20DeleteTableRequestPBENS3_21DeleteTableResponsePBEEEZZNS3_11MasterDdlIf11InitMethodsERK13scoped_refptrINS_12MetricEntityEEENK4$_11clENSt3__110shared_ptrINS0_11InboundCallEEEEUlPKS4_PS5_NS0_10RpcContextEE_EEDaSH_T0_
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a7637b yb::master::MasterDdlIf::InitMethods()::$_11::operator()()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a762b7 _ZNSt3__1L8__invokeIRZN2yb6master11MasterDdlIf11InitMethodsERK13scoped_refptrINS1_12MetricEntityEEE4$_11JNS_10shared_ptrINS1_3rpc11InboundCallEEEEEEDTclscT_fp_spscT0_fp0_EEOSF_DpOSG_
m1|pid12851|:19216|http://127.0.0.222:17644
m1|pid12851|:19216|http://127.0.0.222:17644
m1|pid12851|:19216|http://127.0.0.222:17644 last writer current stack: @ 0x1935d74a3 _sigtramp
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1935bfedf pthread_kill
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x11820a283 yb::ThreadStacks()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x118209e1b yb::ThreadStack()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x11820afb3 yb::DumpThreadStack()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1181f677b yb::RWCLock::WriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039444cb yb::CowObject<>::StartMutation()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x10394449f yb::CowWriteLock<>::CowWriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103944463 yb::CowWriteLock<>::CowWriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x10391585f yb::master::MetadataCowWrapper<>::LockForWrite()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039231af yb::master::BackfillTable::MarkIndexesAsDesired()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103921c97 yb::master::BackfillTable::MarkIndexesAsFailed()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039238cb yb::master::BackfillTable::MarkAllIndexesAsFailed()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x10391e997 yb::master::BackfillTable::Abort()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x10391ff77 yb::master::BackfillTable::UpdateSafeTime()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x10392b153 yb::master::GetSafeTimeForTablet::UnregisterAsyncTaskCallback()
m1|pid12851|:19216|http://127.0.0.222:17644
m1|pid12851|:19216|http://127.0.0.222:17644
m1|pid12851|:19216|http://127.0.0.222:17644 current thread stack: @ 0x1039444cc yb::CowObject<>::StartMutation()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039444a0 yb::CowWriteLock<>::CowWriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103944464 yb::CowWriteLock<>::CowWriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103915860 yb::master::MetadataCowWrapper<>::LockForWrite()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039231b0 yb::master::BackfillTable::MarkIndexesAsDesired()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103921c98 yb::master::BackfillTable::MarkIndexesAsFailed()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039238cc yb::master::BackfillTable::MarkAllIndexesAsFailed()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x10391e998 yb::master::BackfillTable::Abort()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x10391ff78 yb::master::BackfillTable::UpdateSafeTime()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x10392b154 yb::master::GetSafeTimeForTablet::UnregisterAsyncTaskCallback()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1038d659c yb::master::RetryingTSRpcTask::UnregisterAsyncTask()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1038d792c yb::master::RetryingTSRpcTask::AbortAndReturnPrevState()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103b46dbc yb::master::TableInfo::AbortTasksAndCloseIfRequested()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103b46940 yb::master::TableInfo::AbortTasks()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039c6a48 yb::master::CatalogManager::DeleteTableInMemory()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039c2c34 yb::master::CatalogManager::DeleteTableInternal()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039c2938 yb::master::CatalogManager::DeleteTable()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103ca1274 _ZZN2yb6master17MasterServiceBase8HandleInINS0_14CatalogManagerENS0_20DeleteTableRequestPBENS0_21DeleteTableResponsePBEEEvPKT0_PT1_PNS_3rpc10RpcContextEMT_FNS_6StatusES8_SA_SD_EPKciSJ_NS_17StronglyTypedBoolINS0_19HoldCatalogLock_TagEEEENKUlvE_clEv
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103ca1134 _ZN2yb6master17MasterServiceBase14HandleOnLeaderINS0_20DeleteTableRequestPBENS0_21DeleteTableResponsePBEZNS1_8HandleInINS0_14CatalogManagerES3_S4_EEvPKT0_PT1_PNS_3rpc10RpcContextEMT_FNS_6StatusES9_SB_SE_EPKciSK_NS_17StronglyTypedBoolINS0_19HoldCatalogLock_TagEEEEUlvE_EEvPKSF_PS7_SE_SA_SK_iSK_SN_
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103ca0ff4 yb::master::MasterServiceBase::HandleIn<>()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103c9b5ac yb::master::(anonymous namespace)::MasterDdlServiceImpl::DeleteTable()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a7690c _ZZZN2yb6master11MasterDdlIf11InitMethodsERK13scoped_refptrINS_12MetricEntityEEENK4$_11clENSt3__110shared_ptrINS_3rpc11InboundCallEEEENKUlPKNS0_20DeleteTableRequestPBEPNS0_21DeleteTableResponsePBENSA_10RpcContextEE_clESF_SH_SI_
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a76684 _ZN2yb3rpc10HandleCallINS0_19RpcCallPBParamsImplINS_6master20DeleteTableRequestPBENS3_21DeleteTableResponsePBEEEZZNS3_11MasterDdlIf11InitMethodsERK13scoped_refptrINS_12MetricEntityEEENK4$_11clENSt3__110shared_ptrINS0_11InboundCallEEEEUlPKS4_PS5_NS0_10RpcContextEE_EEDaSH_T0_
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a7637c yb::master::MasterDdlIf::InitMethods()::$_11::operator()()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a762b8 _ZNSt3__1L8__invokeIRZN2yb6master11MasterDdlIf11InitMethodsERK13scoped_refptrINS1_12MetricEntityEEE4$_11JNS_10shared_ptrINS1_3rpc11InboundCallEEEEEEDTclscT_fp_spscT0_fp0_EEOSF_DpOSG_
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a76250 std::__1::__invoke_void_return_wrapper<>::__call<>()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a76210 std::__1::__function::__alloc_func<>::operator()()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a74da8 std::__1::__function::__func<>::operator()()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1033af058 std::__1::__function::__value_func<>::operator()()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1033a84c8 std::__1::function<>::operator()()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a50a50 yb::master::MasterDdlIf::Handle()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x113807040 yb::rpc::ServicePoolImpl::Handle()
m1|pid12851|:19216|http://127.0.0.222:17644 Fatal failure details written to /Users/jmaley/logs/repeat_unit_test/yb-cql/org.yb.cql.TestPartialIndex#testUniquePartialIndexPart1_4/2022-09-19T10_09_03/4.fatal_failure_details.m-1.127.0.0.222-port19216.2022-09-19T10_12_53.pid12851.txt
m1|pid12851|:19216|http://127.0.0.222:17644 F20220919 10:12:53 ../../src/yb/util/rwc_lock.cc:153] Too long write lock wait, last writer TID: 44524912, last writer stack: @ 0x1181f6aab yb::RWCLock::WriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039444cb yb::CowObject<>::StartMutation()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x10394449f yb::CowWriteLock<>::CowWriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103944463 yb::CowWriteLock<>::CowWriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x10391585f yb::master::MetadataCowWrapper<>::LockForWrite()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039c598b yb::master::CatalogManager::DeleteTableInMemory()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039c2c33 yb::master::CatalogManager::DeleteTableInternal()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039c2937 yb::master::CatalogManager::DeleteTable()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103ca1273 _ZZN2yb6master17MasterServiceBase8HandleInINS0_14CatalogManagerENS0_20DeleteTableRequestPBENS0_21DeleteTableResponsePBEEEvPKT0_PT1_PNS_3rpc10RpcContextEMT_FNS_6StatusES8_SA_SD_EPKciSJ_NS_17StronglyTypedBoolINS0_19HoldCatalogLock_TagEEEENKUlvE_clEv
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103ca1133 _ZN2yb6master17MasterServiceBase14HandleOnLeaderINS0_20DeleteTableRequestPBENS0_21DeleteTableResponsePBEZNS1_8HandleInINS0_14CatalogManagerES3_S4_EEvPKT0_PT1_PNS_3rpc10RpcContextEMT_FNS_6StatusES9_SB_SE_EPKciSK_NS_17StronglyTypedBoolINS0_19HoldCatalogLock_TagEEEEUlvE_EEvPKSF_PS7_SE_SA_SK_iSK_SN_
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103ca0ff3 yb::master::MasterServiceBase::HandleIn<>()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103c9b5ab yb::master::(anonymous namespace)::MasterDdlServiceImpl::DeleteTable()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a7690b _ZZZN2yb6master11MasterDdlIf11InitMethodsERK13scoped_refptrINS_12MetricEntityEEENK4$_11clENSt3__110shared_ptrINS_3rpc11InboundCallEEEENKUlPKNS0_20DeleteTableRequestPBEPNS0_21DeleteTableResponsePBENSA_10RpcContextEE_clESF_SH_SI_
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a76683 _ZN2yb3rpc10HandleCallINS0_19RpcCallPBParamsImplINS_6master20DeleteTableRequestPBENS3_21DeleteTableResponsePBEEEZZNS3_11MasterDdlIf11InitMethodsERK13scoped_refptrINS_12MetricEntityEEENK4$_11clENSt3__110shared_ptrINS0_11InboundCallEEEEUlPKS4_PS5_NS0_10RpcContextEE_EEDaSH_T0_
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a7637b yb::master::MasterDdlIf::InitMethods()::$_11::operator()()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a762b7 _ZNSt3__1L8__invokeIRZN2yb6master11MasterDdlIf11InitMethodsERK13scoped_refptrINS1_12MetricEntityEEE4$_11JNS_10shared_ptrINS1_3rpc11InboundCallEEEEEEDTclscT_fp_spscT0_fp0_EEOSF_DpOSG_
m1|pid12851|:19216|http://127.0.0.222:17644
m1|pid12851|:19216|http://127.0.0.222:17644
m1|pid12851|:19216|http://127.0.0.222:17644 last writer current stack: @ 0x1935d74a3 _sigtramp
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1935bfedf pthread_kill
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x11820a283 yb::ThreadStacks()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x118209e1b yb::ThreadStack()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x11820afb3 yb::DumpThreadStack()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1181f677b yb::RWCLock::WriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039444cb yb::CowObject<>::StartMutation()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x10394449f yb::CowWriteLock<>::CowWriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103944463 yb::CowWriteLock<>::CowWriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x10391585f yb::master::MetadataCowWrapper<>::LockForWrite()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039231af yb::master::BackfillTable::MarkIndexesAsDesired()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103921c97 yb::master::BackfillTable::MarkIndexesAsFailed()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039238cb yb::master::BackfillTable::MarkAllIndexesAsFailed()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x10391e997 yb::master::BackfillTable::Abort()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x10391ff77 yb::master::BackfillTable::UpdateSafeTime()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x10392b153 yb::master::GetSafeTimeForTablet::UnregisterAsyncTaskCallback()
m1|pid12851|:19216|http://127.0.0.222:17644
m1|pid12851|:19216|http://127.0.0.222:17644
m1|pid12851|:19216|http://127.0.0.222:17644 current thread stack: @ 0x1039444cc yb::CowObject<>::StartMutation()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039444a0 yb::CowWriteLock<>::CowWriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103944464 yb::CowWriteLock<>::CowWriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103915860 yb::master::MetadataCowWrapper<>::LockForWrite()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039231b0 yb::master::BackfillTable::MarkIndexesAsDesired()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103921c98 yb::master::BackfillTable::MarkIndexesAsFailed()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039238cc yb::master::BackfillTable::MarkAllIndexesAsFailed()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x10391e998 yb::master::BackfillTable::Abort()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x10391ff78 yb::master::BackfillTable::UpdateSafeTime()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x10392b154 yb::master::GetSafeTimeForTablet::UnregisterAsyncTaskCallback()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1038d659c yb::master::RetryingTSRpcTask::UnregisterAsyncTask()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1038d792c yb::master::RetryingTSRpcTask::AbortAndReturnPrevState()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103b46dbc yb::master::TableInfo::AbortTasksAndCloseIfRequested()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103b46940 yb::master::TableInfo::AbortTasks()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039c6a48 yb::master::CatalogManager::DeleteTableInMemory()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039c2c34 yb::master::CatalogManager::DeleteTableInternal()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039c2938 yb::master::CatalogManager::DeleteTable()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103ca1274 _ZZN2yb6master17MasterServiceBase8HandleInINS0_14CatalogManagerENS0_20DeleteTableRequestPBENS0_21DeleteTableResponsePBEEEvPKT0_PT1_PNS_3rpc10RpcContextEMT_FNS_6StatusES8_SA_SD_EPKciSJ_NS_17StronglyTypedBoolINS0_19HoldCatalogLock_TagEEEENKUlvE_clEv
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103ca1134 _ZN2yb6master17MasterServiceBase14HandleOnLeaderINS0_20DeleteTableRequestPBENS0_21DeleteTableResponsePBEZNS1_8HandleInINS0_14CatalogManagerES3_S4_EEvPKT0_PT1_PNS_3rpc10RpcContextEMT_FNS_6StatusES9_SB_SE_EPKciSK_NS_17StronglyTypedBoolINS0_19HoldCatalogLock_TagEEEEUlvE_EEvPKSF_PS7_SE_SA_SK_iSK_SN_
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103ca0ff4 yb::master::MasterServiceBase::HandleIn<>()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103c9b5ac yb::master::(anonymous namespace)::MasterDdlServiceImpl::DeleteTable()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a7690c _ZZZN2yb6master11MasterDdlIf11InitMethodsERK13scoped_refptrINS_12MetricEntityEEENK4$_11clENSt3__110shared_ptrINS_3rpc11InboundCallEEEENKUlPKNS0_20DeleteTableRequestPBEPNS0_21DeleteTableResponsePBENSA_10RpcContextEE_clESF_SH_SI_
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a76684 _ZN2yb3rpc10HandleCallINS0_19RpcCallPBParamsImplINS_6master20DeleteTableRequestPBENS3_21DeleteTableResponsePBEEEZZNS3_11MasterDdlIf11InitMethodsERK13scoped_refptrINS_12MetricEntityEEENK4$_11clENSt3__110shared_ptrINS0_11InboundCallEEEEUlPKS4_PS5_NS0_10RpcContextEE_EEDaSH_T0_
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a7637c yb::master::MasterDdlIf::InitMethods()::$_11::operator()()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a762b8 _ZNSt3__1L8__invokeIRZN2yb6master11MasterDdlIf11InitMethodsERK13scoped_refptrINS1_12MetricEntityEEE4$_11JNS_10shared_ptrINS1_3rpc11InboundCallEEEEEEDTclscT_fp_spscT0_fp0_EEOSF_DpOSG_
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a76250 std::__1::__invoke_void_return_wrapper<>::__call<>()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a76210 std::__1::__function::__alloc_func<>::operator()()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a74da8 std::__1::__function::__func<>::operator()()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1033af058 std::__1::__function::__value_func<>::operator()()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1033a84c8 std::__1::function<>::operator()()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a50a50 yb::master::MasterDdlIf::Handle()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x113807040 yb::rpc::ServicePoolImpl::Handle()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x10384a2e8 google::LogDestination::LogToSinks()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103849448 google::LogMessage::SendToLog()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103849dec google::LogMessage::Flush()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x10384dde0 google::LogMessageFatal::~LogMessageFatal()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x10384ad28 google::LogMessageFatal::~LogMessageFatal()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1181f69b4 yb::RWCLock::WriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039444cc yb::CowObject<>::StartMutation()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039444a0 yb::CowWriteLock<>::CowWriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103944464 yb::CowWriteLock<>::CowWriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103915860 yb::master::MetadataCowWrapper<>::LockForWrite()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039231b0 yb::master::BackfillTable::MarkIndexesAsDesired()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103921c98 yb::master::BackfillTable::MarkIndexesAsFailed()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039238cc yb::master::BackfillTable::MarkAllIndexesAsFailed()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x10391e998 yb::master::BackfillTable::Abort()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x10391ff78 yb::master::BackfillTable::UpdateSafeTime()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x10392b154 yb::master::GetSafeTimeForTablet::UnregisterAsyncTaskCallback()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1038d659c yb::master::RetryingTSRpcTask::UnregisterAsyncTask()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1038d792c yb::master::RetryingTSRpcTask::AbortAndReturnPrevState()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103b46dbc yb::master::TableInfo::AbortTasksAndCloseIfRequested()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103b46940 yb::master::TableInfo::AbortTasks()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039c6a48 yb::master::CatalogManager::DeleteTableInMemory()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039c2c34 yb::master::CatalogManager::DeleteTableInternal()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039c2938 yb::master::CatalogManager::DeleteTable()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103ca1274 _ZZN2yb6master17MasterServiceBase8HandleInINS0_14CatalogManagerENS0_20DeleteTableRequestPBENS0_21DeleteTableResponsePBEEEvPKT0_PT1_PNS_3rpc10RpcContextEMT_FNS_6StatusES8_SA_SD_EPKciSJ_NS_17StronglyTypedBoolINS0_19HoldCatalogLock_TagEEEENKUlvE_clEv
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103ca1134 _ZN2yb6master17MasterServiceBase14HandleOnLeaderINS0_20DeleteTableRequestPBENS0_21DeleteTableResponsePBEZNS1_8HandleInINS0_14CatalogManagerES3_S4_EEvPKT0_PT1_PNS_3rpc10RpcContextEMT_FNS_6StatusES9_SB_SE_EPKciSK_NS_17StronglyTypedBoolINS0_19HoldCatalogLock_TagEEEEUlvE_EEvPKSF_PS7_SE_SA_SK_iSK_SN_
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103ca0ff4 yb::master::MasterServiceBase::HandleIn<>()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103c9b5ac yb::master::(anonymous namespace)::MasterDdlServiceImpl::DeleteTable()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a7690c _ZZZN2yb6master11MasterDdlIf11InitMethodsERK13scoped_refptrINS_12MetricEntityEEENK4$_11clENSt3__110shared_ptrINS_3rpc11InboundCallEEEENKUlPKNS0_20DeleteTableRequestPBEPNS0_21DeleteTableResponsePBENSA_10RpcContextEE_clESF_SH_SI_
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a76684 _ZN2yb3rpc10HandleCallINS0_19RpcCallPBParamsImplINS_6master20DeleteTableRequestPBENS3_21DeleteTableResponsePBEEEZZNS3_11MasterDdlIf11InitMethodsERK13scoped_refptrINS_12MetricEntityEEENK4$_11clENSt3__110shared_ptrINS0_11InboundCallEEEEUlPKS4_PS5_NS0_10RpcContextEE_EEDaSH_T0_
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a7637c yb::master::MasterDdlIf::InitMethods()::$_11::operator()()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a762b8 _ZNSt3__1L8__invokeIRZN2yb6master11MasterDdlIf11InitMethodsERK13scoped_refptrINS1_12MetricEntityEEE4$_11JNS_10shared_ptrINS1_3rpc11InboundCallEEEEEEDTclscT_fp_spscT0_fp0_EEOSF_DpOSG_
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a76250 std::__1::__invoke_void_return_wrapper<>::__call<>()
m1|pid12851|:19216|http://127.0.0.222:17644
m1|pid12851|:19216|http://127.0.0.222:17644 *** Check failure stack trace: ***
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103849604 google::LogMessage::SendToLog()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103849dec google::LogMessage::Flush()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x10384dde0 google::LogMessageFatal::~LogMessageFatal()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x10384ad28 google::LogMessageFatal::~LogMessageFatal()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1181f69b4 yb::RWCLock::WriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039444cc yb::CowObject<>::StartMutation()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039444a0 yb::CowWriteLock<>::CowWriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103944464 yb::CowWriteLock<>::CowWriteLock()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103915860 yb::master::MetadataCowWrapper<>::LockForWrite()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039231b0 yb::master::BackfillTable::MarkIndexesAsDesired()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103921c98 yb::master::BackfillTable::MarkIndexesAsFailed()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039238cc yb::master::BackfillTable::MarkAllIndexesAsFailed()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x10391e998 yb::master::BackfillTable::Abort()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x10391ff78 yb::master::BackfillTable::UpdateSafeTime()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x10392b154 yb::master::GetSafeTimeForTablet::UnregisterAsyncTaskCallback()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1038d659c yb::master::RetryingTSRpcTask::UnregisterAsyncTask()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1038d792c yb::master::RetryingTSRpcTask::AbortAndReturnPrevState()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103b46dbc yb::master::TableInfo::AbortTasksAndCloseIfRequested()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103b46940 yb::master::TableInfo::AbortTasks()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039c6a48 yb::master::CatalogManager::DeleteTableInMemory()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039c2c34 yb::master::CatalogManager::DeleteTableInternal()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x1039c2938 yb::master::CatalogManager::DeleteTable()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103ca1274 _ZZN2yb6master17MasterServiceBase8HandleInINS0_14CatalogManagerENS0_20DeleteTableRequestPBENS0_21DeleteTableResponsePBEEEvPKT0_PT1_PNS_3rpc10RpcContextEMT_FNS_6StatusES8_SA_SD_EPKciSJ_NS_17StronglyTypedBoolINS0_19HoldCatalogLock_TagEEEENKUlvE_clEv
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103ca1134 _ZN2yb6master17MasterServiceBase14HandleOnLeaderINS0_20DeleteTableRequestPBENS0_21DeleteTableResponsePBEZNS1_8HandleInINS0_14CatalogManagerES3_S4_EEvPKT0_PT1_PNS_3rpc10RpcContextEMT_FNS_6StatusES9_SB_SE_EPKciSK_NS_17StronglyTypedBoolINS0_19HoldCatalogLock_TagEEEEUlvE_EEvPKSF_PS7_SE_SA_SK_iSK_SN_
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103ca0ff4 yb::master::MasterServiceBase::HandleIn<>()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x103c9b5ac yb::master::(anonymous namespace)::MasterDdlServiceImpl::DeleteTable()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a7690c _ZZZN2yb6master11MasterDdlIf11InitMethodsERK13scoped_refptrINS_12MetricEntityEEENK4$_11clENSt3__110shared_ptrINS_3rpc11InboundCallEEEENKUlPKNS0_20DeleteTableRequestPBEPNS0_21DeleteTableResponsePBENSA_10RpcContextEE_clESF_SH_SI_
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a76684 _ZN2yb3rpc10HandleCallINS0_19RpcCallPBParamsImplINS_6master20DeleteTableRequestPBENS3_21DeleteTableResponsePBEEEZZNS3_11MasterDdlIf11InitMethodsERK13scoped_refptrINS_12MetricEntityEEENK4$_11clENSt3__110shared_ptrINS0_11InboundCallEEEEUlPKS4_PS5_NS0_10RpcContextEE_EEDaSH_T0_
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a7637c yb::master::MasterDdlIf::InitMethods()::$_11::operator()()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a762b8 _ZNSt3__1L8__invokeIRZN2yb6master11MasterDdlIf11InitMethodsERK13scoped_refptrINS1_12MetricEntityEEE4$_11JNS_10shared_ptrINS1_3rpc11InboundCallEEEEEEDTclscT_fp_spscT0_fp0_EEOSF_DpOSG_
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a76250 std::__1::__invoke_void_return_wrapper<>::__call<>()
m1|pid12851|:19216|http://127.0.0.222:17644 @ 0x114a76210 std::__1::__function::__alloc_func<>::operator()()
ts1|pid12895|:26019|http://127.0.1.235:17430 W0919 10:12:53.568887 1901146112 client_master_rpc.cc:128] DeleteTable: Encountered a network error from the Master(127.0.0.222:19216): Network error (yb/util/net/socket.cc:534): recvmsg got EOF from remote (system error 58), retrying...
RCA: The deadlock occurs when deleting a table that has an outstanding GetSafeTimeForTablet
RPC on pending_tasks_
.
Here are the important parts of the path:
- The table delete requests enters
CatalogManager::DeleteTableInMemory()
. This takes a write lock on the primary (aka "indexed") table. - This invokes
AbortTasks
on the primary table. When 'GetSafeTimeForTablet' is on the pending tasks, it is aborted and entersGetSafeTimeForTablet::UnregisterAsyncTaskCallback
. From here, it will path to take a write lock on the primary table, which it is already holding:
yb::master::MetadataCowWrapper<>::LockForWrite()
yb::master::BackfillTable::MarkIndexesAsDesired()
yb::master::BackfillTable::MarkIndexesAsFailed()
yb::master::BackfillTable::MarkAllIndexesAsFailed()
yb::master::BackfillTable::Abort()
yb::master::BackfillTable::UpdateSafeTime()
yb::master::GetSafeTimeForTablet::UnregisterAsyncTaskCallback()
I don't immediately know what the fix is. I'm going to trace the git history and talk to the authors to see what they think.
^^^ cc @amitanandaiyer
Change 1a030ba fixed the above issue. I'm unassigning myself from this issue. I still see some other failure signatures with this test that may be worth investigating.
Log:
m1|pid540522|:22101|http://127.189.147.195:25657 I1005 22:50:06.541280 541191 catalog_manager.cc:5472] Servicing DeleteTable request from 127.56.226.116:34509: table { table_name: "test" namespace { name: "cql_test_keyspace" database_type: YQL_DATABASE_CQL } } is_index_table: false
m1|pid540522|:22101|http://127.189.147.195:25657 I1005 22:50:06.649655 541191 catalog_manager.cc:5662] Successfully initiated deletion of table with table_name: "test"
m1|pid540522|:22101|http://127.189.147.195:25657 W1005 22:50:06.650669 544152 catalog_manager.cc:5929] Servicing IsDeleteTableDone request for table id 3121e4c94afc4cb5ae3fde2ea1e77b26: NOT deleted
ts1|pid541637|:22086|http://127.56.226.116:23033 I1005 22:50:06.698710 542536 client-internal.cc:657] Deleted table cql_test_keyspace.test
22:50:06.710 (Time-limited test) [INFO - org.yb.cql.BaseCQLTest.dropKeyspaces(BaseCQLTest.java:458)] Dropping keyspace cql_test_keyspace
22:50:06.710 (Time-limited test) [INFO - org.yb.cql.BaseCQLTest.execute(BaseCQLTest.java:713)] EXEC CQL: DROP KEYSPACE "cql_test_keyspace";
ts1|pid541637|:22086|http://127.56.226.116:23033 E1005 22:50:06.711386 542532 audit_logger.cc:539] AUDIT: user:anonymous|host:127.56.226.116:9042|source:127.0.0.1|port:44064|timestamp:1664999406711|type:DROP_KEYSPACE|category:DDL|ks:cql_test_keyspace|operation:DROP KEYSPACE "cql_test_keyspace";
m1|pid540522|:22101|http://127.189.147.195:25657 I1005 22:50:06.711961 544164 catalog_manager.cc:8114] Servicing DeleteNamespace request from 127.56.226.116:59457: namespace { name: "cql_test_keyspace" }
ts1|pid541637|:22086|http://127.56.226.116:23033 I1005 22:50:06.713174 544134 client_master_rpc.cc:75] 0x00000000040136a0 -> DeleteNamespace: Failed, got resp error: Invalid argument (yb/master/catalog_manager.cc:8164): Cannot delete keyspace which has table: test [id=3121e4c94afc4cb5ae3fde2ea1e77b26], request: namespace { name: "cql_test_keyspace" }: NAMESPACE_IS_NOT_EMPTY (master error 14)
ts1|pid541637|:22086|http://127.56.226.116:23033 W1005 22:50:06.713304 542532 process_context.cc:185] SQL Error: Server Error. Cannot delete keyspace which has table: test [id=3121e4c94afc4cb5ae3fde2ea1e77b26], request: namespace { name: "cql_test_keyspace" }: NAMESPACE_IS_NOT_EMPTY (master error 14)
ts1|pid541637|:22086|http://127.56.226.116:23033 DROP KEYSPACE "cql_test_keyspace";
ts1|pid541637|:22086|http://127.56.226.116:23033 ^^^^^^^^^^^^^^^^^^^
Reproducer: ybd --java-test org.yb.cql.TestPartialIndex#testMoreThanOneIndex -n 20
(Failed: 1/20)
This looks like an error in the yb::client
:
m1|W1005 22:50:06.650669 catalog_manager.cc:5929] Servicing IsDeleteTableDone request for table id 3121e4c94afc4cb5ae3fde2ea1e77b26: NOT deleted
ts1|1005 22:50:06.698710 client-internal.cc:657] Deleted table cql_test_keyspace.test
m3|I1006 00:06:19.553485 604117 catalog_manager.cc:5472] Servicing DeleteTable request from 127.98.124.24:39399: table { table_name: "test" namespace { name: "cql_test_keyspace" database_type: YQL_DATABASE_CQL } } is_index_table: false
ts1|W1006 00:06:19.771134 604759 client-internal.cc:629] !! Got response deleted_table_ids: "0955955562cb4daaa84abb6fb9730136" deleted_table_ids: "4275a3ccb7894f07a5d1a578acd0c962" deleted_table_ids: "ea2798ca43444918a41be17d8952bd2b"
m3|W1006 00:06:19.772028 603552 catalog_manager.cc:5929] Servicing IsDeleteTableDone request for table id 0955955562cb4daaa84abb6fb9730136: NOT deleted
ts1|I1006 00:06:19.774981 604269 client_master_rpc.cc:75] 0x0000000002f7a4a0 -> IsDeleteTableDone: Failed, got resp error: Illegal state (yb/master/catalog_manager.cc:5931): The object was NOT deleted: Current schema version=8 (master error 3)
ts1|W1006 00:06:19.775149 604759 client-internal.cc:635] !! Waited for table to be deleted 0955955562cb4daaa84abb6fb9730136
Status YBClient::Data::DeleteTable(YBClient* client,
if (wait) {
// Wait for the deleted tables to be gone.
if (resp.deleted_table_ids_size() > 0) {
for (const auto& table_id : resp.deleted_table_ids()) {
RETURN_NOT_OK(WaitForDeleteTableToFinish(client, table_id, deadline));
LOG(WARNING) << "!! Waited for table to be deleted " << table_id;
The waiting code WaitForDeleteTableToFinish
was interrupted due to the error:
IsDeleteTableDone: Failed, got resp error: Illegal state (yb/master/catalog_manager.cc:5931): The object was NOT deleted: Current schema version=8 (master error 3)
The code is wrong when the Table is in ALTERING/RUNNING state (due to a related index deleting):
if (!l->started_deleting() && !l->started_hiding()) {
LOG(WARNING) << "!! Servicing IsDeleteTableDone request for table id "
<< req->table_id() << ": NOT deleted: state=" << l->pb.state();
Status s = STATUS(IllegalState, "The object was NOT deleted", l->pb.state_msg());
return SetupError(resp->mutable_error(), MasterErrorPB::OBJECT_NOT_FOUND, s);
RUNNING = 2;
ALTERING = 3;
Servicing IsDeleteTableDone request for table id 9b99ebd7178e40418ac883065c1a5470: NOT deleted: state=2
Or:
m3|W1006 01:41:16.202991 748223 catalog_manager.cc:5931] !! Servicing IsDeleteTableDone request for table id e4f49779024e474c93edd29dbd8e0d2e: NOT deleted: 3
ts3|1006 01:41:16.203764 747055 client_master_rpc.cc:75] 0x00000000035424a0 -> IsDeleteTableDone: Failed, got resp error: Illegal state (yb/master/catalog_manager.cc:5933): The object was NOT deleted: Update index permission version=7 ts=2022-10-06 01:41:16 +03 (master error 3)
Confirmed: race between DeleteTable & BackfillTable:
m3|I1006 02:01:51.210574 793433 catalog_manager.cc:5474] Servicing DeleteTable request from 127.75.169.193:43303: table { table_name: "test" namespace { name: "cql_test_keyspace" database_type: YQL_DATABASE_CQL } } is_index_table: false
m3|W1006 02:01:52.661628 795857 backfill_index.cc:343] !!! UpdateIndexPermission set state = ALTERING
Possible solutions:
- Fix
IsDeleteTableInProgress
- handle case when the table is in RUNNING /ALTERING state. - Fix
BackfillTable
- synchronously completely stop all parallel backfill-related background activities for the deleted table.
Op-1 looks much more simpler.
Note that the op-2 means waiting for another threads (sync-op). So it potentially can slow down the Master RPC DeleteTable
.
Main reproducer: ybd --java-test org.yb.cql.TestPartialIndex#testMoreThanOneIndex -n 50
Issues:
- MAIN (confirmed): DELETING -> ALTERING/RUNNING state change by BackFilling
-
OBJECT_NOT_FOUND
- hides the wrong case (see thread) - (Minor) We don't have INFO logs in
MarkIndexInfoFromTableForDeletion
andDeleteIndexInfoFromTable
(but we have the LOG inAddIndexInfoToTable
). May be useful for debugging. E.g.:
LOG(INFO) << "DeleteIndexInfoFromTable table " << indexed_table_id << " index " << index_table_id;
- (just to recheck)
DeleteTable
/DeleteTableInMemory
should not initiate state change from DELETING into RUNNING/ALTERING (via e.g.MarkIndexInfoFromTableForDeletion
orSendAlterTable
RPC callback)
ybd --java-test org.yb.cql.TestMasterLatency#testSlowCreateDropIndex
ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 59.628 s <<< FAILURE! - in org.yb.cql.TestMasterLatency
[ERROR] testSlowCreateDropIndex(org.yb.cql.TestMasterLatency) Time elapsed: 59.381 s <<< ERROR!
com.datastax.driver.core.exceptions.InvalidQueryException:
Object Not Found. Table cql_test_keyspace.test_drop not found: OBJECT_NOT_FOUND (master error 3)
DROP TABLE cql_test_keyspace.test_drop;
^^^^^^^^^
(ql error -301)
It looks like the same issue:
m1|W1011 19:05:52.078438 1680536 catalog_manager.cc:5929] Servicing IsDeleteTableDone request for table id 6351bdc11e074e919fb801b7c6f53718: NOT deleted
Possible root cause - issue [2] above: OBJECT_NOT_FOUND - hides the wrong case (see thread)
In 2.17.2.0-b39 this also occurred in org.yb.pgsql.TestPgRegressIndex.testPgRegressIndex:
pg_regress exited with error code: 1, failed tests: [yb_create_index]
F20230120 02:27:35 ../../src/yb/util/rwc_lock.cc:150] Too long write lock wait, last writer TID: 409433, last writer stack: ../../src/yb/util/rwc_lock.cc:163: @ 0x7f918f41c1d8 yb::RWCLock::WriteLock()
../../src/yb/util/cow_object.h:79: @ 0x7f91929d090a yb::CowObject<yb::master::PersistentTableInfo>::StartMutation()
../../src/yb/util/cow_object.h:229: @ 0x7f91929841f8 yb::CowWriteLock<yb::master::PersistentTableInfo>::CowWriteLock(yb::CowObject<yb::master::PersistentTableInfo>*)
../../ent/src/yb/master/../../../../src/yb/master/catalog_entity_info.h:151: @ 0x7f91929841f8 yb::master::MetadataCowWrapper<yb::master::PersistentTableInfo>::LockForWrite()
../../src/yb/master/catalog_manager.cc:5921: @ 0x7f91929841f8 yb::master::CatalogManager::DeleteTableInMemory(yb::master::TableIdentifierPB const&, bool, bool, std::unordered_map<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag>, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, boost::hash<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> >, std::equal_to<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> >, std::allocator<std::pair<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> const, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > const&, std::vector<yb::master::CatalogManager::DeletingTableData, std::allocator<yb::master::CatalogManager::DeletingTableData> >*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*)
../../src/yb/master/catalog_manager.cc:5747: @ 0x7f919298525a yb::master::CatalogManager::DeleteTableInternal(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*)
../../src/yb/master/catalog_manager.cc:5725: @ 0x7f91929871fe yb::master::CatalogManager::DeleteTable(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*)
../../src/yb/master/catalog_manager.cc:4252: @ 0x7f9192987b9b operator()
../../src/yb/util/threadpool.h:78: @ 0x7f9192987b9b Run
../../src/yb/util/threadpool.cc:612: @ 0x7f918f447d50 yb::ThreadPool::DispatchThread(bool)
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:74: @ 0x7f918f44aedc void std::__invoke_impl<void, void (yb::ThreadPool::*&)(bool), yb::ThreadPool*&, bool&>(std::__invoke_memfun_deref, void (yb::ThreadPool::*&)(bool), yb::ThreadPool*&, bool&)
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:96: @ 0x7f918f44aedc std::__invoke_result<void (yb::ThreadPool::*&)(bool), yb::ThreadPool*&, bool&>::type std::__invoke<void (yb::ThreadPool::*&)(bool), yb::ThreadPool*&, bool&>(void (yb::ThreadPool::*&)(bool), yb::ThreadPool*&, bool&)
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/functional:420: @ 0x7f918f44aedc void std::_Bind<void (yb::ThreadPool::*(yb::ThreadPool*, bool))(bool)>::__call<void, , 0ul, 1ul>(tuple<>&&, std::_Index_tuple<0ul, 1ul>)
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/functional:503: @ 0x7f918f44aedc void std::_Bind<void (yb::ThreadPool::*(yb::ThreadPool*, bool))(bool)>::operator()<, void>()
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:61: @ 0x7f918f44aedc void std::__invoke_impl<void, std::_Bind<void (yb::ThreadPool::*(yb::ThreadPool*, bool))(bool)>&>(std::__invoke_other, std::_Bind<void (yb::ThreadPool::*(yb::ThreadPool*, bool))(bool)>&)
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:111: @ 0x7f918f44aedc std::enable_if<is_invocable_r_v<void, std::_Bind<void (yb::ThreadPool::*(yb::ThreadPool*, bool))(bool)>&>, void>::type std::__invoke_r<void, std::_Bind<void (yb::ThreadPool::*(yb::ThreadPool*, bool))(bool)>&>(std::_Bind<void (yb::ThreadPool::*(yb::ThreadPool*, bool))(bool)>&)
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/std_function.h:290: @ 0x7f918f44aedc std::_Function_handler<void (), std::_Bind<void (yb::ThreadPool::*(yb::ThreadPool*, bool))(bool)> >::_M_invoke(std::_Any_data const&)
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/std_function.h:590: @ 0x7f918f441ad1 std::function<void ()>::operator()() const
../../src/yb/util/thread.cc:800: @ 0x7f918f441ad1 yb::Thread::SuperviseThread(void*)
@ 0x7f918d5351ce
@ 0x7f918d1a0e72
last writer current stack: @ 0x7f918d1b5b7f
@ 0x7f918d53b847
../../src/yb/util/condition_variable.cc:136: @ 0x7f918f316fc7 yb::ConditionVariable::TimedWait(yb::MonoDelta const&) const
../../src/yb/util/rwc_lock.cc:142: @ 0x7f918f41bb6e yb::RWCLock::WriteLock()
../../src/yb/util/cow_object.h:79: @ 0x7f9192940af8 yb::CowObject<yb::master::PersistentTableInfo>::StartMutation()
../../src/yb/util/cow_object.h:229: @ 0x7f9192940af8 yb::CowWriteLock<yb::master::PersistentTableInfo>::CowWriteLock(yb::CowObject<yb::master::PersistentTableInfo>*)
../../ent/src/yb/master/../../../../src/yb/master/catalog_entity_info.h:151: @ 0x7f9192940af8 yb::master::MetadataCowWrapper<yb::master::PersistentTableInfo>::LockForWrite()
../../src/yb/master/catalog_manager.cc:5578: @ 0x7f9192940af8 yb::master::CatalogManager::DeleteIndexInfoFromTable(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
../../src/yb/master/catalog_manager.cc:5560: @ 0x7f9192982dc0 yb::master::CatalogManager::MarkIndexInfoFromTableForDeletion(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, yb::master::DeleteTableResponsePB*)
../../src/yb/master/catalog_manager.cc:5995: @ 0x7f9192984ca2 yb::master::CatalogManager::DeleteTableInMemory(yb::master::TableIdentifierPB const&, bool, bool, std::unordered_map<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag>, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, boost::hash<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> >, std::equal_to<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> >, std::allocator<std::pair<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> const, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > const&, std::vector<yb::master::CatalogManager::DeletingTableData, std::allocator<yb::master::CatalogManager::DeletingTableData> >*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*)
../../src/yb/master/catalog_manager.cc:5747: @ 0x7f919298525a yb::master::CatalogManager::DeleteTableInternal(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*)
../../src/yb/master/catalog_manager.cc:5725: @ 0x7f91929871fe yb::master::CatalogManager::DeleteTable(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*)
../../src/yb/master/catalog_manager.cc:4252: @ 0x7f9192987b9b operator()
../../src/yb/util/threadpool.h:78: @ 0x7f9192987b9b Run
../../src/yb/util/threadpool.cc:612: @ 0x7f918f447d50 yb::ThreadPool::DispatchThread(bool)
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:74: @ 0x7f918f44aedc void std::__invoke_impl<void, void (yb::ThreadPool::*&)(bool), yb::ThreadPool*&, bool&>(std::__invoke_memfun_deref, void (yb::ThreadPool::*&)(bool), yb::ThreadPool*&, bool&)
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:96: @ 0x7f918f44aedc std::__invoke_result<void (yb::ThreadPool::*&)(bool), yb::ThreadPool*&, bool&>::type std::__invoke<void (yb::ThreadPool::*&)(bool), yb::ThreadPool*&, bool&>(void (yb::ThreadPool::*&)(bool), yb::ThreadPool*&, bool&)
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/functional:420: @ 0x7f918f44aedc void std::_Bind<void (yb::ThreadPool::*(yb::ThreadPool*, bool))(bool)>::__call<void, , 0ul, 1ul>(tuple<>&&, std::_Index_tuple<0ul, 1ul>)
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/functional:503: @ 0x7f918f44aedc void std::_Bind<void (yb::ThreadPool::*(yb::ThreadPool*, bool))(bool)>::operator()<, void>()
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:61: @ 0x7f918f44aedc void std::__invoke_impl<void, std::_Bind<void (yb::ThreadPool::*(yb::ThreadPool*, bool))(bool)>&>(std::__invoke_other, std::_Bind<void (yb::ThreadPool::*(yb::ThreadPool*, bool))(bool)>&)
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:111: @ 0x7f918f44aedc std::enable_if<is_invocable_r_v<void, std::_Bind<void (yb::ThreadPool::*(yb::ThreadPool*, bool))(bool)>&>, void>::type std::__invoke_r<void, std::_Bind<void (yb::ThreadPool::*(yb::ThreadPool*, bool))(bool)>&>(std::_Bind<void (yb::ThreadPool::*(yb::ThreadPool*, bool))(bool)>&)
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/std_function.h:290: @ 0x7f918f44aedc std::_Function_handler<void (), std::_Bind<void (yb::ThreadPool::*(yb::ThreadPool*, bool))(bool)> >::_M_invoke(std::_Any_data const&)
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/std_function.h:590: @ 0x7f918f441ad1 std::function<void ()>::operator()() const
../../src/yb/util/thread.cc:800: @ 0x7f918f441ad1 yb::Thread::SuperviseThread(void*)
@ 0x7f918d5351ce
@ 0x7f918d1a0e72
current thread stack: ../../src/yb/util/rwc_lock.cc:148: @ 0x7f918f41beed yb::RWCLock::WriteLock()
../../src/yb/util/cow_object.h:79: @ 0x7f91929d090a yb::CowObject<yb::master::PersistentTableInfo>::StartMutation()
../../src/yb/util/cow_object.h:229: @ 0x7f91929841f8 yb::CowWriteLock<yb::master::PersistentTableInfo>::CowWriteLock(yb::CowObject<yb::master::PersistentTableInfo>*)
../../ent/src/yb/master/../../../../src/yb/master/catalog_entity_info.h:151: @ 0x7f91929841f8 yb::master::MetadataCowWrapper<yb::master::PersistentTableInfo>::LockForWrite()
../../src/yb/master/catalog_manager.cc:5921: @ 0x7f91929841f8 yb::master::CatalogManager::DeleteTableInMemory(yb::master::TableIdentifierPB const&, bool, bool, std::unordered_map<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag>, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, boost::hash<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> >, std::equal_to<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> >, std::allocator<std::pair<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> const, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > const&, std::vector<yb::master::CatalogManager::DeletingTableData, std::allocator<yb::master::CatalogManager::DeletingTableData> >*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*)
../../src/yb/master/catalog_manager.cc:5990: @ 0x7f9192984bcf yb::master::CatalogManager::DeleteTableInMemory(yb::master::TableIdentifierPB const&, bool, bool, std::unordered_map<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag>, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, boost::hash<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> >, std::equal_to<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> >, std::allocator<std::pair<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> const, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > const&, std::vector<yb::master::CatalogManager::DeletingTableData, std::allocator<yb::master::CatalogManager::DeletingTableData> >*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*)
../../src/yb/master/catalog_manager.cc:5747: @ 0x7f919298525a yb::master::CatalogManager::DeleteTableInternal(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*)
../../src/yb/master/catalog_manager.cc:5725: @ 0x7f91929871fe yb::master::CatalogManager::DeleteTable(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*)
../../src/yb/master/master_service_base-internal.h:145: @ 0x7f9192a55b2c yb::master::MasterServiceBase::HandleIn<yb::master::CatalogManager, yb::master::DeleteTableRequestPB, yb::master::DeleteTableResponsePB>(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*, yb::Status (yb::master::CatalogManager::*)(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*), char const*, int, char const*, yb::StronglyTypedBool<yb::master::HoldCatalogLock_Tag>)::{lambda()#1}::operator()() const
../../src/yb/master/master_service_base-internal.h:83: @ 0x7f9192a55b2c void yb::master::MasterServiceBase::HandleOnLeader<yb::master::DeleteTableRequestPB, yb::master::DeleteTableResponsePB, yb::master::MasterServiceBase::HandleIn<yb::master::CatalogManager, yb::master::DeleteTableRequestPB, yb::master::DeleteTableResponsePB>(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*, yb::Status (yb::master::CatalogManager::*)(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*), char const*, int, char const*, yb::StronglyTypedBool<yb::master::HoldCatalogLock_Tag>)::{lambda()#1}>(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*, yb::master::MasterServiceBase::HandleIn<yb::master::CatalogManager, yb::master::DeleteTableRequestPB, yb::master::DeleteTableResponsePB>(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*, yb::Status (yb::master::CatalogManager::*)(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*), char const*, int, char const*, yb::StronglyTypedBool<yb::master::HoldCatalogLock_Tag>)::{lambda()#1}, char const*, int, char const*, yb::StronglyTypedBool<yb::master::HoldCatalogLock_Tag>)
../../src/yb/master/master_service_base-internal.h:144: @ 0x7f9192a52a6f void yb::master::MasterServiceBase::HandleIn<yb::master::CatalogManager, yb::master::DeleteTableRequestPB, yb::master::DeleteTableResponsePB>(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*, yb::Status (yb::master::CatalogManager::*)(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*), char const*, int, char const*, yb::StronglyTypedBool<yb::master::HoldCatalogLock_Tag>)
../../src/yb/master/master_ddl_service.cc:29: @ 0x7f9192a52a6f DeleteTable
src/yb/master/master_ddl.service.cc:813: @ 0x7f91902942da operator()
../../src/yb/rpc/local_call.h:115: @ 0x7f91902942da HandleCall<yb::rpc::RpcCallPBParamsImpl<yb::master::DeleteTableRequestPB, yb::master::DeleteTableResponsePB>, yb::master::MasterDdlIf::InitMethods(const scoped_refptr<yb::MetricEntity>&)::<lambda(yb::rpc::InboundCallPtr)>::<lambda(const yb::master::DeleteTableRequestPB*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext)> >
src/yb/master/master_ddl.service.cc:811: @ 0x7f91902944b0 operator()
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:61: @ 0x7f91902944b0 __invoke_impl<void, yb::master::MasterDdlIf::InitMethods(const scoped_refptr<yb::MetricEntity>&)::<lambda(yb::rpc::InboundCallPtr)>&, shared_ptr<yb::rpc::InboundCall> >
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:111: @ 0x7f91902944b0 __invoke_r<void, yb::master::MasterDdlIf::InitMethods(const scoped_refptr<yb::MetricEntity>&)::<lambda(yb::rpc::InboundCallPtr)>&, shared_ptr<yb::rpc::InboundCall> >
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/std_function.h:290: @ 0x7f91902944b0 _M_invoke
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/std_function.h:590: @ 0x7f9190288335 std::function<void (shared_ptr<yb::rpc::InboundCall>)>::operator()(shared_ptr<yb::rpc::InboundCall>) const
src/yb/master/master_ddl.service.cc:631: @ 0x7f9190288335 yb::master::MasterDdlIf::Handle(shared_ptr<yb::rpc::InboundCall>)
../../src/yb/rpc/service_pool.cc:263: @ 0x7f918f64ed26 yb::rpc::ServicePoolImpl::Handle(shared_ptr<yb::rpc::InboundCall>)
../../src/yb/rpc/inbound_call.cc:236: @ 0x7f918f5f1f28 yb::rpc::InboundCall::InboundCallTask::Run()
../../src/yb/rpc/thread_pool.cc:104: @ 0x7f918f661d03 Execute
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:74: @ 0x7f918f65e560 __invoke_impl<void, void (yb::rpc::(anonymous namespace)::Worker::*&)(), yb::rpc::(anonymous namespace)::Worker*&>
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:96: @ 0x7f918f65e560 __invoke<void (yb::rpc::(anonymous namespace)::Worker::*&)(), yb::rpc::(anonymous namespace)::Worker*&>
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/functional:420: @ 0x7f918f65e560 __call<void, 0>
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/functional:503: @ 0x7f918f65e560 operator()<>
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:61: @ 0x7f918f65e560 __invoke_impl<void, std::_Bind<void (yb::rpc::(anonymous namespace)::Worker::*(yb::rpc::(anonymous namespace)::Worker*))()>&>
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:111: @ 0x7f918f65e560 __invoke_r<void, std::_Bind<void (yb::rpc::(anonymous namespace)::Worker::*(yb::rpc::(anonymous namespace)::Worker*))()>&>
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/std_function.h:290: @ 0x7f918f65e560 _M_invoke
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/std_function.h:590: @ 0x7f918f441ad1 std::function<void ()>::operator()() const
../../src/yb/util/thread.cc:800: @ 0x7f918f441ad1 yb::Thread::SuperviseThread(void*)
@ 0x7f918d5351ce
@ 0x7f918d1a0e72
@ 0xffffffffffffffff
../../src/yb/util/logging.cc:458: @ 0x7f918f3a07b1 yb::LogFatalHandlerSink::send(int, char const*, char const*, int, tm const*, char const*, unsigned long)
@ 0x7f918f164f26
@ 0x7f918f1600db
@ 0x7f918f161738
../../src/yb/util/rwc_lock.cc:150: @ 0x7f918f41c142 yb::RWCLock::WriteLock()
../../src/yb/util/cow_object.h:79: @ 0x7f91929d090a yb::CowObject<yb::master::PersistentTableInfo>::StartMutation()
../../src/yb/util/cow_object.h:229: @ 0x7f91929841f8 yb::CowWriteLock<yb::master::PersistentTableInfo>::CowWriteLock(yb::CowObject<yb::master::PersistentTableInfo>*)
../../ent/src/yb/master/../../../../src/yb/master/catalog_entity_info.h:151: @ 0x7f91929841f8 yb::master::MetadataCowWrapper<yb::master::PersistentTableInfo>::LockForWrite()
../../src/yb/master/catalog_manager.cc:5921: @ 0x7f91929841f8 yb::master::CatalogManager::DeleteTableInMemory(yb::master::TableIdentifierPB const&, bool, bool, std::unordered_map<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag>, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, boost::hash<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> >, std::equal_to<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> >, std::allocator<std::pair<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> const, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > const&, std::vector<yb::master::CatalogManager::DeletingTableData, std::allocator<yb::master::CatalogManager::DeletingTableData> >*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*)
../../src/yb/master/catalog_manager.cc:5990: @ 0x7f9192984bcf yb::master::CatalogManager::DeleteTableInMemory(yb::master::TableIdentifierPB const&, bool, bool, std::unordered_map<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag>, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, boost::hash<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> >, std::equal_to<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> >, std::allocator<std::pair<yb::StronglyTypedUuid<yb::SnapshotScheduleId_Tag> const, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > const&, std::vector<yb::master::CatalogManager::DeletingTableData, std::allocator<yb::master::CatalogManager::DeletingTableData> >*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*)
../../src/yb/master/catalog_manager.cc:5747: @ 0x7f919298525a yb::master::CatalogManager::DeleteTableInternal(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*)
../../src/yb/master/catalog_manager.cc:5725: @ 0x7f91929871fe yb::master::CatalogManager::DeleteTable(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*)
../../src/yb/master/master_service_base-internal.h:145: @ 0x7f9192a55b2c yb::master::MasterServiceBase::HandleIn<yb::master::CatalogManager, yb::master::DeleteTableRequestPB, yb::master::DeleteTableResponsePB>(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*, yb::Status (yb::master::CatalogManager::*)(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*), char const*, int, char const*, yb::StronglyTypedBool<yb::master::HoldCatalogLock_Tag>)::{lambda()#1}::operator()() const
../../src/yb/master/master_service_base-internal.h:83: @ 0x7f9192a55b2c void yb::master::MasterServiceBase::HandleOnLeader<yb::master::DeleteTableRequestPB, yb::master::DeleteTableResponsePB, yb::master::MasterServiceBase::HandleIn<yb::master::CatalogManager, yb::master::DeleteTableRequestPB, yb::master::DeleteTableResponsePB>(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*, yb::Status (yb::master::CatalogManager::*)(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*), char const*, int, char const*, yb::StronglyTypedBool<yb::master::HoldCatalogLock_Tag>)::{lambda()#1}>(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*, yb::master::MasterServiceBase::HandleIn<yb::master::CatalogManager, yb::master::DeleteTableRequestPB, yb::master::DeleteTableResponsePB>(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*, yb::Status (yb::master::CatalogManager::*)(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*), char const*, int, char const*, yb::StronglyTypedBool<yb::master::HoldCatalogLock_Tag>)::{lambda()#1}, char const*, int, char const*, yb::StronglyTypedBool<yb::master::HoldCatalogLock_Tag>)
../../src/yb/master/master_service_base-internal.h:144: @ 0x7f9192a52a6f void yb::master::MasterServiceBase::HandleIn<yb::master::CatalogManager, yb::master::DeleteTableRequestPB, yb::master::DeleteTableResponsePB>(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*, yb::Status (yb::master::CatalogManager::*)(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*), char const*, int, char const*, yb::StronglyTypedBool<yb::master::HoldCatalogLock_Tag>)
../../src/yb/master/master_ddl_service.cc:29: @ 0x7f9192a52a6f DeleteTable
src/yb/master/master_ddl.service.cc:813: @ 0x7f91902942da operator()
../../src/yb/rpc/local_call.h:115: @ 0x7f91902942da HandleCall<yb::rpc::RpcCallPBParamsImpl<yb::master::DeleteTableRequestPB, yb::master::DeleteTableResponsePB>, yb::master::MasterDdlIf::InitMethods(const scoped_refptr<yb::MetricEntity>&)::<lambda(yb::rpc::InboundCallPtr)>::<lambda(const yb::master::DeleteTableRequestPB*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext)> >
src/yb/master/master_ddl.service.cc:811: @ 0x7f91902944b0 operator()
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:61: @ 0x7f91902944b0 __invoke_impl<void, yb::master::MasterDdlIf::InitMethods(const scoped_refptr<yb::MetricEntity>&)::<lambda(yb::rpc::InboundCallPtr)>&, shared_ptr<yb::rpc::InboundCall> >
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:111: @ 0x7f91902944b0 __invoke_r<void, yb::master::MasterDdlIf::InitMethods(const scoped_refptr<yb::MetricEntity>&)::<lambda(yb::rpc::InboundCallPtr)>&, shared_ptr<yb::rpc::InboundCall> >
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/std_function.h:290: @ 0x7f91902944b0 _M_invoke
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/std_function.h:590: @ 0x7f9190288335 std::function<void (shared_ptr<yb::rpc::InboundCall>)>::operator()(shared_ptr<yb::rpc::InboundCall>) const
src/yb/master/master_ddl.service.cc:631: @ 0x7f9190288335 yb::master::MasterDdlIf::Handle(shared_ptr<yb::rpc::InboundCall>)
../../src/yb/rpc/service_pool.cc:263: @ 0x7f918f64ed26 yb::rpc::ServicePoolImpl::Handle(shared_ptr<yb::rpc::InboundCall>)
../../src/yb/rpc/inbound_call.cc:236: @ 0x7f918f5f1f28 yb::rpc::InboundCall::InboundCallTask::Run()
../../src/yb/rpc/thread_pool.cc:104: @ 0x7f918f661d03 Execute
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:74: @ 0x7f918f65e560 __invoke_impl<void, void (yb::rpc::(anonymous namespace)::Worker::*&)(), yb::rpc::(anonymous namespace)::Worker*&>
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:96: @ 0x7f918f65e560 __invoke<void (yb::rpc::(anonymous namespace)::Worker::*&)(), yb::rpc::(anonymous namespace)::Worker*&>
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/functional:420: @ 0x7f918f65e560 __call<void, 0>
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/functional:503: @ 0x7f918f65e560 operator()<>
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:61: @ 0x7f918f65e560 __invoke_impl<void, std::_Bind<void (yb::rpc::(anonymous namespace)::Worker::*(yb::rpc::(anonymous namespace)::Worker*))()>&>
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:111: @ 0x7f918f65e560 __invoke_r<void, std::_Bind<void (yb::rpc::(anonymous namespace)::Worker::*(yb::rpc::(anonymous namespace)::Worker*))()>&>
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/std_function.h:290: @ 0x7f918f65e560 _M_invoke
/opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/std_function.h:590: @ 0x7f918f441ad1 std::function<void ()>::operator()() const
../../src/yb/util/thread.cc:800: @ 0x7f918f441ad1 yb::Thread::SuperviseThread(void*)
@ 0x7f918d5351ce
@ 0x7f918d1a0e72
@ 0xffffffffffffffff
For issue [1] - confirmed the race:
m2|pid2761212|:19314|http://127.156.165.251:27226 I0120 23:24:31.507997 2761309 catalog_manager.cc:5480] Servicing DeleteTable request from 127.149.238.212:50251: table { table_name: "test" namespace { name: "cql_test_keyspace" database_type: YQL_DATABASE_CQL } } is_index_table: false
m2|pid2761212|:19314|http://127.156.165.251:27226 W0120 23:24:31.539260 2762648 backfill_index.cc:301] !! The table state is DELETING
m2|pid2761212|:19314|http://127.156.165.251:27226 W0120 23:24:31.539317 2762648 backfill_index.cc:344] !!! UpdateIndexPermission: set_state ALTERING: Update index permission version=6 ts=2023-01-20 23:24:31 +03
Reproduced with YB HEAD = 3024f4390f5428bab11f4d4d9a14fdc63b767bcf
For the issues:
- MAIN (confirmed): DELETING -> ALTERING/RUNNING state change by BackFilling
- OBJECT_NOT_FOUND - hides the wrong case (see thread)
- (Minor) We don't have INFO logs in MarkIndexInfoFromTableForDeletion and DeleteIndexInfoFromTable (but we have the LOG in AddIndexInfoToTable). May be useful for debugging. E.g.:
LOG(INFO) << "DeleteIndexInfoFromTable table " << indexed_table_id << " index " << index_table_id;
- DeleteTable/DeleteTableInMemory must initiate state change into DELETED or DELETING.
[1] & [3] - fixed by https://github.com/yugabyte/yugabyte-db/commit/276babb3bd8021b8d8608706b92876afe0c4c93b
[2] - Not fixed yet (WIP diff: https://phabricator.dev.yugabyte.com/D22198) GHI: https://github.com/yugabyte/yugabyte-db/issues/16163
[4] GHI: https://github.com/yugabyte/yugabyte-db/issues/15883
TODO: To check org.yb.cql.TestPartialIndex.testPartialIndexSelectionPolicy
(see failure below)
org.yb.cql.TestPartialIndex.testPartialIndexSelectionPolicy also is flaky. It was not muted, so adding it here to mute.
The release error is OBJECT_NOT_FOUND:
12:16:57.977 (main) [ERROR - org.yb.BaseYBTest$1$2.logEventDetails(BaseYBTest.java:243)] YB Java test failed: class="org.yb.cql.TestPartialIndex", method="testPartialIndexSelectionPolicy"
com.datastax.driver.core.exceptions.InvalidQueryException: Object Not Found. Table cql_test_keyspace.idx not found: OBJECT_NOT_FOUND (master error 3)
drop index idx
^^^
Debug builds error is
17:28:31.600 (main) [ERROR - org.yb.BaseYBTest$1$2.logEventDetails(BaseYBTest.java:243)] YB Java test failed: class="org.yb.cql.TestPartialIndex", method="testPartialIndexSelectionPolicy"
com.datastax.driver.core.exceptions.InvalidQueryException: Wrong Metadata Version. schema version mismatch for table 934f1058f7d34160b095d563c41340b8: expected 63, got 64 (compt with prev: 0)
select h1,h2,r1,r2,v1,v2 from test_partial_index where h1 = 1 and v1 = NULL
^^^^^^
For the test failure: ybd --java-test org.yb.cql.TestPartialIndex#testPartialIndexSelectionPolicy -n 20
- The log like this
SQL Error: Wrong Metadata Version. schema version mismatch for table ed5bb2116fc14ba6812a3f90b7b86986: expected 4, got 0 (compt with prev: 0)
is expected in several cases:
- If we have ALTER TABLE
- If we have an Index + backfilling is enabled So, it's not an issue.
- The log is bad, but it's the issue: https://github.com/yugabyte/yugabyte-db/issues/15883
m2|pid3422495|:17255|http://127.44.235.243:25540 I0217 21:46:15.020802 3422579 catalog_manager.cc:5790] Servicing DeleteTable request from 127.82.118.73:33115: table { table_name: "idx" namespace { name: "cql_test_keyspace" database_type: YQL_DATABASE_CQL } } is_index_table: true
m2|pid3422495|:17255|http://127.44.235.243:25540 I0217 21:46:15.021855 3422579 catalog_manager.cc:5631] MarkIndexInfoFromTableForDeletion table ed5bb2116fc14ba6812a3f90b7b86986 index b11cd96992e74ae7a8cd41b4fef255fe multi_stage=1
m2|pid3422495|:17255|http://127.44.235.243:25540 W0217 21:46:15.033756 3423383 catalog_manager.cc:6321] Servicing IsDeleteTableDone request for table id b11cd96992e74ae7a8cd41b4fef255fe: NOT deleted in state RUNNING
ts3|pid3423192|:14019|http://127.82.118.73:19051 I0217 21:46:15.034438 3423591 client_master_rpc.cc:76] 0x0000561ed7842f20 -> IsDeleteTableDone: Failed, got resp error: Illegal state (yb/master/catalog_manager.cc:6323): The object was NOT deleted (master error 3)
CC: @NatashaSerebryanaya
I'm closing this issue because:
-
MAIN (confirmed): DELETING -> ALTERING/RUNNING state change by BackFilling Fixed by https://github.com/yugabyte/yugabyte-db/commit/276babb3bd8021b8d8608706b92876afe0c4c93b
-
Bad Index state after
DeleteTable()
call: Separate GHI: https://github.com/yugabyte/yugabyte-db/issues/15883 -
OBJECT_NOT_FOUND - hides the wrong case (see thread for details) Separate GHI: https://github.com/yugabyte/yugabyte-db/issues/16163 (WIP diff: https://phabricator.dev.yugabyte.com/D22198)
-
The failed test
org.yb.cql.TestPartialIndex.testPartialIndexSelectionPolicy
- is a reproducer for https://github.com/yugabyte/yugabyte-db/issues/15883 or for https://github.com/yugabyte/yugabyte-db/issues/16163 Or even for both.