yugabyte-db
[DocDB] xCluster node becomes unreachable due to OOM
Jira Link: DB-11911
Description
Test: https://perf.dev.yugabyte.com/perfstudio-dashboard/output/6439902

Cluster: http://10.9.131.126/universes/d2fab6ec-3628-47b9-ac08-b12631efdc72/nodes

Memory allocation stacks are dumped in the tserver logs. On one of the nodes, I see a total of 8550 stacks of 1MB:
yb::consensus::RaftConsensus::ReadReplicatedMessagesForCDC();;yb::cdc::GetChangesForXCluster();;yb::cdc::CDCServiceImpl::GetChanges()

324914:I0621 08:13:26.410059 43230 tcmalloc_profile.cc:180] vlog1: Sampled stack: ;;tcmalloc::tcmalloc_internal::SampleifyAllocation<>();;slow_alloc<>();;malloc;;yb::HeapBufferAllocator::AllocateInternal();;yb::internal::ArenaBase<>::NewBuffer();;yb::internal::ArenaBase<>::ArenaBase();;yb::log::ReadableLogSegment::ReadEntryBatch();;yb::consensus::LogCache::ReadOps();;yb::consensus::PeerMessageQueue::ReadFromLogCache();;yb::consensus::RaftConsensus::ReadReplicatedMessagesForCDC();;yb::cdc::GetChangesForXCluster();;yb::cdc::CDCServiceImpl::GetChanges();;std::__1::__function::__func<>::operator()();;yb::cdc::CDCServiceIf::Handle();;yb::rpc::ServicePoolImpl::Handle();;yb::rpc::InboundCall::InboundCallTask::Run();;yb::rpc::(anonymous namespace)::Worker::Execute();;yb::Thread::SuperviseThread();;start_thread;;, sum: 1052672, count: 257, requested_size: 4096, allocated_size: 4096, is_censored: 0, avg_lifetime: 0, allocator_deallocator_cpu_matched: 1
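In the sampled stack above, `sum` equals `count` times `allocated_size` (257 × 4096 = 1052672, presumably bytes), which is where the roughly 1MB per stack comes from. To total the sampled memory per call stack across a whole tserver log, something like the following could be used. This is only a minimal sketch, assuming every relevant line follows the `Sampled stack: ..., sum: ..., count: ...` format shown above; `SAMPLE_RE` and `summarize` are illustrative names, not part of YugabyteDB, and the stack in the usage example is abbreviated for readability.

```python
import re
from collections import Counter

# Assumed line format, taken from the log excerpt in this issue:
#   ... Sampled stack: <frames joined by ;;>, sum: <n>, count: <n>, ...
SAMPLE_RE = re.compile(
    r"Sampled stack: (?P<stack>.*?), sum: (?P<sum>\d+), count: (?P<count>\d+)"
)


def summarize(lines):
    """Return (sum_by_stack, lines_by_stack) for 'Sampled stack' log lines.

    sum_by_stack adds up the 'sum' field per distinct stack string;
    lines_by_stack counts how many log lines reported that stack.
    Lines that do not match the assumed format are skipped.
    """
    sum_by_stack = Counter()
    lines_by_stack = Counter()
    for line in lines:
        match = SAMPLE_RE.search(line)
        if match is None:
            continue
        stack = match.group("stack")
        sum_by_stack[stack] += int(match.group("sum"))
        lines_by_stack[stack] += 1
    return sum_by_stack, lines_by_stack


if __name__ == "__main__":
    # Abbreviated stand-in for the real log line (frames shortened).
    sample = (
        "I0621 08:13:26.410059 43230 tcmalloc_profile.cc:180] vlog1: "
        "Sampled stack: ;;malloc;;yb::cdc::CDCServiceImpl::GetChanges();;, "
        "sum: 1052672, count: 257, requested_size: 4096, allocated_size: 4096"
    )
    sums, counts = summarize([sample, sample, "unrelated log line"])
    for stack, total in sums.most_common():
        print(total, counts[stack], stack)
```

On a real log, the lines could be read from the file and passed to `summarize` directly; sorting by total makes the dominant allocation paths, such as the `GetChanges` path reported here, easy to spot.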
This Slack thread has more details: https://yugabyte.slack.com/archives/CN5SD7UCT/p1717629153912729
Issue Type
kind/bug
Warning: Please confirm that this issue does not contain any sensitive information
- [X] I confirm this issue does not contain any sensitive information.