
SIGSEGV during BackgroundFlush

Open prabhjotlalli opened this issue 1 year ago • 4 comments

Note: Please use Issues only for bug reports. For questions, discussions, feature requests, etc. post to dev group: https://groups.google.com/forum/#!forum/rocksdb or https://www.facebook.com/groups/rocksdb.dev

Expected behavior

Background flush should always succeed.

Actual behavior

RocksDB crashes intermittently during a background flush: anywhere from once a week to once a month, on roughly 1 out of a few hundred running containers.

Steps to reproduce the behavior

No steps to reproduce, since I'm not sure what's causing it. I've included a high-severity error log below and can give more information about the process.

RocksDB version: 8.11.4

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fb74483841d, pid=1, tid=124
#
# JRE version: OpenJDK Runtime Environment Temurin-19.0.2+7 (19.0.2+7) (build 19.0.2+7)
# Java VM: OpenJDK 64-Bit Server VM Temurin-19.0.2+7 (19.0.2+7, mixed mode, sharing, tiered, compressed class ptrs, z gc, linux-amd64)
# Problematic frame:
# C  [librocksdbjni10392120128203082722.so+0x43841d]  std::_Hashtable<std::string, std::string, std::allocator<std::string>, std::__detail::_Identity, std::equal_to<std::string>, std::hash<std::string>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, true, true> >::_M_find_before_node(unsigned long, std::string const&, unsigned long) const+0x1d
#
# Core dump will be written. Default location: /var/crash/core.%e.1.%h.%t
#
# If you would like to submit a bug report, please visit:
#   https://github.com/adoptium/adoptium-support/issues
#

---------------  S U M M A R Y ------------

Command Line: -javaagent:/usr/local/lib/dd-java-agent.jar -XX:+UseZGC -Xmx20g -Xms20g (some flags and params hidden)

Host: AMD EPYC 7763 64-Core Processor, 256 cores, 160G, Debian GNU/Linux 12 (bookworm)
Time: Thu Aug 15 14:34:02 2024 UTC elapsed time: 1256452.048050 seconds (14d 13h 0m 52s)

---------------  T H R E A D  ---------------

Current thread is native thread

Stack: [0x00007fb6ddffe000,0x00007fb6de7fd000],  sp=0x00007fb6de7f7a50,  free space=8166k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [librocksdbjni10392120128203082722.so+0x43841d]  std::_Hashtable<std::string, std::string, std::allocator<std::string>, std::__detail::_Identity, std::equal_to<std::string>, std::hash<std::string>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, true, true> >::_M_find_before_node(unsigned long, std::string const&, unsigned long) const+0x1d
C  [librocksdbjni10392120128203082722.so+0x68b7b0]  rocksdb::BlockBasedTable::PrefetchIndexAndFilterBlocks(rocksdb::ReadOptions const&, rocksdb::FilePrefetchBuffer*, rocksdb::InternalIteratorBase<rocksdb::Slice>*, rocksdb::BlockBasedTable*, bool, rocksdb::BlockBasedTableOptions const&, int, unsigned long, unsigned long, rocksdb::BlockCacheLookupContext*)+0x790
C  [librocksdbjni10392120128203082722.so+0x68ced8]  rocksdb::BlockBasedTable::Open(rocksdb::ReadOptions const&, rocksdb::ImmutableOptions const&, rocksdb::EnvOptions const&, rocksdb::BlockBasedTableOptions const&, rocksdb::InternalKeyComparator const&, std::unique_ptr<rocksdb::RandomAccessFileReader, std::default_delete<rocksdb::RandomAccessFileReader> >&&, unsigned long, unsigned char, std::unique_ptr<rocksdb::TableReader, std::default_delete<rocksdb::TableReader> >*, unsigned long, std::shared_ptr<rocksdb::CacheReservationManager>, std::shared_ptr<rocksdb::SliceTransform const> const&, bool, bool, int, bool, unsigned long, bool, rocksdb::TailPrefetchStats*, rocksdb::BlockCacheTracer*, unsigned long, std::string const&, unsigned long, std::array<unsigned long, 2ul>)+0x1038
C  [librocksdbjni10392120128203082722.so+0x675cda]  rocksdb::BlockBasedTableFactory::NewTableReader(rocksdb::ReadOptions const&, rocksdb::TableReaderOptions const&, std::unique_ptr<rocksdb::RandomAccessFileReader, std::default_delete<rocksdb::RandomAccessFileReader> >&&, unsigned long, std::unique_ptr<rocksdb::TableReader, std::default_delete<rocksdb::TableReader> >*, bool) const+0xda
C  [librocksdbjni10392120128203082722.so+0x4f6b5b]  rocksdb::TableCache::GetTableReader(rocksdb::ReadOptions const&, rocksdb::FileOptions const&, rocksdb::InternalKeyComparator const&, rocksdb::FileMetaData const&, bool, bool, unsigned char, rocksdb::HistogramImpl*, std::unique_ptr<rocksdb::TableReader, std::default_delete<rocksdb::TableReader> >*, std::shared_ptr<rocksdb::SliceTransform const> const&, bool, int, bool, unsigned long, rocksdb::Temperature)+0xa5b
C  [librocksdbjni10392120128203082722.so+0x4f7f75]  rocksdb::TableCache::FindTable(rocksdb::ReadOptions const&, rocksdb::FileOptions const&, rocksdb::InternalKeyComparator const&, rocksdb::FileMetaData const&, rocksdb::BasicTypedCacheInterface<rocksdb::TableReader, (rocksdb::CacheEntryRole)13, rocksdb::Cache*>::TypedHandle**, unsigned char, std::shared_ptr<rocksdb::SliceTransform const> const&, bool, bool, rocksdb::HistogramImpl*, bool, int, bool, unsigned long, rocksdb::Temperature)+0x525
C  [librocksdbjni10392120128203082722.so+0x4fa5c3]  rocksdb::TableCache::NewIterator(rocksdb::ReadOptions const&, rocksdb::FileOptions const&, rocksdb::InternalKeyComparator const&, rocksdb::FileMetaData const&, rocksdb::RangeDelAggregator*, std::shared_ptr<rocksdb::SliceTransform const> const&, rocksdb::TableReader**, rocksdb::HistogramImpl*, rocksdb::TableReaderCaller, rocksdb::Arena*, bool, int, unsigned long, rocksdb::InternalKey const*, rocksdb::InternalKey const*, bool, unsigned char, rocksdb::TruncatedRangeDelIterator**)+0x5d3
C  [librocksdbjni10392120128203082722.so+0x34b58b]  rocksdb::BuildTable(std::string const&, rocksdb::VersionSet*, rocksdb::ImmutableDBOptions const&, rocksdb::TableBuilderOptions const&, rocksdb::FileOptions const&, rocksdb::ReadOptions const&, rocksdb::TableCache*, rocksdb::InternalIteratorBase<rocksdb::Slice>*, std::vector<std::unique_ptr<rocksdb::FragmentedRangeTombstoneIterator, std::default_delete<rocksdb::FragmentedRangeTombstoneIterator> >, std::allocator<std::unique_ptr<rocksdb::FragmentedRangeTombstoneIterator, std::default_delete<rocksdb::FragmentedRangeTombstoneIterator> > > >, rocksdb::FileMetaData*, std::vector<rocksdb::BlobFileAddition, std::allocator<rocksdb::BlobFileAddition> >*, std::vector<unsigned long, std::allocator<unsigned long> >, unsigned long, unsigned long, rocksdb::SnapshotChecker*, bool, rocksdb::InternalStats*, rocksdb::IOStatus*, std::shared_ptr<rocksdb::IOTracer> const&, rocksdb::BlobFileCreationReason, rocksdb::SeqnoToTimeMapping const&, rocksdb::EventLogger*, int, rocksdb::Env::IOPriority, rocksdb::TableProperties*, rocksdb::Env::WriteLifeTimeHint, std::string const*, rocksdb::BlobFileCompletionCallback*, rocksdb::Version*, unsigned long*, unsigned long*, unsigned long*)+0x344b
C  [librocksdbjni10392120128203082722.so+0x4973af]  rocksdb::FlushJob::WriteLevel0Table()+0xf0f
C  [librocksdbjni10392120128203082722.so+0x499142]  rocksdb::FlushJob::Run(rocksdb::LogsWithPrepTracker*, rocksdb::FileMetaData*, bool*)+0x732
C  [librocksdbjni10392120128203082722.so+0x41723e]  rocksdb::DBImpl::AtomicFlushMemTablesToOutputFiles(rocksdb::autovector<rocksdb::DBImpl::BGFlushArg, 8ul> const&, bool*, rocksdb::JobContext*, rocksdb::LogBuffer*, rocksdb::Env::Priority)+0xe8e
C  [librocksdbjni10392120128203082722.so+0x4198ad]  rocksdb::DBImpl::FlushMemTablesToOutputFiles(rocksdb::autovector<rocksdb::DBImpl::BGFlushArg, 8ul> const&, bool*, rocksdb::JobContext*, rocksdb::LogBuffer*, rocksdb::Env::Priority)+0x17d
C  [librocksdbjni10392120128203082722.so+0x41a74c]  rocksdb::DBImpl::BackgroundFlush(bool*, rocksdb::JobContext*, rocksdb::LogBuffer*, rocksdb::FlushReason*, rocksdb::Env::Priority)+0xe6c
C  [librocksdbjni10392120128203082722.so+0x41dc08]  rocksdb::DBImpl::BackgroundCallFlush(rocksdb::Env::Priority)+0xc8
C  [librocksdbjni10392120128203082722.so+0x775bcb]  rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned long)+0x24b
C  [librocksdbjni10392120128203082722.so+0x775da2]  rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*)+0x62

prabhjotlalli avatar Sep 12 '24 17:09 prabhjotlalli

Thanks for the report @prabhjotlalli. I have had a quick look, and I think this is a problem that might need the attention of the core team (FAO @jaykorean @pdillinger). It's initiated from Java, but the problem is deep in the C++ code. It is interesting that you are running a large multithreaded system and that the problem is "random". Is it correct that the problem occurs on different containers, i.e. not always the same one of the hundreds that you run? The SIGSEGV itself suggests that the hash table backing an STL unordered container is corrupted; possibly it is trying to follow a field (a next pointer?) from a null pointer. So I suspect that the STL concurrency rules (one writer, or n readers, but not both) are not being enforced somewhere in the flush code. I don't know my way around the core well enough to figure out where it is happening.
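For reference, the "one writer, or n readers, not both" rule mentioned above can be sketched in C++ with a reader/writer lock. This is only an illustration under assumed names: `GuardedNameSet` and `StressGuardedNameSet` are hypothetical and are not RocksDB code, and nothing here claims to show where the flush path goes wrong.

```cpp
#include <mutex>
#include <shared_mutex>
#include <string>
#include <thread>
#include <unordered_set>
#include <vector>

// Hypothetical wrapper (not RocksDB code): allows many concurrent readers
// or exactly one writer, never both at the same time.
class GuardedNameSet {
 public:
  bool Contains(const std::string& name) const {
    // Shared lock: any number of readers may hold it together.
    std::shared_lock<std::shared_mutex> lock(mu_);
    return names_.count(name) != 0;
  }

  void Add(const std::string& name) {
    // Exclusive lock: blocks all readers and other writers, so a rehash
    // can never run while another thread is walking a bucket chain.
    std::unique_lock<std::shared_mutex> lock(mu_);
    names_.insert(name);
  }

 private:
  mutable std::shared_mutex mu_;
  std::unordered_set<std::string> names_;
};

// Runs `readers` lookup threads against one inserting thread and returns
// how many of the `count` inserted names are present afterwards.
int StressGuardedNameSet(int readers, int count) {
  GuardedNameSet set;
  std::vector<std::thread> threads;
  threads.emplace_back([&set, count] {
    for (int i = 0; i < count; ++i) set.Add("name" + std::to_string(i));
  });
  for (int r = 0; r < readers; ++r) {
    threads.emplace_back([&set, count] {
      for (int i = 0; i < count; ++i) {
        (void)set.Contains("name" + std::to_string(i));
      }
    });
  }
  for (auto& t : threads) t.join();

  int present = 0;
  for (int i = 0; i < count; ++i) {
    if (set.Contains("name" + std::to_string(i))) ++present;
  }
  return present;
}
```

Without the locks, the lookups could race with a rehash triggered by `insert`, which is the kind of corruption that would surface as a crash inside a bucket walk. A container that is never modified after construction needs no lock for concurrent reads.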

alanpaxton avatar Sep 16 '24 13:09 alanpaxton

@alanpaxton - Correct, it happens on different containers across multiple different clusters. Sounds good. Let me know if there's more info I can provide to help debug this issue.

prabhjotlalli avatar Sep 16 '24 14:09 prabhjotlalli

The PrefetchIndexAndFilterBlocks function is accessing a static const unordered_set. That suggests either memory corruption or that this occurs while some other thread is running static destructors, perhaps in the process of cleaning up from some other error. Ideally we would wrap more such things in STATIC_AVOID_DESTRUCTION to minimize false attribution of the genesis of crashes and better expose the root cause.
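As a rough sketch of the idea behind avoiding static destruction: an ordinary function-local static is destroyed when the process runs its static destructors, while a heap-allocated object that is intentionally never deleted stays valid for any thread still running at that point. The function names and set contents below are hypothetical, and the code does not reproduce the actual STATIC_AVOID_DESTRUCTION definition, which should be checked in the RocksDB source.

```cpp
#include <string>
#include <unordered_set>

// Ordinary function-local static: its destructor runs during static
// destruction at process exit. A background thread that is still alive
// at that point could then read freed bucket memory.
inline const std::unordered_set<std::string>& DestructibleNames() {
  static const std::unordered_set<std::string> names = {
      "example.FilterA", "example.FilterB"};  // hypothetical contents
  return names;
}

// Intentionally leaked variant: allocated once on first use (initialization
// of function-local statics is thread-safe since C++11) and never destroyed,
// so lookups remain valid even while other statics are being torn down.
inline const std::unordered_set<std::string>& LeakedNames() {
  static const auto& names = *new std::unordered_set<std::string>{
      "example.FilterA", "example.FilterB"};  // hypothetical contents
  return names;
}

// Hypothetical lookup helper built on the leaked set.
inline bool IsKnownName(const std::string& name) {
  return LeakedNames().count(name) != 0;
}
```

The trade-off is a small, bounded leak that the operating system reclaims at exit, in return for lookups that cannot fault merely because shutdown has started on another thread.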

pdillinger avatar Sep 18 '24 17:09 pdillinger

Do you mean kBuiltinNameAndAliases? Is it as simple as just wrapping it with the macro? I had inferred from the comments in the code that by now that field could be removed entirely, but maybe we're not living in the future yet.

alanpaxton avatar Oct 02 '24 15:10 alanpaxton

Any updates or anything I can do to help debug this?

prabhjotlalli avatar Oct 24 '24 20:10 prabhjotlalli

Hello, we have the same issue in a kstreams application, but for us it starts randomly on one pod and then continues (1-2 crashes per hour until the pod is killed).

# C [librocksdbjni4033611900321005091.so+0x4150cd] std::_Hashtable<std::string, std::string, std::allocator<std::string>, std::__detail::_Identity, std::equal_to<std::string>, std::hash<std::string>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, true, true> >::_M_find_before_node(unsigned long, std::string const&, unsigned long) const+0x2d

pviceic avatar Dec 12 '24 11:12 pviceic