Segfault on iterator seek

Open: zh217 opened this issue 3 years ago • 1 comment

We are using v7.7.3 and intermittently hit a segfault in rocksdb::WBWIIteratorImpl::Seek. The segfault only happens under heavy load, and even then we cannot trigger it reliably. It is easier to trigger on Windows and macOS Ventura, and much harder on Linux and older macOS versions. Running more threads than there are CPUs seems to be required to trigger it, but we are not sure.

The following is the trace on MacOS when RocksDB is compiled with debug symbols:

running 65 tests
Assertion failed: (x != nullptr), function FindGreaterOrEqual, file skiplist.h, line 312.
Assertion failed: (x != nullptr)Assertion failed: (x != nullptr), function FindGreaterOrEqual, f, function FindGreaterOrEqual, file skiplist.h, line 312.
Assertion failed: (x != nullptr), function FindGreaterOrEqual, file skiplist.h, line 312.
Assertion failed: (x != nullptr), function FindGreaterOrEqual, fAssertion failed: (x != nullptr)ile skiplist.h, line 312.
, function FindGreaterOrEqual, fAssertion failed: (x != nullptr)ile skiplist.h, line 312.
, function FindGreaterOrEqual, file skiplist.h, line 312.
ile skiplist.h, line 312.
warning: air_routes-68a595ab527559b8 was compiled with optimization - stepping may behave oddly; variables may not be available.
Process 66602 stopped
* thread #16, name = 'dfw_by_region', stop reason = hit program assert
    frame #4: 0x000000010080ab58 air_routes-68a595ab527559b8`rocksdb::SkipList<rocksdb::WriteBatchIndexEntry*, rocksdb::WriteBatchEntryComparator const&>::FindGreaterOrEqual(rocksdb::WriteBatchIndexEntry* const&) const (.cold.1) at skiplist.h:312:5 [opt]
   309 	  int level = GetMaxHeight() - 1;
   310 	  Node* last_bigger = nullptr;
   311 	  while (true) {
-> 312 	    assert(x != nullptr);
   313 	    Node* next = x->Next(level);
   314 	    // Make sure the lists are sorted
   315 	    assert(x == head_ || next == nullptr || KeyIsAfterNode(next->key, x));
Target 0: (air_routes-68a595ab527559b8) stopped.
(lldb) bt
* thread #16, name = 'dfw_by_region', stop reason = hit program assert
    frame #0: 0x000000018e563224 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x000000018e599cec libsystem_pthread.dylib`pthread_kill + 288
    frame #2: 0x000000018e4d32c8 libsystem_c.dylib`abort + 180
    frame #3: 0x000000018e4d2620 libsystem_c.dylib`__assert_rtn + 272
  * frame #4: 0x000000010080ab58 air_routes-68a595ab527559b8`rocksdb::SkipList<rocksdb::WriteBatchIndexEntry*, rocksdb::WriteBatchEntryComparator const&>::FindGreaterOrEqual(rocksdb::WriteBatchIndexEntry* const&) const (.cold.1) at skiplist.h:312:5 [opt]
    frame #5: 0x000000010069854c air_routes-68a595ab527559b8`rocksdb::SkipList<rocksdb::WriteBatchIndexEntry*, rocksdb::WriteBatchEntryComparator const&>::FindGreaterOrEqual(this=<unavailable>, key=<unavailable>) const at skiplist.h:312:5 [opt]
    frame #6: 0x000000010069b00c air_routes-68a595ab527559b8`rocksdb::WBWIIteratorImpl::Seek(rocksdb::Slice const&) [inlined] rocksdb::SkipList<rocksdb::WriteBatchIndexEntry*, rocksdb::WriteBatchEntryComparator const&>::Iterator::Seek(this=0x0000600000c0db40, target=0x0000000171cae280) at skiplist.h:252:18 [opt]
    frame #7: 0x000000010069b000 air_routes-68a595ab527559b8`rocksdb::WBWIIteratorImpl::Seek(this=0x0000600000c0db30, key=<unavailable>) at write_batch_with_index_internal.h:240:21 [opt]
    frame #8: 0x0000000100699398 air_routes-68a595ab527559b8`rocksdb::BaseDeltaIterator::Seek(this=0x0000600003710000, k=0x0000000171cae2f0) at write_batch_with_index_internal.cc:61:20 [opt]

We've also observed segfaults a few lines further down, for example at line 315 in skiplist.h:

assert(x == head_ || next == nullptr || KeyIsAfterNode(next->key, x));

Here next->key could point to garbage.

We are currently stuck and not sure what steps we can take to gather more information about the crash.

zh217, Oct 25 '22 16:10

I'm having the exact same problem with version 10.4.2. My writer is single-threaded, but the rest of the application uses many threads.

altunkan, Dec 07 '25 19:12
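
Neither report above confirms a cause, but both describe many threads running at once. If the index behind the iterator turns out to be shared between a writing thread and seeking threads without synchronization, one common mitigation is to guard every access with a lock. The following is only a minimal sketch under that assumption: GuardedIndex and RunStress are hypothetical names, and std::map is a stand-in for the real index, not RocksDB's API.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a write-batch index: an ordered container that is
// not safe to read while another thread is mutating it.
class GuardedIndex {
 public:
  // Insert or overwrite a key. Holds the lock so no Seek runs mid-insert.
  void Put(const std::string& key, const std::string& value) {
    std::lock_guard<std::mutex> lock(mu_);
    entries_[key] = value;
  }

  // Find the first key >= target. Returns false if there is no such key.
  bool Seek(const std::string& target, std::string* found) const {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = entries_.lower_bound(target);
    if (it == entries_.end()) return false;
    *found = it->first;
    return true;
  }

  std::size_t Size() const {
    std::lock_guard<std::mutex> lock(mu_);
    return entries_.size();
  }

 private:
  mutable std::mutex mu_;
  std::map<std::string, std::string> entries_;
};

// One writer thread plus `readers` seeking threads, all sharing one index.
// Returns the number of entries once every thread has finished.
std::size_t RunStress(int readers, int writes) {
  GuardedIndex index;
  std::vector<std::thread> threads;
  threads.emplace_back([&index, writes] {
    for (int i = 0; i < writes; ++i) {
      index.Put("key" + std::to_string(i), "v");
    }
  });
  for (int r = 0; r < readers; ++r) {
    threads.emplace_back([&index, writes] {
      for (int i = 0; i < writes; ++i) {
        const std::string target = "key" + std::to_string(i);
        std::string found;
        // Any key returned must sort at or after the target.
        if (index.Seek(target, &found)) assert(found >= target);
      }
    });
  }
  for (auto& t : threads) t.join();
  return index.Size();
}
```

Calling RunStress(4, 1000) runs one writer against four seeking threads. A single mutex is the simplest choice here; whether it is acceptable under heavy load depends on how often seeks and writes actually overlap.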