Occasional segfault in GetColumnFamilyID with default column family when adding a delete to write batch
I'm using RocksDB Java version 6.4.6 with Java 8 (java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64 on CentOS 7.6).
Having found a similar issue, I checked and confirmed that the DB has not been closed. In fact, the crash occurs in the middle of a long string of database record deletes - this process happens in my application's startup code path.
The stack looks like this:
Stack: [0x00007fa24df43000,0x00007fa24e044000], sp=0x00007fa24e040b18, free space=1014k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [librocksdbjni8332868313935379620.so+0x21aa08] rocksdb::GetColumnFamilyID(rocksdb::ColumnFamilyHandle*)+0x8
C [librocksdbjni8332868313935379620.so+0x1eb75f]
C [librocksdbjni8332868313935379620.so+0x1eea84] Java_org_rocksdb_WriteBatch_delete__J_3BIJ+0x94
J 6999 org.rocksdb.WriteBatch.delete(J[BIJ)V (0 bytes) @ 0x00007fa285a06f86 [0x00007fa285a06ec0+0xc6]
J 8901 C2 com.primarydata.hmdbserver.db.RocksDBKeyValueDB.putOrRemove([BLcom/primarydata/hmdbserver/model/DbValue;)V (349 bytes) @ 0x00007fa285f44ddc [0x00007fa285f43ee0+0xefc]
J 6988 C1 com.primarydata.hmdbserver.db.RocksDBKeyValueDB$$Lambda$182.accept(Ljava/lang/Object;)V (24 bytes) @ 0x00007fa285a0a284 [0x00007fa285a09f80+0x304]
j com.primarydata.hmdbserver.gc.GarbageCollector.lambda$deleteChunkAndPublishStats$7(I[BLcom/primarydata/dme/cloud/rest/model/Metadata;Ljava/util/concurrent/atomic/AtomicInteger;Ljava/util/function/Consumer;Ljava/lang/Boolean;)V+117
j com.primarydata.hmdbserver.gc.GarbageCollector$$Lambda$186.accept(Ljava/lang/Object;)V+28
j java.util.concurrent.CompletableFuture.uniAccept(Ljava/util/concurrent/CompletableFuture;Ljava/util/function/Consumer;Ljava/util/concurrent/CompletableFuture$UniAccept;)Z+84
J 8856 C2 java.util.concurrent.CompletableFuture.postComplete()V (85 bytes) @ 0x00007fa285eabca8 [0x00007fa285eabae0+0x1c8]
j java.util.concurrent.CompletableFuture$AsyncSupply.run()V+57
j java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+95
j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5
j java.lang.Thread.run()V+11
v ~StubRoutines::call_stub
V [libjvm.so+0x68364a]
V [libjvm.so+0x680ae4]
V [libjvm.so+0x6810f9]
V [libjvm.so+0x6c5721]
V [libjvm.so+0xa788a2]
V [libjvm.so+0x8beb82]
C [libpthread.so.0+0x7dd5] start_thread+0xc5
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J 6999 org.rocksdb.WriteBatch.delete(J[BIJ)V (0 bytes) @ 0x00007fa285a06f0c [0x00007fa285a06ec0+0x4c]
J 8901 C2 com.primarydata.hmdbserver.db.RocksDBKeyValueDB.putOrRemove([BLcom/primarydata/hmdbserver/model/DbValue;)V (349 bytes) @ 0x00007fa285f44ddc [0x00007fa285f43ee0+0xefc]
J 6988 C1 com.primarydata.hmdbserver.db.RocksDBKeyValueDB$$Lambda$182.accept(Ljava/lang/Object;)V (24 bytes) @ 0x00007fa285a0a284 [0x00007fa285a09f80+0x304]
j com.primarydata.hmdbserver.gc.GarbageCollector.lambda$deleteChunkAndPublishStats$7(I[BLcom/primarydata/dme/cloud/rest/model/Metadata;Ljava/util/concurrent/atomic/AtomicInteger;Ljava/util/function/Consumer;Ljava/lang/Boolean;)V+117
j com.primarydata.hmdbserver.gc.GarbageCollector$$Lambda$186.accept(Ljava/lang/Object;)V+28
j java.util.concurrent.CompletableFuture.uniAccept(Ljava/util/concurrent/CompletableFuture;Ljava/util/function/Consumer;Ljava/util/concurrent/CompletableFuture$UniAccept;)Z+84
J 8856 C2 java.util.concurrent.CompletableFuture.postComplete()V (85 bytes) @ 0x00007fa285eabca8 [0x00007fa285eabae0+0x1c8]
j java.util.concurrent.CompletableFuture$AsyncSupply.run()V+57
j java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+95
j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5
j java.lang.Thread.run()V+11
v ~StubRoutines::call_stub
Expected behavior
No segfault.
Actual behavior
Segfaults sometimes - random, very intermittent.
Steps to reproduce the behavior
I'm opening my DB like this:
private RocksDBKeyValueDB(@Nonnull Path dbPath) {
    initKeyLocks();
    this.cfOptsDefault = new ColumnFamilyOptions();
    this.cfOptsLongMerge = new ColumnFamilyOptions().setMergeOperatorName("uint64add");
    final List<ColumnFamilyDescriptor> cfDescriptors = Arrays.asList(
        new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY, cfOptsDefault),
        new ColumnFamilyDescriptor(UPLOADED_AT_CF_NAME, cfOptsLongMerge),
        new ColumnFamilyDescriptor(ACCUMULATORS_CF_NAME, cfOptsLongMerge),
        new ColumnFamilyDescriptor(SER_STATE_CF_NAME, cfOptsDefault),
        new ColumnFamilyDescriptor(SITE_UPDATES_CF_NAME, cfOptsDefault)
    );
    this.options = new DBOptions()
        .setCreateIfMissing(true)
        .setCreateMissingColumnFamilies(true)
        .setErrorIfExists(false)
        .setKeepLogFileNum(5)
        // this is the default, but keeping it here to remember that such a tunable exists
        .setDeleteObsoleteFilesPeriodMicros(TimeUnit.HOURS.toMicros(6L))
        .setLogFileTimeToRoll(TimeUnit.DAYS.toMillis(1L))
        .setMaxOpenFiles(0)
        .setMaxTotalWalSize(0);
    try {
        this.db = RocksDB.open(options, dbPath.toString(), cfDescriptors, handleTable);
        this.writeOptions = new WriteOptions().setSync(true);
    } catch (RocksDBException e) {
        log.error("Failed to create db", e);
        close();
        throw new KeyValueDBException(e);
    }
}
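The close() called in the catch block is not shown; for reference, a minimal sketch of what it would need to do, assuming only the fields visible above (the usual RocksJava rule is to close column family handles before the database they belong to, and the options objects after):

// Hypothetical close() - a sketch, not the reporter's actual code.
private void close() {
    // column family handles must be closed before the database that owns them
    for (final ColumnFamilyHandle handle : handleTable) {
        handle.close();
    }
    handleTable.clear();
    if (writeOptions != null) {
        writeOptions.close();
    }
    if (db != null) {
        db.close();
    }
    cfOptsDefault.close();
    cfOptsLongMerge.close();
    if (options != null) {
        options.close();
    }
}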
When the segfault happens, it occurs in this code:
void putOrRemove(@Nonnull byte[] keyb, @Nonnull DbValue value) {
    byte[] serialized = DbValue.serialize(value);
    Map<DbValue.KeyListType, List<byte[]>> uploadedAtChanges = value.getUploadedAtChanges(keyb);
    try (final WriteBatch wb = new WriteBatch()) {
        ColumnFamilyHandle uploadedAtCfh = handleTable.get(CF_IDX_UPLOADED_AT);
        if (serialized == null) {
            // remove value - key was removed from last osv
            wb.delete(handleTable.get(CF_IDX_DEFAULT), keyb); // <<< ---- here
        } else {
            ...
CF_IDX_DEFAULT is defined as 0 (representing the default column family). As far as I know, the crash only ever happens on the default column family.
The handleTable is defined as a class field as follows:
private final List<ColumnFamilyHandle> handleTable = new ArrayList<>();
Here is what I found: the handleTable populated by RocksDB.open(options, dbPath.toString(), cfDescriptors, handleTable) does not include the default column family handle. The code above uses the handle at index 0 of handleTable as the default handle, but it is not, which can cause a segmentation fault. Here is the source in java/src/main/java/org/rocksdb/RocksDB.java:311:

final long[] handles = open(options.nativeHandle_, path, cfNames, cfOptionHandles);
final RocksDB db = new RocksDB(handles[0]);
db.storeOptionsInstance(options);
for (int i = 1; i < handles.length; i++) {
    final ColumnFamilyHandle columnFamilyHandle = new ColumnFamilyHandle(db, handles[i]);
    columnFamilyHandles.add(columnFamilyHandle);
}

The loop starts from 1, not 0; that means handles[0] is not the default column family handle. If you look at the call stack, it crashed in Java_org_rocksdb_WriteBatch_delete__J_3BIJ. Here is the relevant source code:
auto remove = [&wb, &cf_handle](ROCKSDB_NAMESPACE::Slice key) {
    return wb->Delete(cf_handle, key);
};

which calls into db/write_batch.cc:

Status WriteBatch::Delete(ColumnFamilyHandle* column_family, const Slice& key) {
    return WriteBatchInternal::Delete(this, GetColumnFamilyID(column_family), key);
}
It crashed in GetColumnFamilyID. If you look at that function:

auto cfh = static_cast_with_check<ColumnFamilyHandleImpl>(column_family);
column_family_id = cfh->GetID();

I think cfh is null here, and dereferencing it causes the segmentation fault.
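One way to test this theory empirically is to compare the handle at index 0 against the handle the database itself reports as the default. A minimal sketch, using what I believe are public Java API methods (getDefaultColumnFamily, getID, getName) and the variables from the code above; getName() throws RocksDBException, so run this where that can be handled:

// Sketch: does handleTable.get(0) actually refer to the default column family?
// Assumes java.nio.charset.StandardCharsets is imported.
final ColumnFamilyHandle indexZero = handleTable.get(CF_IDX_DEFAULT);
final ColumnFamilyHandle reportedDefault = db.getDefaultColumnFamily();
System.out.println("index 0: id=" + indexZero.getID()
        + " name=" + new String(indexZero.getName(), StandardCharsets.UTF_8));
System.out.println("default: id=" + reportedDefault.getID()
        + " name=" + new String(reportedDefault.getName(), StandardCharsets.UTF_8));
// If the IDs and names match, index 0 is the default handle after all.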
This has been passed to me to take a further look at. I can generate a similar crash, but I haven't been able to build/run 6.4.6 (and there have been code changes in the crashing methods since that release) to confirm I'm seeing exactly the same thing. And as the code you have provided is not a full repro, I'm guessing about what the state might be when wb.delete() is called.
I'm not in total agreement with Si's analysis. My theory is that you are simply trying to use the column family handle after it has been closed, which you should not do, even if the error handling could be better. Is there any chance that you, @jcalcote, could be doing this? Is there any concurrency going on where another thread might be closing the column family handle, possibly in a try-with-resources or a finally block?
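For illustration, the kind of accidental close being asked about could look like this (a purely hypothetical sketch, not code from this issue; ColumnFamilyHandle is AutoCloseable, so try-with-resources closes it on exit):

// HYPOTHETICAL anti-pattern: putting a shared handle in try-with-resources.
try (final ColumnFamilyHandle defaultCf = handleTable.get(CF_IDX_DEFAULT)) {
    // ... work with defaultCf ...
} // <- the shared handle is silently closed here; handleTable still holds it,
  //    and a later native call through it can SEGV instead of throwing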
Is this still an issue for you? It's difficult to pin down the problem without a full code example. It would be really helpful if you could provide one, either something that fails on 6.4.6 or on a more recent release.
Sorry, you've probably answered some of these questions before for Si.
Hello Alan,
We do not believe we're closing the handle, because we open these handles once at program (server) startup and they never get closed until the program shuts down. This fault is occurring in the middle of a test run - nowhere near any shutdown code. Please let me know if I can help in any way.
Thanks, John
Hi John,
Thanks for getting back to me. A self-contained program that reproduces the fault, even if I have to run it for a long time, would be ideal. Failing that, more details on what you're doing :-)
And if you have run it on a more recent RocksDB 7, do you see the same or a similar problem, i.e. a crash with a non-identical trace?
Thanks, Alan
Hi John,
Further thoughts, @jcalcote - could this be happening at the same time as, or for the same reason as, the failovers in your issue https://github.com/facebook/rocksdb/issues/7288? Judging by where the code is crashing, I'm fairly sure that something has closed the column family, either individually or as part of cleanup. The crash is at the first dereference the call makes of the pointer to that CF.
There's a design principle that we have used in the RocksDB Java API where we prefer performance to checking, but that does mean that accessing an already closed (by the Java API) CF may result in a SEGV in C++ rather than a Java exception.
One suggestion if this is causing continual problems would be to replace
wb.delete(handleTable.get(CF_IDX_DEFAULT), keyb);
in your code with
final ColumnFamilyHandle handle = handleTable.get(CF_IDX_DEFAULT);
if (!handle.isOwningHandle()) {
    throw new IllegalStateException(
            "Unexpectedly already closed column family handle at index " + CF_IDX_DEFAULT);
}
wb.delete(handle, keyb);
to detect all but the very smallest window where something is closing the column family (and in fact probably the database).
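More generally, the same guard could be factored into a small helper so that every column family access goes through it (a sketch; checkedHandle is a hypothetical name, not part of the reporter's code):

// Sketch: centralize the closed-handle check for all column family lookups.
private ColumnFamilyHandle checkedHandle(final int index) {
    final ColumnFamilyHandle handle = handleTable.get(index);
    if (!handle.isOwningHandle()) {
        throw new IllegalStateException(
                "Unexpectedly already closed column family handle at index " + index);
    }
    return handle;
}

Usage would then be wb.delete(checkedHandle(CF_IDX_DEFAULT), keyb), turning the native crash into a Java exception with a usable stack trace.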
Closing this as no response from @jcalcote within a reasonable timeframe.