Multiple LevelDB instances in the same process cause a .sst file to be locked
Here is the test:
@Test
void levelDb(@TempDir(cleanup = CleanupMode.NEVER) Path tmp) throws Exception {
    String clusterName = "node-cluster", dir = tmp.toString();
    Map<String, String> props = Map.of("log_class", "org.jgroups.protocols.raft.LevelDBLog");
    Consumer<ProtocolStackConfigurator> customizer = t -> configProtocol(t, "raft.RAFT", props);
    // first
    List<RaftNode> nodes = raftChannels("A,B,C", dir, customizer).stream().map(RaftNode::new).toList();
    for (RaftNode node : nodes) node.getCh().connect(clusterName);
    for (RaftNode node : nodes) node.close();
    assertTrue(untilDeletable(tmp, 3));
    // second
    nodes = raftChannels("A,B,C", dir, customizer).stream().map(RaftNode::new).toList();
    for (RaftNode node : nodes) node.getCh().connect(clusterName);
    for (RaftNode node : nodes) node.close();
    assertFalse(untilDeletable(tmp, 3));
    // third
    nodes = raftChannels("A,B,C", dir, customizer).stream().map(RaftNode::new).toList();
    for (RaftNode node : nodes) {
        assertThrows(DBException.class, () -> node.getCh().connect(clusterName)).printStackTrace();
    }
}
After the first wave of channels is closed, the log dir can be deleted. After the second wave it can't, and creating the logs a third time throws an exception:
org.iq80.leveldb.DBException: IO error: ...\C.log\000005.sst: Could not create random access file.
    at org.fusesource.leveldbjni.internal.JniDB.get(JniDB.java:90)
    at org.fusesource.leveldbjni.internal.JniDB.get(JniDB.java:77)
    at org.jgroups.protocols.raft.LevelDBLog.isANewRAFTLog(LevelDBLog.java:366)
    at org.jgroups.protocols.raft.LevelDBLog.init(LevelDBLog.java:50)
    at org.jgroups.protocols.raft.RAFT.start(RAFT.java:564)
    at org.jgroups.stack.ProtocolStack.startStack(ProtocolStack.java:909)
    at org.jgroups.JChannel.startStack(JChannel.java:914)
    at org.jgroups.JChannel._preConnect(JChannel.java:792)
    at org.jgroups.JChannel.connect(JChannel.java:323)
    at org.jgroups.JChannel.connect(JChannel.java:317)
Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: ...\C.log\000005.sst: Could not create random access file.
    at org.fusesource.leveldbjni.internal.NativeDB.get(NativeDB.java:316)
    at org.fusesource.leveldbjni.internal.NativeDB.get(NativeDB.java:300)
    at org.fusesource.leveldbjni.internal.NativeDB.get(NativeDB.java:293)
    at org.fusesource.leveldbjni.internal.JniDB.get(JniDB.java:85)
Thanks for reporting! I'll try to look today as well. We intend to eventually phase out LevelDB in favor of our FileBasedLog implementation. Likely, I'll update these defaults going into 2.0 once I increase the coverage. Nevertheless, I'll make sure LevelDB works fine in 1.x.
@yfei-z, I am unable to reproduce the issue. I created a cluster with three nodes, wrote a few entries, and closed the channels. I'm doing this in a while loop, but I see no issues during the restart. I've tried both deleting the RAFT folder after each iteration and leaving the folder as is. Neither case triggered the exception.
Some thoughts. Do you write entries during the test? Are you deleting the folders after each close? And is the folder really nuked with subfolders and files? I thought some files might be lying around after the deletion and causing the issue.
The test creates a 3-node cluster and issues no commands. It closes all channels, leaves the logs in place, and recreates each node's protocol stack to restart the cluster (the same protocol stack can't be started again because of LevelDB). After the cluster has been started a second time, the logs folder can no longer be deleted.
I reran the test in a Linux (ubi9) Docker container and could not reproduce it there, so I think it only happens on Windows.
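For anyone wondering why the platform matters: a minimal sketch of the difference, using a plain RandomAccessFile as a stand-in for whatever handle the native LevelDB code keeps open (that substitution is an assumption, as are the class and method names). On Linux an open file can usually still be unlinked, while on Windows deleting a file with an open java.io handle typically fails, so a leaked handle only shows up there.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; not part of jgroups-raft or LevelDB.
public class OpenHandleDeleteSketch {

    // Creates a scratch file to experiment on.
    static Path newTempFile() {
        try {
            return Files.createTempFile("open-handle", ".sst");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Tries to delete the file while a read handle is still open.
    // Returns true if the delete succeeded (expected on Linux),
    // false if the OS refused it (expected on Windows).
    static boolean deletableWhileOpen(Path file) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            try {
                Files.delete(file);
                return true;
            } catch (IOException refused) {
                return false;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Opens and closes the handle first, then deletes.
    // Returns true if the file existed and was removed.
    static boolean deletableAfterClose(Path file) {
        try {
            try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
                raf.length(); // touch the handle so it is really opened
            }
            return Files.deleteIfExists(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path open = newTempFile();
        System.out.println("deletable while open:  " + deletableWhileOpen(open));
        Files.deleteIfExists(open); // clean up if the OS refused the first delete

        Path closed = newTempFile();
        System.out.println("deletable after close: " + deletableAfterClose(closed));
    }
}
```

The first result is expected to differ between the two platforms; the second should be true on both.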
I found it. I've seen this when working with RocksDB in the past. Everything must be closed manually, but they are not AutoCloseable to give a hint/warning. We're missing a close() call on the DBIterator at:
https://github.com/jgroups-extras/jgroups-raft/blob/29f2ff92b762ed0d6b2fdbbd5114494f70689d58/src/org/jgroups/protocols/raft/LevelDBLog.java#L217-L224
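To illustrate the kind of leak described above, here is a rough sketch with a hypothetical ResourceIterator standing in for the real iterator. It is not the code from the PR, and every name in it is made up for the example; the counter stands in for native file handles. If the real iterator type can be closed the same way, a try-with-resources block or a finally block releases it even when iteration throws.

```java
import java.io.Closeable;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not the jgroups-raft or LevelDB API.
public class IteratorCloseSketch {

    // Number of iterators currently open; stands in for native file handles.
    static final AtomicInteger OPEN = new AtomicInteger();

    // Stand-in for a store iterator that owns a resource until close() is called.
    static final class ResourceIterator implements Iterator<String>, Closeable {
        private final Iterator<String> delegate;
        private boolean closed;

        ResourceIterator(List<String> entries) {
            this.delegate = entries.iterator();
            OPEN.incrementAndGet();
        }

        @Override public boolean hasNext() { return delegate.hasNext(); }

        @Override public String next() { return delegate.next(); }

        @Override public void close() {
            if (!closed) {
                closed = true;
                OPEN.decrementAndGet();
            }
        }
    }

    // Leaky variant: iterates but never closes, so the resource stays held.
    static int countLeaky(List<String> entries) {
        ResourceIterator it = new ResourceIterator(entries);
        int n = 0;
        while (it.hasNext()) { it.next(); n++; }
        return n;
    }

    // Fixed variant: try-with-resources closes the iterator on every exit path.
    static int countSafely(List<String> entries) {
        try (ResourceIterator it = new ResourceIterator(entries)) {
            int n = 0;
            while (it.hasNext()) { it.next(); n++; }
            return n;
        }
    }

    public static void main(String[] args) {
        List<String> entries = List.of("a", "b", "c");
        countSafely(entries);
        System.out.println("open after safe iteration:  " + OPEN.get());
        countLeaky(entries);
        System.out.println("open after leaky iteration: " + OPEN.get());
    }
}
```

In the leaky variant the counter never drops back, which is the analogue of the .sst file staying locked after the channel is closed.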
I've created a PR with fixes for LevelDB and FileBasedLog.