JNI: SIGSEGV when opening and writing to a previously closed database

Open errikos opened this issue 1 year ago • 2 comments

Expected behavior

Opening and writing to a previously closed+destroyed database should succeed.

Actual behavior

Opening a previously closed+destroyed database and writing to it always results in a SIGSEGV at the same point (when checking whether a key exists in the DB before insertion).

Steps to reproduce the behavior

RocksDBJNI version 9.6.1

I have been trying to debug this issue for the last 3 days without any luck.

I have some JUnit tests that do the following:

  • Open a DB (creating it if it does not exist).
  • Carry out some tests.
  • Close and destroy the DB.

The flow is very simple:

    protected val rocksDb: RocksDB
    private val rocksDbOptions: org.rocksdb.Options = org.rocksdb.Options()
        .setCreateIfMissing(Configuration.CREATE_DATABASE_FILE_IF_MISSING)

    // open the DB - called by JUnit during a method with @BeforeAll
    init {
        RocksDB.loadLibrary()
        Files.createDirectories(rocksDbFile.absoluteFile.toPath())
        rocksDb = RocksDB.open(rocksDbOptions, rocksDbFile.absolutePath)
    }

    // close the DB - called by JUnit during a method with @AfterAll
    fun closeAllDatabases() {
        val dbToClose = listOf(
            getKoinInstance<DB1>(),
            getKoinInstance<DB2>(),
        )

        for (db in dbToClose) {
            db.close()
            if (db.rocksDbFile.exists()) {
                RocksDB.destroyDB(db.rocksDbFile.absolutePath, org.rocksdb.Options())
            }
        }
    }

    // insert function
    open fun insert(key: K, value: V): V {
        val serializedKey = SerializationUtils.serialize(key)
        val serializedValue = SerializationUtils.serialize(value)

        if (rocksDb.keyExists(serializedKey)) {
            logger.warn("RocksDB: key already exists: $key")
        }

        rocksDb.put(serializedKey, serializedValue)
        return value
    }

When running each unit test suite separately and/or multiple times, everything looks OK (even if I don't delete the database files in between).

But, as soon as I run two suites that do the same initialization within the same JUnit run, the following happens:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007ff122e86efc, pid=3082460, tid=3082756
#
# JRE version: OpenJDK Runtime Environment Temurin-21.0.4+7 (21.0.4+7) (build 21.0.4+7-LTS)
# Java VM: OpenJDK 64-Bit Server VM Temurin-21.0.4+7 (21.0.4+7-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  [librocksdbjni1151386142668908978.so+0x106fefc]  key_exists_helper(JNIEnv_*, long, long, long, char*, int)+0x68
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   https://github.com/adoptium/adoptium-support/issues
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

---------------  S U M M A R Y ------------

Command Line: -Dorg.gradle.internal.worker.tmpdir=/tmp/test/work -javaagent:/build/kover/kover-jvm-agent-0.8.3.jar=file:tmp/test/kover-agent.args -Xmx512m -Dfile.encoding=UTF-8 -Duser.country=US -Duser.language=en -Duser.variant -ea worker.org.gradle.process.internal.worker.GradleWorkerMain 'Gradle Test Executor 1'

Host: Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz, 16 cores, 46G, Red Hat Enterprise Linux release 8.6 (Ootpa)
Time: Thu Sep 26 13:04:09 2024 CEST elapsed time: 10.644058 seconds (0d 0h 0m 10s)

---------------  T H R E A D  ---------------

Current thread (0x00007ff0ec1541c0):  JavaThread "vert.x-eventloop-thread-1"        [_thread_in_native, id=3082756, stack(0x00007ff121516000,0x00007ff121616000) (1024K)]

Stack: [0x00007ff121516000,0x00007ff121616000],  sp=0x00007ff1216117b0,  free space=1005k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [librocksdbjni1151386142668908978.so+0x106fefc]  key_exists_helper(JNIEnv_*, long, long, long, char*, int)+0x68  (rocksjni.cc:1912)
C  [librocksdbjni1151386142668908978.so+0x10701c6]  Java_org_rocksdb_RocksDB_keyExists+0xa5  (rocksjni.cc:1966)
j  org.rocksdb.RocksDB.keyExists(JJJ[BII)Z+0
j  org.rocksdb.RocksDB.keyExists(Lorg/rocksdb/ColumnFamilyHandle;Lorg/rocksdb/ReadOptions;[BII)Z+124
j  org.rocksdb.RocksDB.keyExists([BII)Z+20
j  org.rocksdb.RocksDB.keyExists([B)Z+17
j  repository.RocksDbRepo.insert(Ljava/io/Serializable;Ljava/io/Serializable;)Ljava/io/Serializable;+51
j  repository.CachingRocksDbRepo.insert(Ljava/io/Serializable;Ljava/io/Serializable;)Ljava/io/Serializable;+26

$ addr2line -e /tmp/librocksdbjni1151386142668908978.so 0x106fefc
rocksdb/java/rocksjni/rocksjni.cc:1912

The issue seems to be with the db pointer in rocksjni.cc#L1912, but I can't trace it back to anything being problematic in my flow.

Any insights on this would be appreciated.

errikos avatar Sep 26 '24 11:09 errikos

> But, as soon as I run two suites that do the same initialization within the same JUnit run, the following happens:

@errikos This sounds like you have a concurrency issue or an issue with state not being cleaned up.

adamretter avatar Sep 26 '24 11:09 adamretter

Thanks for the reply @adamretter.

There is no concurrent access to the DB during the tests. My close method looks like this:

    override fun close() {
        if (!rocksDb.isClosed) {
            val flushOptions = FlushOptions().setWaitForFlush(true)
            rocksDb.flush(flushOptions)
            rocksDb.close()
            rocksDbOptions.close()
            flushOptions.close()
        }
    }

In fact, the first test suite runs and exits correctly (with the DB closed and cleaned up).

The problem only arises after the second suite starts and attempts the insertion, so by that point it has evidently been able to recreate and open the DB.

errikos avatar Sep 26 '24 11:09 errikos

Hi @errikos - apologies for the long delay in response. @adamretter asked me to take another look.

Are the 2 suites using completely separate database directories? I wonder whether destroyDB is actually complete when it returns (files may still be pending deletion, perhaps) and whether this affects the next run of the test. I will inspect things a bit more and see if I can come up with a more rigorous diagnosis, but it is difficult to be sure of any issue unless you can provide a self-contained reproduction.

alanpaxton avatar May 14 '25 08:05 alanpaxton

Hello @alanpaxton,

No worries, this was a race condition due to leftover Vert.x threads after the end of each JUnit run.

We resolved this by waiting for all threads to join in the suite teardown method.

Nothing to do with RocksDB, sorry for the dummy issue.

errikos avatar May 14 '25 08:05 errikos
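
For readers who land here with the same symptom, the following is a minimal sketch of the fix described in the last comment: drain the worker threads first, and only then close the database. It is not the reporter's actual teardown code. `FakeDb` and `teardown` are hypothetical names, `FakeDb` is a pure-Kotlin stand-in for a native-backed handle such as `RocksDB`, and a plain `ExecutorService` stands in for the leftover Vert.x threads (Vert.x's own shutdown API is not shown).

```kotlin
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicBoolean
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for a native-backed handle such as RocksDB.
// A real native handle may crash the JVM on use-after-close; this one throws.
class FakeDb : AutoCloseable {
    private val closed = AtomicBoolean(false)
    val writes = AtomicInteger(0)

    fun put(key: String, value: String) {
        check(!closed.get()) { "write to closed DB: $key=$value" }
        writes.incrementAndGet()
    }

    override fun close() {
        closed.set(true)
    }
}

// Suite teardown: stop accepting work, wait for in-flight tasks to finish,
// and close the DB only once no worker thread can still touch it.
// Returns false (leaving the DB open) if the workers did not finish in time.
fun teardown(workers: ExecutorService, db: FakeDb, timeoutSeconds: Long = 10): Boolean {
    workers.shutdown()
    val drained = workers.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)
    if (drained) db.close()
    return drained
}

fun main() {
    val db = FakeDb()
    val workers = Executors.newFixedThreadPool(2)
    // Background writes that could otherwise outlive the test suite.
    repeat(100) { i -> workers.execute { db.put("key-$i", "value-$i") } }
    val drained = teardown(workers, db)
    println("drained=$drained writes=${db.writes.get()}")
}
```

Closing the handle only after `awaitTermination` succeeds is the point of the sketch: if teardown closed the DB while a task was still queued, that task would hit a closed handle, which with a real JNI-backed object can surface as a SIGSEGV in native code rather than a Java exception.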