fdb-record-layer icon indicating copy to clipboard operation
fdb-record-layer copied to clipboard

The Lucene index blocks in an asynchronous context frequently

Open scgray opened this issue 3 years ago • 0 comments

While reviewing https://github.com/FoundationDB/fdb-record-layer/pull/1309, I noticed that the Lucene index uses join() relatively frequently (particularly in FDBDrectory). A first note is that, with the exception of tests, Record Layer code should never use join() or get() to block on a future, but should instead use FDBRecordContext.asyncToSync or FDBDatabase.asyncToSync(). This methods provide some critical functionality to Record Layer:

  • They establish a configurable time limit on how long we are willing to wait for the operation
  • They automatically define a wait timer that will be included in the FDBStoreTimer to track the time spent waiting
  • They include automatic error handling and wrapping to promote thrown exceptions to the appropriate RecordCoreException, as necessary
  • They have the ability to detect if asyncToSync() is being called from within a completing future

This last bullet is the key one.

Blocking code must never be called within a CompletableFuture, which means that join(), get(), and asyncToSync() cannot ever be called within one. The problem is that there are a limited number of threads that are used to invoke the lambda in the future when it completes, so if you have a future that spawns a bunch of async work, then, within the same future, waits on it to complete, then you have now tied up that thread while it waits. For the work that it is waiting on to complete, a thread must be available to complete it, so if you get enough threads waiting like this you can starve the thread pool and basically get a deadlock where the whole world freezes.

In https://github.com/FoundationDB/fdb-record-layer/pull/1309 I asked for the join() to be changed to asyncToSync(). I then, locally turned on block-in-async detection by changing fdb-record-layer-lucene.gradle to:

test {
    useJUnitPlatform {
        ...
        // Configure whether or not tests will validate that asyncToSync isn't being called in async
        // context.  See BlockingInAsyncDetection class for details on values.
        systemProperties = [ "com.apple.foundationdb.record.blockingInAsyncDetection":
            System.getenv('BLOCKING_DETECTION') ?: "IGNORE_COMPLETE_EXCEPTION_BLOCKING" ]
    }
}

Now, many of the lucene tests file, for example:

    com.apple.foundationdb.record.provider.foundationdb.BlockingInAsyncException: Blocking in an asynchronous context
        at com.apple.foundationdb.record.provider.foundationdb.FDBDatabase.logOrThrowBlockingInAsync(FDBDatabase.java:1263)
        at com.apple.foundationdb.record.provider.foundationdb.FDBDatabase.checkIfBlockingInFuture(FDBDatabase.java:1246)
        at com.apple.foundationdb.record.provider.foundationdb.FDBDatabase.asyncToSync(FDBDatabase.java:1057)
        at com.apple.foundationdb.record.provider.foundationdb.FDBRecordContext.asyncToSync(FDBRecordContext.java:1019)
        at com.apple.foundationdb.record.lucene.directory.FDBDirectory.deleteFile(FDBDirectory.java:319)
        at org.apache.lucene.store.LockValidatingDirectoryWrapper.deleteFile(LockValidatingDirectoryWrapper.java:38)
        at org.apache.lucene.index.IndexFileDeleter.deleteFile(IndexFileDeleter.java:696)
        at org.apache.lucene.index.IndexFileDeleter.deleteFiles(IndexFileDeleter.java:690)
        at org.apache.lucene.index.IndexFileDeleter.deleteNewFiles(IndexFileDeleter.java:664)
        at org.apache.lucene.index.IndexWriter.deleteNewFiles(IndexWriter.java:5053)
        at org.apache.lucene.index.IndexWriter.access$200(IndexWriter.java:213)
        at org.apache.lucene.index.IndexWriter$1.lambda$deleteUnusedFiles$0(IndexWriter.java:357)
        at org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:5135)
        at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3627)
        at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3586)
        at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:1036)
        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1079)
        at org.apache.lucene.util.IOUtils.close(IOUtils.java:88)
        at org.apache.lucene.util.IOUtils.close(IOUtils.java:76)
        at com.apple.foundationdb.record.lucene.IndexWriterCommitCheckAsync.lambda$checkAsync$0(IndexWriterCommitCheckAsync.java:99)

scgray avatar Jul 15 '21 10:07 scgray