fdb-record-layer
fdb-record-layer copied to clipboard
The Lucene index blocks in an asynchronous context frequently
While reviewing https://github.com/FoundationDB/fdb-record-layer/pull/1309, I noticed that the Lucene index uses join()
relatively frequently (particularly in FDBDrectory
). A first note is that, with the exception of tests, Record Layer code should never use join()
or get()
to block on a future, but should instead use FDBRecordContext.asyncToSync
or FDBDatabase.asyncToSync()
. This methods provide some critical functionality to Record Layer:
- They establish a configurable time limit on how long we are willing to wait for the operation
- They automatically define a wait timer that will be included in the
FDBStoreTimer
to track the time spent waiting - They include automatic error handling and wrapping to promote thrown exceptions to the appropriate
RecordCoreException
, as necessary - They have the ability to detect if
asyncToSync()
is being called from within a completing future
This last bullet is the key one.
Blocking code must never be called within a CompletableFuture
, which means that join()
, get()
, and asyncToSync()
cannot ever be called within one. The problem is that there are a limited number of threads that are used to invoke the lambda in the future when it completes, so if you have a future that spawns a bunch of async work, then, within the same future, waits on it to complete, then you have now tied up that thread while it waits. For the work that it is waiting on to complete, a thread must be available to complete it, so if you get enough threads waiting like this you can starve the thread pool and basically get a deadlock where the whole world freezes.
In https://github.com/FoundationDB/fdb-record-layer/pull/1309 I asked for the join()
to be changed to asyncToSync()
. I then, locally turned on block-in-async detection by changing fdb-record-layer-lucene.gradle
to:
test {
useJUnitPlatform {
...
// Configure whether or not tests will validate that asyncToSync isn't being called in async
// context. See BlockingInAsyncDetection class for details on values.
systemProperties = [ "com.apple.foundationdb.record.blockingInAsyncDetection":
System.getenv('BLOCKING_DETECTION') ?: "IGNORE_COMPLETE_EXCEPTION_BLOCKING" ]
}
}
Now, many of the lucene tests file, for example:
com.apple.foundationdb.record.provider.foundationdb.BlockingInAsyncException: Blocking in an asynchronous context
at com.apple.foundationdb.record.provider.foundationdb.FDBDatabase.logOrThrowBlockingInAsync(FDBDatabase.java:1263)
at com.apple.foundationdb.record.provider.foundationdb.FDBDatabase.checkIfBlockingInFuture(FDBDatabase.java:1246)
at com.apple.foundationdb.record.provider.foundationdb.FDBDatabase.asyncToSync(FDBDatabase.java:1057)
at com.apple.foundationdb.record.provider.foundationdb.FDBRecordContext.asyncToSync(FDBRecordContext.java:1019)
at com.apple.foundationdb.record.lucene.directory.FDBDirectory.deleteFile(FDBDirectory.java:319)
at org.apache.lucene.store.LockValidatingDirectoryWrapper.deleteFile(LockValidatingDirectoryWrapper.java:38)
at org.apache.lucene.index.IndexFileDeleter.deleteFile(IndexFileDeleter.java:696)
at org.apache.lucene.index.IndexFileDeleter.deleteFiles(IndexFileDeleter.java:690)
at org.apache.lucene.index.IndexFileDeleter.deleteNewFiles(IndexFileDeleter.java:664)
at org.apache.lucene.index.IndexWriter.deleteNewFiles(IndexWriter.java:5053)
at org.apache.lucene.index.IndexWriter.access$200(IndexWriter.java:213)
at org.apache.lucene.index.IndexWriter$1.lambda$deleteUnusedFiles$0(IndexWriter.java:357)
at org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:5135)
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3627)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3586)
at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:1036)
at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1079)
at org.apache.lucene.util.IOUtils.close(IOUtils.java:88)
at org.apache.lucene.util.IOUtils.close(IOUtils.java:76)
at com.apple.foundationdb.record.lucene.IndexWriterCommitCheckAsync.lambda$checkAsync$0(IndexWriterCommitCheckAsync.java:99)