store icon indicating copy to clipboard operation
store copied to clipboard

Flaky deadlock during shutdown

Open codeleverage opened this issue 8 months ago • 1 comments

Environment Details

  • EclipseStore Version: 2.1.3
  • JDK version: 21.0.1
  • OS: Windows 11
  • Used frameworks: JUnit 5

Describe the bug

Calling EmbeddedStorageManager.close() sometimes hangs because of a deadlock.

To Reproduce

I have not been able to reliably reproduce this. I have a bunch of unit tests that shutdown and start a new storage manager to verify that objects were actually persisted. Occasionally the shutdown in a random test will hang indefinitely.

Expected behavior

Success or exception, but not deadlock.

Additional context

I was able to find some details of the deadlock using jstack:

Found one Java-level deadlock:
=============================
"Eclipse Serializer-StorageChannel-0":
  waiting to lock monitor 0x00000257712e9f90 (object 0x000000043bfc4318, a org.eclipse.store.storage.types.StorageBackupItemQueue$Default$Item),
  which is held by "Eclipse Serializer-StorageBackupHandler"

"Eclipse Serializer-StorageBackupHandler":
  waiting to lock monitor 0x000002576d396ec0 (object 0x000000043b4caaa0, a org.eclipse.store.storage.types.StorageLiveDataFile$Default),
  which is held by "Eclipse Serializer-StorageChannel-0"

Java stack information for the threads listed above:
===================================================
"Eclipse Serializer-StorageChannel-0":
        at org.eclipse.store.storage.types.StorageBackupItemQueue$Default.internalEnqueueItem(StorageBackupItemQueue.java:96)
        - waiting to lock <0x000000043bfc4318> (a org.eclipse.store.storage.types.StorageBackupItemQueue$Default$Item)
        at org.eclipse.store.storage.types.StorageBackupItemQueue$Default.enqueueCopyingItem(StorageBackupItemQueue.java:67)
        at org.eclipse.store.storage.types.StorageFileWriterBackupping$Default.writeTransactionEntryDelete(StorageFileWriterBackupping.java:187)
        at org.eclipse.store.storage.types.StorageFileManager$Default.writeTransactionsEntryFileDeletion(StorageFileManager.java:1248)
        at org.eclipse.store.storage.types.StorageFileManager$Default.deleteFile(StorageFileManager.java:1566)
        at org.eclipse.store.storage.types.StorageFileManager$Default.deletePendingFile(StorageFileManager.java:1433)
        at org.eclipse.store.storage.types.StorageFileManager$Default$$Lambda/0x000002572840f5a8.accept(Unknown Source)
        at org.eclipse.store.storage.types.StorageLiveDataFile$Default.lambda$executeIfUnsuedData$0(StorageLiveDataFile.java:257)
        at org.eclipse.store.storage.types.StorageLiveDataFile$Default$$Lambda/0x0000025728630978.accept(Unknown Source)
        at org.eclipse.store.storage.types.StorageLiveFile$Abstract.executeIfUnsued(StorageLiveFile.java:118)
        - locked <0x000000043b4caaa0> (a org.eclipse.store.storage.types.StorageLiveDataFile$Default)
        at org.eclipse.store.storage.types.StorageLiveDataFile$Default.executeIfUnsuedData(StorageLiveDataFile.java:256)
        at org.eclipse.store.storage.types.StorageFileManager$Default.internalCheckForCleanup(StorageFileManager.java:1465)
        at org.eclipse.store.storage.types.StorageFileManager$Default.incrementalFileCleanupCheck(StorageFileManager.java:1397)
        at org.eclipse.store.storage.types.StorageChannel$Default.lambda$performFileCleanupCheck$0(StorageChannel.java:315)
        at org.eclipse.store.storage.types.StorageChannel$Default$$Lambda/0x0000025728425038.perform(Unknown Source)
        at org.eclipse.store.storage.types.StorageChannelHousekeepingResult.create(StorageChannelHousekeepingResult.java:33)
        at org.eclipse.store.storage.types.StorageChannel$Default.performFileCleanupCheck(StorageChannel.java:314)
        at org.eclipse.store.storage.types.StorageHousekeepingBroker$Default.performFileCleanupCheck(StorageHousekeepingBroker.java:113)
        at org.eclipse.store.storage.types.StorageChannel$Default.houseKeepingCheckFileCleanup(StorageChannel.java:412)
        at org.eclipse.store.storage.types.StorageChannel$Default$$Lambda/0x00000257284103e8.perform(Unknown Source)
        at org.eclipse.store.storage.types.StorageChannel$Default.houseKeeping(StorageChannel.java:248)
        at org.eclipse.store.storage.types.StorageChannel$Default.work(StorageChannel.java:479)
        at org.eclipse.store.storage.types.StorageChannel$Default.run(StorageChannel.java:536)
        at java.lang.Thread.runWith([email protected]/Thread.java:1596)
        at java.lang.Thread.run([email protected]/Thread.java:1583)
"Eclipse Serializer-StorageBackupHandler":
        at org.eclipse.store.storage.types.StorageFile$Abstract.close(StorageFile.java)
        - waiting to lock <0x000000043b4caaa0> (a org.eclipse.store.storage.types.StorageLiveDataFile$Default)
        at org.eclipse.store.storage.types.StorageLiveFile$Abstract.unregisterUsageClosing(StorageLiveFile.java:217)
        at org.eclipse.store.storage.types.StorageBackupItemQueue$Default.processNextItem(StorageBackupItemQueue.java:135)
        - locked <0x000000043bfc4318> (a org.eclipse.store.storage.types.StorageBackupItemQueue$Default$Item)
        at org.eclipse.store.storage.types.StorageBackupHandler$Default.run(StorageBackupHandler.java:270)
        at java.lang.Thread.runWith([email protected]/Thread.java:1596)
        at java.lang.Thread.run([email protected]/Thread.java:1583)

Found 1 deadlock.

And here's the relevant part of the main thread's stack trace:

org.eclipse.store.storage.types.StorageChannelTask$Abstract.waitOnCompletion(StorageChannelTask.java:166)
org.eclipse.store.storage.types.StorageSystem$Default.internalShutdown(StorageSystem.java:553)
org.eclipse.store.storage.types.StorageSystem$Default.shutdown(StorageSystem.java:647)
org.eclipse.store.storage.embedded.types.EmbeddedStorageManager$Default.shutdown(EmbeddedStorageManager.java:408)
org.eclipse.store.storage.types.StorageController.close(StorageController.java:70)

codeleverage avatar Apr 18 '25 12:04 codeleverage

Many thanks for that stack-trace. This is a bug that must be investigated.

Looks like housekeeping and backup are not correctly synchronized if a obsolete file should be removed…

hg-ms avatar Apr 22 '25 08:04 hg-ms