celix icon indicating copy to clipboard operation
celix copied to clipboard

Race Condition and ABBA Deadlock

Open PengZheng opened this issue 2 years ago • 2 comments

Checking whether a bundle is active in celix_bundleContext_useBundles callback suffers from check-then-act race condition. This race condition is a minor one, but immediate uninstall of a bundle after framework startup may crash the program.

PengZheng avatar Sep 19 '23 05:09 PengZheng

Codecov Report

Merging #645 (0c2c973) into master (91ff81c) will increase coverage by 0.02%. The diff coverage is 100.00%.

:exclamation: Current head 0c2c973 differs from pull request most recent head beb910e. Consider uploading reports for the commit beb910e to get more accurate results

@@            Coverage Diff             @@
##           master     #645      +/-   ##
==========================================
+ Coverage   81.58%   81.61%   +0.02%     
==========================================
  Files         260      260              
  Lines       34679    34678       -1     
==========================================
+ Hits        28294    28303       +9     
+ Misses       6385     6375      -10     
Files Changed Coverage Δ
libs/framework/src/dm_dependency_manager_impl.c 83.04% <100.00%> (+2.09%) :arrow_up:

... and 3 files with indirect coverage changes

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

codecov-commenter avatar Sep 19 '23 05:09 codecov-commenter

This PR manifests a deadlock in test_components_ready. ~~More investigation is needed.~~

It turns out to be an insidious ABBA deadlock:

  1. When celix_componentsReadyCheck_destroy is called, ready_check's fsmMutex is locked, and celix_componentsReadyCheck_destroy will try to use the Celix event loop.
  2. When celix_componentReadyCheck_check calls celix_dependencyManager_allComponentsActive, which in turn tries to check ready_check's readiness, the Celix event loop is occupied and the current thread tries to acquire ready_check's fsmMutex.

We need to work out a fix for this deadlock before fixing the race condition. @pnoltes ~~A proper fix might be to exclude ready_check from readiness checking so that ready_check's fsmMutex don't have to be acquired for that purpose. For example, the following interface should allow us to deselect ready_check:~~

CELIX_FRAMEWORK_EXPORT size_t celix_framework_useActiveSelectedBundles(celix_framework_t* fw,
                                                                       bool includeFrameworkBundle,
                                                                       void* callbackHandle,
                                                                       bool (*select)(void* handle, const celix_bundle_t* bnd),
                                                                       void (*use)(void* handle, const celix_bundle_t* bnd));

~~More generally, useBundlesWithOptions could be added.~~

~~Such API changes should be postponed after 2.4.0.~~

The above proposed fix will not work, because the same ABBA deadlock could happen for other bundles when stopping them.

For now I'll leave this PR open as a reminder.

PengZheng avatar Sep 19 '23 05:09 PengZheng