
Wrap OpenMP invocations in an interface to support other multi-threading backends in the future

Open p12tic opened this issue 1 year ago • 0 comments

Currently AliceVision uses OpenMP as its only multi-threading backend. This is problematic for several reasons:

  • OpenMP support is not uniform among compilers. In particular, Apple mobile platforms do not support OpenMP. As a result, the algorithms run as much as 8 times slower than possible on those platforms.

  • OpenMP critical sections are global. That is, all unnamed #pragma omp critical sections lock the same global mutex. As a consequence, it is inefficient to run multiple instances of the same parallelized AliceVision algorithm: these instances share one mutex even though data races are possible only among threads running a single instance of the algorithm. Ideally each instance would have its own mutex.

  • It is not possible to efficiently integrate third-party libraries that use another multi-threading framework, because OpenMP assumes that it is the only user of the CPU. As a result, the CPU is oversubscribed, which leads to poor performance. Note that, as currently used, OpenMP already oversubscribes the CPU by itself whenever multiple instances of the same parallelized algorithm are invoked in parallel.
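The second point above can be illustrated with a minimal sketch. The class name FeatureCounter and its members are hypothetical and do not come from the AliceVision code base; the sketch only assumes that an algorithm object can own a std::mutex instead of relying on #pragma omp critical.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical algorithm object; the name and members are illustrative only.
class FeatureCounter
{
public:
    // Counts the even values in `values`. The loop runs in parallel only
    // when the file is compiled with OpenMP enabled; otherwise it is serial.
    std::size_t countEven(const std::vector<int>& values)
    {
        _count = 0;
        const int size = static_cast<int>(values.size());
#ifdef _OPENMP
        #pragma omp parallel for
#endif
        for (int i = 0; i < size; ++i)
        {
            if (values[i] % 2 == 0)
            {
                // Previously: #pragma omp critical (one lock for the whole process).
                // Here: a lock owned by this instance only.
                std::lock_guard<std::mutex> lock(_mutex);
                ++_count;
            }
        }
        return _count;
    }

private:
    std::mutex _mutex;       // per-instance, not shared with other FeatureCounter objects
    std::size_t _count = 0;  // guarded by _mutex inside the loop
};
```

With this layout, two FeatureCounter objects running at the same time never wait on each other's lock, whereas two unnamed critical sections would.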

This PR takes inspiration from OpenCV and hides the use of the multi-threading framework behind an API. This will eventually allow supporting multiple multi-threading frameworks. For more details on how this works in OpenCV, see this document.

This PR implements the following:

  • migrate off OpenMP synchronization primitives to standard mutexes, atomics and boost::atomic_ref (once we can use C++20, we can migrate to std::atomic_ref).
  • move all uses of pragma omp parallel into a single cpp file by wrapping them in the system::parallelFor and system::parallelLoop functions.
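As a rough sketch of the second item, a wrapper of this kind could look like the following. This is not the code from the PR: the namespace sketch, the std::function-based signature and the helper sumRange are assumptions made for illustration. The point is only that the pragma lives in one place and callers pass a callable.

```cpp
#include <atomic>
#include <functional>

namespace sketch {  // hypothetical namespace, standing in for the PR's `system`

// The only place that mentions OpenMP. Without OpenMP the loop runs serially.
inline void parallelFor(int begin, int end, const std::function<void(int)>& body)
{
#ifdef _OPENMP
    #pragma omp parallel for
#endif
    for (int i = begin; i < end; ++i)
    {
        body(i);
    }
}

}  // namespace sketch

// Illustrative caller: sums the integers in [begin, end) using a standard
// atomic instead of an OpenMP reduction or "#pragma omp atomic".
inline long sumRange(int begin, int end)
{
    std::atomic<long> total{0};
    sketch::parallelFor(begin, end, [&](int i)
    {
        total.fetch_add(i, std::memory_order_relaxed);
    });
    return total.load();
}
```

Taking the body as std::function rather than a template parameter keeps the definition compilable in a single cpp file, at the cost of one indirect call per iteration.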

As a result, existing OpenMP code can be converted mechanically. For example, take this loop:

#pragma omp parallel for
for (int i = 10; i < size; ++i)
{
    doStuff(i);
}

The equivalent implementation of this loop using system::parallelFor is:

system::parallelFor(10, size, [&](int i)
{
    doStuff(i);
});

If this PR is approved, the long-term plan could be to add support for two more multi-threading frameworks (e.g. Intel TBB and Apple GCD) and to expose a user-level plugin API so that users can hook in their own implementation.
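One possible shape for such a plugin API is sketched below. Everything here is hypothetical: the names ParallelBackend, SerialBackend, setBackend and currentBackend are invented for illustration and are not part of this PR or of AliceVision.

```cpp
#include <functional>
#include <memory>
#include <utility>

// Hypothetical interface that a TBB, GCD or user-provided backend would implement.
class ParallelBackend
{
public:
    virtual ~ParallelBackend() = default;
    virtual void parallelFor(int begin, int end, const std::function<void(int)>& body) = 0;
};

// Trivial fallback backend that runs the loop on the calling thread.
class SerialBackend : public ParallelBackend
{
public:
    void parallelFor(int begin, int end, const std::function<void(int)>& body) override
    {
        for (int i = begin; i < end; ++i)
        {
            body(i);
        }
    }
};

// Process-wide backend slot, initialised to the serial fallback.
inline std::unique_ptr<ParallelBackend>& currentBackend()
{
    static std::unique_ptr<ParallelBackend> backend = std::make_unique<SerialBackend>();
    return backend;
}

// Hook for users to install their own implementation (not thread-safe in this sketch).
inline void setBackend(std::unique_ptr<ParallelBackend> backend)
{
    currentBackend() = std::move(backend);
}

// Call sites stay the same regardless of which backend is installed.
inline void parallelFor(int begin, int end, const std::function<void(int)>& body)
{
    currentBackend()->parallelFor(begin, end, body);
}
```

Call sites would keep calling parallelFor unchanged, and only setBackend would need to be invoked once at startup to switch frameworks.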

This PR includes #1234 and #1235. It's best reviewed commit by commit.

The PR is split into a large number of commits to allow easy bisection in case a bug slips through. As a result, the risk of the PR is low, since any bug can be easily diagnosed and fixed.

p12tic avatar Sep 23 '22 00:09 p12tic