Add "dynamic" parallelization strategies.
This change adds the following parallelization functions:
`pthreadpool_parallelize_1d_dynamic`, `pthreadpool_parallelize_2d_dynamic_1d`, `pthreadpool_parallelize_2d_dynamic`, and `pthreadpool_parallelize_3d_dynamic_2d`.
These functions are similar to the `pthreadpool_parallelize_Xd_tile_Yd` functions, but differ in that the `tile_*` values are not bounds, but instead preferred multiples.
For example, `pthreadpool_parallelize_1d_dynamic(threadpool, task, context, range, tile, flags)` calls the user-provided task function
`task(context, offset, count)`
where
`offset` is in the range `[0, range)` and an integer multiple of `tile`, and `count` is an integer multiple of `tile` unless `offset + count = range`.
The `tile` parameter is understood as a preferred multiple of indices, not as an upper bound.
The counts are chosen so as to minimize the number of calls to `task` while still balancing the workload across all threads.
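For illustration, here is a minimal usage sketch of the new 1D variant. The callback and threadpool setup below are hypothetical, and the argument types are assumed to mirror the existing `pthreadpool_parallelize_1d_tile_1d` API:

```c
#include <stddef.h>
#include <pthreadpool.h>

struct scale_context {
  float* data;
  float factor;
};

/* Task callback: scales `count` consecutive elements starting at `offset`.
 * `offset` is a multiple of the requested tile; `count` is a multiple of the
 * tile except possibly in the last call, where offset + count == range. */
static void scale_task(void* context, size_t offset, size_t count) {
  struct scale_context* ctx = (struct scale_context*) context;
  for (size_t i = offset; i < offset + count; i++) {
    ctx->data[i] *= ctx->factor;
  }
}

void scale_in_parallel(pthreadpool_t threadpool, float* data, size_t range) {
  struct scale_context ctx = {data, 0.5f};
  /* Ask for work in preferred multiples of 64 elements; the pool may hand a
     thread several tiles at once to reduce the number of callback calls. */
  pthreadpool_parallelize_1d_dynamic(
      threadpool, scale_task, &ctx, range, /*tile=*/64, /*flags=*/0);
}
```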
Under the hood, each thread tries to reserve a "chunk" of tiles corresponding to
```c
size_t chunk_size = max(num_tasks / (2 * num_threads), 1);
chunk_size = ((chunk_size + tile - 1) / tile) * tile;
```
i.e. the number of remaining tasks divided by twice the number of threads, rounded up to the next multiple of `tile`. This balances the workload well provided the speed difference between the fastest and slowest threads does not exceed a factor of two.
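As a concrete sketch of this rule (the helper and variable names below are illustrative, not taken from the patch), assuming the chunk size is recomputed from the work that is still unclaimed at each reservation:

```c
#include <stddef.h>

/* Illustrative helper: half of an even per-thread split of the remaining
   indices, at least one index, rounded up to a whole number of tiles. */
static size_t compute_chunk_size(size_t remaining, size_t num_threads, size_t tile) {
  size_t chunk_size = remaining / (2 * num_threads);
  if (chunk_size == 0) {
    chunk_size = 1;
  }
  /* Round up to the next multiple of `tile`. */
  return ((chunk_size + tile - 1) / tile) * tile;
}
```

For example, with `range = 1000`, 8 threads, and `tile = 16`, the first reservation takes `1000 / 16 = 62` indices rounded up to 64, leaving 936; the next reservation again yields 64, and chunks shrink as the remaining work runs down, so faster threads naturally claim more chunks.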
Hi @Maratyszcza,
Please let me know whether you have time to review this PR, or whether you are still accepting PRs at all :relaxed:
Cheers, Pedro