OpenBLAS icon indicating copy to clipboard operation
OpenBLAS copied to clipboard

WIP: allow threading backend to be replaced by caller

Open stevengj opened this issue 5 years ago • 13 comments

This PR adds a new function openblas_set_threads_callback that allows the caller to register a callback in order to change the threading backend at runtime — instead of spawning its own threads and passing work to them, OpenBLAS will pass an array of work (an array of blas_queue_t) to the callback, which can execute the work in serial or in parallel using whatever mechanism it wants.

Motivation: this allows OpenBLAS threading to be composable with caller multithreading, instead of having OpenBLAS and the caller fight over the same CPUs. In particular, this enables OpenBLAS threading to be composable with:

Because registering the callback is done at runtime, people wanting to use this feature won't need to install a separate compiled OpenBLAS binary, but can instead use an existing threaded binary.

The design and implementation is very similar to the new pluggable threading backend for FFTW (FFTW/fftw3#175) and Blosc (Blosc/c-blosc2#81), which worked out very well for us. My expectation is that the code additions will be fairly minimal—a few dozen lines plus some trivial refactoring—with a practical overhead of a single if statement in the exec_blas function for people not using this feature.

This PR is an early work-in-progress, but I wanted to get some early feedback. To do:

  • [ ] Testing
  • [ ] Add to pthreads backend (blas_server.c)
  • [ ] Add to win32 backend (blas_server_win32.c)
  • [ ] Documentation

stevengj avatar Sep 12 '19 19:09 stevengj

In particular, I'm thinking the prototypical user callback could look like:

void callback(void *callback_data, int sync, void (*dojob)(int, void *, void *), int numjobs, size_t jobdata_elsize, void *jobdata, void *dojob_data) {
    int i;
    for (i = 0; i < numjobs; ++i) // in parallel
        dojob(THREADID, ((char *) jobdata) + ((unsigned) i)*jobdata_elsize, dojob_data);
    if (sync)
     // wait for dojob calls to complete
}

where the for loop is executed in parallel in some way determined by the caller, and THREADID is some caller code for determining the current thread number (0 to nthreads-1, where nthreads was the value passed to omp_set_num_threads).

stevengj avatar Sep 12 '19 19:09 stevengj

(Note that this will also allow people using OpenMP to link to a pthreads OpenBLAS binary without recompiling OpenBLAS — with a 10-line callback using #pragma omp parallel for, they can tell the pthreads OpenBLAS to use their OpenMP threads instead of its own pthreads.)

stevengj avatar Sep 12 '19 19:09 stevengj

That's an intriguing concept... I just hope it does not wake any sleeping bugs.

martin-frbg avatar Sep 12 '19 21:09 martin-frbg

Is further work required to move this forward? Thanks for pushing composable threading, @stevengj!

StefanKarpinski avatar Sep 20 '19 16:09 StefanKarpinski

IF this is expected to work on Win32, it will need adjustments in blas_server_win32.c similar to those in/for blas_server_omp.c (this is what causes one of the two remaining Travis failures, the other is an unrelated issue with powerpc)

martin-frbg avatar Sep 20 '19 20:09 martin-frbg

Is further work required to move this forward?

Yes, a significant amount of work: see un-checked to-do items above. It looks like the most effort is required to adapt blas_server.c, which I'm currently working on now.

I haven't had a chance to work on it for a few days, but I'm hoping to get back to it next week.

stevengj avatar Sep 21 '19 14:09 stevengj

Right, somehow I missed the checklist at the bottom of your post 😬

StefanKarpinski avatar Sep 21 '19 17:09 StefanKarpinski

@stevengj: I haven't had a chance to work on it for a few days, but I'm hoping to get back to it next week.

Any update on this? :)

h-vetinari avatar Mar 10 '20 07:03 h-vetinari

@stevengj 18 month later, just checking in where things stand on this... 🙃

h-vetinari avatar Sep 30 '21 13:09 h-vetinari

Would be great to get this done!

StefanKarpinski avatar Sep 30 '21 14:09 StefanKarpinski

Looks very interesting. Do you think it would provide a solution to this issue https://github.com/xianyi/OpenBLAS/issues/3187 ?

jeremiedbb avatar Sep 30 '21 14:09 jeremiedbb

rebased for convenience (it is not immediately clear to me how to apply the proposed changes to the non-OMP server files) 2255rebased.txt

martin-frbg avatar Oct 03 '21 11:10 martin-frbg

@stevengj Any update on this? Can you please share your email id, would like to discuss few doubts. Your help is highly appreciated.

goplanid avatar Jul 12 '23 19:07 goplanid

closing as superseded by #4577 - nevertheless an important contribution, and I regret not having been able to advance it at the time

martin-frbg avatar Apr 23 '24 21:04 martin-frbg