Thread callback for OpenMP backend

Open jeremiedbb opened this issue 8 months ago • 9 comments

Hi,

I'm trying to leverage #4577 in a project (scikit-learn) that mixes OpenMP code with an OpenBLAS built with the pthreads threading layer. The goal is to make OpenBLAS run its jobs on the OpenMP thread pool. Ideally we'd use an OpenBLAS built with the OpenMP threading layer, but that's outside of our control because OpenBLAS comes from a dependency.

I naively tried the example callback presented in https://github.com/OpenMathLib/OpenBLAS/pull/4577#issue-2204960832, but I can't make it work. Here's a simple reproducer, just a gemm:

test.c

#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <omp.h>
#include <cblas.h>


void omp_cb (int sync, openblas_dojob_callback dojob, int numjobs, size_t jobdata_elsize, void *jobdata, int dojob_data)
{
    #pragma omp parallel for
    for(int i = 0; i < numjobs; i++)
    {
        printf("thread: %d, i: %d\n", omp_get_thread_num(), i);
        void *element_adrr = (void *) (((char *)jobdata) + ((unsigned) i)*jobdata_elsize);
        dojob(i, element_adrr, dojob_data);
    }
    return;
}


void test()
{
    int n = 100;

    double *A = (double *)malloc(n * n * sizeof(double));
    double *B = (double *)malloc(n * n * sizeof(double));
    double *C = (double *)malloc(n * n * sizeof(double));

    for(int i = 0; i < n * n; i++)
    {
        A[i] = 1.0;
        B[i] = 1.0;
        C[i] = 0.0;
    }

    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, n, n, n, 1.0, A, n, B, n, 0.0, C, n);

    free(A);
    free(B);
    free(C);
}


int main()
{
    openblas_set_threads_callback_function(omp_cb);
    test();
}

Compile command:

gcc -o test test.c -fopenmp -I/home/jeremie/R/installs/OpenBLAS/include -Wl,-rpath,/home/jeremie/R/installs/OpenBLAS/lib -L/home/jeremie/R/installs/OpenBLAS/lib -lopenblas

The program segfaults on the first iteration of the loop in the callback. Note that it still segfaults if I remove the omp pragma and use a plain sequential loop in the callback.
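
One way to narrow this down is to run the same loop body against a mock job function, with OpenBLAS out of the picture. The following is only a minimal sketch of that idea. It assumes the dojob signature is (int, void *, int), as implied by the call in omp_cb above. The names mock_dojob_fn, mock_job, mock_dojob, omp_cb_under_test and run_mock_callback are hypothetical and introduced here for illustration; they are not part of OpenBLAS.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the dojob callback type. The signature is an
   assumption inferred from the call in omp_cb, not copied from cblas.h. */
typedef void (*mock_dojob_fn)(int index, void *jobdata, int dojob_data);

/* Hypothetical job element: records what the mock worker was called with. */
typedef struct {
    int expected_index; /* position of this element in the array */
    int seen_index;     /* index passed to mock_dojob, -1 until it runs */
    int seen_data;      /* dojob_data passed to mock_dojob */
} mock_job;

/* Mock worker: stores its arguments in the element it was handed. */
static void mock_dojob(int index, void *jobdata, int dojob_data)
{
    mock_job *job = (mock_job *)jobdata;
    job->seen_index = index;
    job->seen_data = dojob_data;
}

/* Same loop as omp_cb in test.c, minus the printf. Without -fopenmp the
   pragma is ignored and the loop simply runs sequentially. */
static void omp_cb_under_test(int sync, mock_dojob_fn dojob, int numjobs,
                              size_t jobdata_elsize, void *jobdata,
                              int dojob_data)
{
    (void)sync; /* unused in this sketch */
    #pragma omp parallel for
    for (int i = 0; i < numjobs; i++) {
        /* Address of the i-th element in the opaque job array. */
        void *element_addr = (char *)jobdata + (size_t)i * jobdata_elsize;
        dojob(i, element_addr, dojob_data);
    }
}

/* Runs the callback over numjobs mock elements. Returns the number of
   elements that were not handed to the worker with their own index and
   the given dojob_data, or -1 if the allocation fails. */
int run_mock_callback(int numjobs, int dojob_data)
{
    if (numjobs <= 0)
        return 0;

    mock_job *jobs = malloc((size_t)numjobs * sizeof *jobs);
    if (jobs == NULL)
        return -1;

    for (int i = 0; i < numjobs; i++) {
        jobs[i].expected_index = i;
        jobs[i].seen_index = -1;
        jobs[i].seen_data = 0;
    }

    omp_cb_under_test(1, mock_dojob, numjobs, sizeof *jobs, jobs, dojob_data);

    int mismatches = 0;
    for (int i = 0; i < numjobs; i++) {
        if (jobs[i].seen_index != jobs[i].expected_index ||
            jobs[i].seen_data != dojob_data)
            mismatches++;
    }

    free(jobs);
    return mismatches;
}
```

If a call such as run_mock_callback(8, 3) returns 0, the address arithmetic in the loop is doing what it is meant to do. That would suggest looking at the dojob pointer or the job data that the library passes in, rather than at the loop itself.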

Any help would be greatly appreciated.

jeremiedbb, Jun 28 '24 09:06