sleef icon indicating copy to clipboard operation
sleef copied to clipboard

Feature detection is not thread-safe

Open Andreas-Krebbel opened this issue 1 year ago • 2 comments

On S/390 I ran into problem with the way the VXE2 hardware feature is currently being detected in multi-threaded code. It uses a SIGILL signal handler and sigjmp/longjmp in order to detect whether a certain instruction is available or not. Although I've looked into it for S/390 this should apply to other platforms using the same mechanism.

The problem is triggered by the way PyTorch is using Sleef: https://github.com/pytorch/pytorch/issues/128503

But it can also be reproduced with a small example like this:

#include <stdio.h>
#include "sleef.h"

#ifndef NUM_THREADS
#define NUM_THREADS 4
#endif

int main() {
  __vector float out[NUM_THREADS];

#pragma omp parallel for num_threads(NUM_THREADS)
  for (int i = 0; i < NUM_THREADS; i++)
    out[i] = Sleef_expf4_u10 ((__vector float){1.0f + i, 2.0f + i , 3.0f + i, 4.0f + i});

  for (int i = 0; i < NUM_THREADS;i++)
    {
      for (int j = 0; j < 4; j++)
	printf("%6.3f ", out[i][j]);
      printf("\n");
    }
  printf("\n");
}

Running the test like this: gcc -DNUM_THREADS=4 t.c -O3 -mzvector -march=z15 -lsleef -fopenmp -lgomp -o t && ./t results in either broken results or crashes

While the single threaded version works fine: gcc -DNUM_THREADS=1 t.c -O3 -mzvector -march=z15 -lsleef -fopenmp -lgomp -o t && ./t

The cpuSupportsExt function uses the file scope variable sigjmp to store the execution status what makes this function thread-unsafe.

I will send a PR to check HWCAPs instead of using the signal handler. This fixes the problem for S/390. I think other platforms might need similar adjustments.

Andreas-Krebbel avatar Jul 04 '24 07:07 Andreas-Krebbel

Sorry for the late reply and thank you very much for the detailed explanation of the issue, providing a reproducer as well as suggesting a fix and PR. Will look into your PR shortly!

blapie avatar Jul 09 '24 14:07 blapie

Re-opening the issue, as #560 only fixed the issue on a single architecture (s/390) and a single OS (linux). #563 attempts to fix other OS-es Yet to implement a thread-safe detection for other architecture.

blapie avatar Jul 26 '24 08:07 blapie