Segfault when submitting empty barriers

Open al42and opened this issue 1 year ago • 1 comments

Describe the bug

Submitting empty barriers to in-order queues from multiple non-interacting threads causes a segfault. Reproduced with the Level Zero (L0), OpenCL, and HIP backends.

Reverting 7e08c15dddfdb938767a73d6d9c0c31a8149c9b3 (#12045) solves the issue.

To Reproduce

$ clang++ -fsycl -fsycl-targets=amd_gpu_gfx90a -lpthread -O2 barrier-crash.cpp && ONEAPI_DEVICE_SELECTOR=hip:0 ./a.out
gfx90a:sramecc-:xnack-
gfx90a:sramecc-:xnack-
Segmentation fault
$ clang++ -fsycl -lpthread -O1 -g barrier-crash.cpp && ONEAPI_DEVICE_SELECTOR=level_zero:0 ./a.out
Intel(R) Arc(TM) A770 Graphics
Intel(R) Arc(TM) A770 Graphics
Segmentation fault (core dumped)

$ clang++ -fsycl -lpthread -O1 -g barrier-crash.cpp && ONEAPI_DEVICE_SELECTOR=opencl:1 ./a.out
Intel(R) UHD Graphics 770Intel(R) UHD Graphics 770

terminate called after throwing an instance of 'sycl::_V1::runtime_error'
  what():  Native API failed. Native API returns: -57 (PI_ERROR_INVALID_EVENT_WAIT_LIST) -57 (PI_ERROR_INVALID_EVENT_WAIT_LIST)
Aborted (core dumped)

barrier-crash.cpp:

#include <array>
#include <iostream>
#include <mutex>
#include <optional>
#include <sycl/sycl.hpp>
#include <thread>
#include <unistd.h>

static constexpr int niter = 1024;
static constexpr int nthreads = 2;

std::array<std::mutex, nthreads> mutexes;
std::array<std::optional<sycl::event>, nthreads> events;

void threadFunction(int tid) {
  // Each thread creates its own context and queues; nothing is shared between threads.
  sycl::device dev;
  std::cout << dev.get_info<sycl::info::device::name>() << std::endl;
  sycl::context ctx{dev};
  sycl::queue q1{ctx, dev, {sycl::property::queue::in_order()}};
  sycl::queue q2{ctx, dev, {sycl::property::queue::in_order()}};
  for (int i = 0; i < niter; i++) {
    // Empty barrier on q1, then make q2 wait for it.
    sycl::event ev1 = q1.ext_oneapi_submit_barrier();
    q2.ext_oneapi_submit_barrier({ev1});
    // Same in the opposite direction: empty barrier on q2, then q1 waits for it.
    sycl::event ev2 = q2.ext_oneapi_submit_barrier();
    q1.ext_oneapi_submit_barrier({ev2});
  }
}

int main() {
  std::array<std::thread, nthreads> threads;

  for (int i = 0; i < nthreads; i++) {
    threads[i] = std::thread{threadFunction, i};
  }

  for (int i = 0; i < nthreads; i++) {
    threads[i].join();
  }
  std::cout << "All threads have finished." << std::endl;

  return 0;
}

Environment:

  • OS: Linux
  • Target device and vendor: AMD MI250X GPU, Intel A770 GPU
  • DPC++ version: bd4a460806d8a4e4d25b53e72a3fa627aa20ad0f
  • Dependencies version: ROCm 5.3.3; compute-runtime 23.59

al42and, Dec 14 '23 16:12

The problem is still present as of commit 54a67eb2c1cf275cef4d12b56b0b0786db26cbab.

al42and, Feb 19 '24 11:02

Fixed in https://github.com/intel/llvm/pull/12951

againull, Mar 08 '24 21:03