Segfault when submitting empty barriers
Describe the bug
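Submitting empty barriers to in-order queues from multiple non-interacting threads crashes the program. Reproduced with the Level Zero, OpenCL, and HIP backends: Level Zero and HIP segfault, while OpenCL aborts with PI_ERROR_INVALID_EVENT_WAIT_LIST.
Reverting commit 7e08c15dddfdb938767a73d6d9c0c31a8149c9b3 (#12045) resolves the issue.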
To Reproduce
$ clang++ -fsycl -fsycl-targets=amd_gpu_gfx90a -lpthread -O2 barrier-crash.cpp && ONEAPI_DEVICE_SELECTOR=hip:0 ./a.out
gfx90a:sramecc-:xnack-
gfx90a:sramecc-:xnack-
Segmentation fault
$ clang++ -fsycl -lpthread -O1 -g barrier-crash.cpp && ONEAPI_DEVICE_SELECTOR=level_zero:0 ./a.out
Intel(R) Arc(TM) A770 Graphics
Intel(R) Arc(TM) A770 Graphics
Segmentation fault (core dumped)
$ clang++ -fsycl -lpthread -O1 -g barrier-crash.cpp && ONEAPI_DEVICE_SELECTOR=opencl:1 ./a.out
Intel(R) UHD Graphics 770Intel(R) UHD Graphics 770
terminate called after throwing an instance of 'sycl::_V1::runtime_error'
what(): Native API failed. Native API returns: -57 (PI_ERROR_INVALID_EVENT_WAIT_LIST) -57 (PI_ERROR_INVALID_EVENT_WAIT_LIST)
Aborted (core dumped)
#include <array>
#include <iostream>
#include <mutex>
#include <optional>
#include <sycl/sycl.hpp>
#include <thread>
#include <unistd.h>

static constexpr int niter = 1024;
static constexpr int nthreads = 2;

// Not used by this reduced reproducer.
std::array<std::mutex, nthreads> mutexes;
std::array<std::optional<sycl::event>, nthreads> events;

void threadFunction(int tid) {
  // Each thread creates its own context and its own pair of in-order queues.
  sycl::device dev;
  std::cout << dev.get_info<sycl::info::device::name>() << std::endl;
  sycl::context ctx{dev};
  sycl::queue q1{ctx, dev, {sycl::property::queue::in_order()}};
  sycl::queue q2{ctx, dev, {sycl::property::queue::in_order()}};
  for (int i = 0; i < niter; i++) {
    // Empty barrier on q1, then make q2 wait on it.
    sycl::event ev1 = q1.ext_oneapi_submit_barrier();
    q2.ext_oneapi_submit_barrier({ev1});
    // Empty barrier on q2, then make q1 wait on it.
    sycl::event ev2 = q2.ext_oneapi_submit_barrier();
    q1.ext_oneapi_submit_barrier({ev2});
  }
}

int main() {
  std::array<std::thread, nthreads> threads;
  for (int i = 0; i < nthreads; i++) {
    threads[i] = std::thread{threadFunction, i};
  }
  for (int i = 0; i < nthreads; i++) {
    threads[i].join();
  }
  std::cout << "All threads have finished." << std::endl;
  return 0;
}
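A note on reading the OpenCL log: the two device names are fused on one line ("Intel(R) UHD Graphics 770Intel(R) UHD Graphics 770") because both threads write to std::cout without any synchronization, so their output can interleave. This is unrelated to the crash. The sketch below is not part of the original reproducer; `print_mutex`, `print_line`, and `collect_lines` are hypothetical names, and it only shows one way to keep per-thread lines intact, assuming every writer goes through the same helper.

```cpp
#include <cassert>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper (not in the original reproducer): a single mutex that
// every thread takes before writing a full line.
std::mutex print_mutex;

// Writes one complete line while holding the lock, so lines from different
// threads cannot be interleaved with each other.
void print_line(std::ostream &os, const std::string &line) {
  std::lock_guard<std::mutex> lock(print_mutex);
  os << line << '\n';
}

// Prints `line` once from each of `nthreads` threads into a string buffer
// and returns everything that was written.
std::string collect_lines(const std::string &line, int nthreads) {
  assert(nthreads >= 0);
  std::ostringstream out;
  std::vector<std::thread> threads;
  for (int i = 0; i < nthreads; i++) {
    threads.emplace_back([&out, &line] { print_line(out, line); });
  }
  for (auto &t : threads) {
    t.join();
  }
  return out.str();
}
```

In the reproducer, the equivalent change would be to replace the direct std::cout write in threadFunction with print_line(std::cout, dev.get_info<sycl::info::device::name>()). It does not affect the barrier submissions and is not expected to change whether the crash occurs.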
Environment:
- OS: Linux
- Target device and vendor: AMD MI250X GPU, Intel A770 GPU
- DPC++ version: bd4a460806d8a4e4d25b53e72a3fa627aa20ad0f
- Dependencies version: ROCm 5.3.3; compute-runtime 23.59
The problem is still present as of 54a67eb2c1cf275cef4d12b56b0b0786db26cbab
Fixed in https://github.com/intel/llvm/pull/12951