Using oneDPL forces client to always run with "SYCL device" present
I seem to have run into a pretty fundamental design issue in the code. :frowning: As discussed in https://github.com/acts-project/traccc/pull/442, an application that uses oneDPL apparently can never be executed on a host that has no "SYCL capable devices" at all.
In our applications we've seen a couple of times already that we must not create sycl::queue objects in a global scope. We use gtest_discover_tests(...) to register the tests of our projects with CTest. But for this to work in a CI, which generally doesn't provide us with "SYCL capable devices", the test code must only try to create sycl::queue objects once a test is actually running. (In practice this just meant never declaring "global" sycl::queue objects.)
Unfortunately, as soon as we include anything from oneDPL, our tests fail to even build in our CI system. :frowning: This is because oneDPL itself apparently creates at least one (and as far as I can see actually multiple) sycl::queue object globally just by being included, so the test executable throws during static initialization, before any test code runs:
Thread 1 "traccc_test_syc" hit Catchpoint 1 (exception thrown), 0x00007ffff1cd1662 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
(gdb) bt
#0 0x00007ffff1cd1662 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1 0x00007ffff1a74613 in sycl::_V1::detail::select_device(std::function<int (sycl::_V1::device const&)>, std::vector<sycl::_V1::device, std::allocator<sycl::_V1::device> >&) () from /software/intel/oneapi-2023.1.0/compiler/2023.1.0/linux/lib/libsycl.so.6
#2 0x00007ffff1a750b9 in sycl::_V1::detail::select_device(std::function<int (sycl::_V1::device const&)> const&) () from /software/intel/oneapi-2023.1.0/compiler/2023.1.0/linux/lib/libsycl.so.6
#3 0x00007ffff1a757fc in sycl::_V1::device_selector::select_device() const () from /software/intel/oneapi-2023.1.0/compiler/2023.1.0/linux/lib/libsycl.so.6
#4 0x0000000000472605 in sycl::_V1::queue::queue(sycl::_V1::device_selector const&, std::function<void (sycl::_V1::exception_list)> const&, sycl::_V1::property_list const&) (this=0x1074a78 <oneapi::dpl::execution::__dpl::dpcpp_default>, DeviceSelector=..., AsyncHandler=..., PropList=...) at /software/intel/oneapi-2023.1.0/compiler/2023.1.0/linux/bin-llvm/../include/sycl/queue.hpp:186
#5 0x0000000000472305 in sycl::_V1::queue::queue (this=0x1074a78 <oneapi::dpl::execution::__dpl::dpcpp_default>, PropList=...) at /software/intel/oneapi-2023.1.0/compiler/2023.1.0/linux/bin-llvm/../include/sycl/queue.hpp:95
#6 0x000000000047113e in oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>::device_policy (this=0x1074a78 <oneapi::dpl::execution::__dpl::dpcpp_default>) at /software/intel/oneapi-2023.1.0/dpl/2022.1.0/linux/include/oneapi/dpl/pstl/hetero/dpcpp/execution_sycl_defs.h:48
#7 0x000000000046d092 in __cxx_global_var_init () at /software/intel/oneapi-2023.1.0/dpl/2022.1.0/linux/include/oneapi/dpl/pstl/hetero/dpcpp/execution_sycl_defs.h:121
#8 0x00000000005354ad in __libc_csu_init ()
#9 0x00007ffff15ff010 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#10 0x000000000046e27e in _start ()
(gdb)
I consider this a serious bug in the design. :thinking: It means that an application that uses oneDPL somewhere deep inside itself, possibly behind checks that make sure a "SYCL device" is actually available before any oneDPL functionality is used, would fail to even start on a host with no "SYCL device" present. :frowning:
The simplest fix would be for those default policies to create their SYCL objects dynamically, on first use. But possibly some deeper re-thinking could be done on how that part of the code works...
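To illustrate what I mean by "on first use", here is a minimal sketch of the pattern, not a proposal for the actual oneDPL code. It deliberately does not use SYCL so that it builds anywhere: `FakeQueue`, `g_device_present`, `g_constructed`, `default_queue()` and `try_offload()` are all hypothetical names, with `FakeQueue` standing in for sycl::queue (its constructor throws when no device is available, like in the backtrace above).

```cpp
#include <stdexcept>

// Hypothetical stand-in for "is a SYCL capable device present on this host?"
bool g_device_present = false;

// Counts successful constructions, only to make the behaviour observable.
int g_constructed = 0;

// Hypothetical stand-in for sycl::queue: construction throws without a device.
struct FakeQueue {
    explicit FakeQueue(bool device_present) {
        if (!device_present) {
            throw std::runtime_error("No device of requested type available");
        }
    }
};

// Eager variant, analogous to what the backtrace shows: a namespace-scope
// object is constructed before main(), so the program cannot even start
// on a host without a device.
//
//   FakeQueue eager_default{g_device_present};

// Lazy variant: a function-local static is constructed on the first call.
// If the constructor throws, the initialization is not considered complete
// and is attempted again on the next call.
FakeQueue& default_queue() {
    static FakeQueue q = [] {
        FakeQueue tmp{g_device_present};
        ++g_constructed;
        return tmp;
    }();
    return q;
}

// Client code can now guard the use of the default queue behind its own check.
// Returns true if the (pretend) offload path was taken.
bool try_offload() {
    if (!g_device_present) {
        return false;  // fall back to a host-only code path
    }
    default_queue();  // first use constructs the queue
    return true;
}
```

With something like this, merely including the header costs nothing at startup, and an application that never reaches `default_queue()` never tries to select a device.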
Pinging @ivorobts.