Using oneDPL forces client to always run with "SYCL device" present

Open krasznaa opened this issue 1 year ago • 6 comments

I seem to have run into a pretty fundamental design issue in the code. :frowning: As discussed in https://github.com/acts-project/traccc/pull/442, an application that uses oneDPL apparently can never be executed on a host that has no "SYCL capable devices" at all.

In our applications we've already seen a couple of times that we must not create sycl::queue objects in a global scope. We use gtest_discover_tests(...) to register our projects' tests with CTest. That function runs each test executable during the build to list its tests, and our CI generally doesn't provide us with "SYCL capable devices". So the test code must only try to create sycl::queue objects once a test is actually running. (In practice this just meant never declaring "global" sycl::queue objects.)

Unfortunately, as soon as we include anything from oneDPL, our tests fail to even build in our CI system. :frowning: This is because oneDPL itself apparently creates at least one (but as far as I can see actually multiple) sycl::queue object globally, just by being included. The exception is thrown during static initialization, before main() is reached:

Thread 1 "traccc_test_syc" hit Catchpoint 1 (exception thrown), 0x00007ffff1cd1662 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
(gdb) bt
#0  0x00007ffff1cd1662 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x00007ffff1a74613 in sycl::_V1::detail::select_device(std::function<int (sycl::_V1::device const&)>, std::vector<sycl::_V1::device, std::allocator<sycl::_V1::device> >&) () from /software/intel/oneapi-2023.1.0/compiler/2023.1.0/linux/lib/libsycl.so.6
#2  0x00007ffff1a750b9 in sycl::_V1::detail::select_device(std::function<int (sycl::_V1::device const&)> const&) () from /software/intel/oneapi-2023.1.0/compiler/2023.1.0/linux/lib/libsycl.so.6
#3  0x00007ffff1a757fc in sycl::_V1::device_selector::select_device() const () from /software/intel/oneapi-2023.1.0/compiler/2023.1.0/linux/lib/libsycl.so.6
#4  0x0000000000472605 in sycl::_V1::queue::queue(sycl::_V1::device_selector const&, std::function<void (sycl::_V1::exception_list)> const&, sycl::_V1::property_list const&) (this=0x1074a78 <oneapi::dpl::execution::__dpl::dpcpp_default>, DeviceSelector=..., AsyncHandler=..., PropList=...) at /software/intel/oneapi-2023.1.0/compiler/2023.1.0/linux/bin-llvm/../include/sycl/queue.hpp:186
#5  0x0000000000472305 in sycl::_V1::queue::queue (this=0x1074a78 <oneapi::dpl::execution::__dpl::dpcpp_default>, PropList=...) at /software/intel/oneapi-2023.1.0/compiler/2023.1.0/linux/bin-llvm/../include/sycl/queue.hpp:95
#6  0x000000000047113e in oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>::device_policy (this=0x1074a78 <oneapi::dpl::execution::__dpl::dpcpp_default>) at /software/intel/oneapi-2023.1.0/dpl/2022.1.0/linux/include/oneapi/dpl/pstl/hetero/dpcpp/execution_sycl_defs.h:48
#7  0x000000000046d092 in __cxx_global_var_init () at /software/intel/oneapi-2023.1.0/dpl/2022.1.0/linux/include/oneapi/dpl/pstl/hetero/dpcpp/execution_sycl_defs.h:121
#8  0x00000000005354ad in __libc_csu_init ()
#9  0x00007ffff15ff010 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#10 0x000000000046e27e in _start ()
(gdb)

I consider this a serious bug in the design. :thinking: It means that an application that uses oneDPL somewhere deep inside itself would fail to even start on a host with no "SYCL device" present, even if it guards every use of oneDPL behind checks that a "SYCL device" is actually available. :frowning:

The simplest fix would be for those default policies to create their SYCL objects lazily, on first use. But possibly some deeper re-thinking could be done on how that part of the code works...
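
To illustrate what I mean by creating the objects on first use, here is a minimal sketch. It is not oneDPL code: `fake_queue` and `lazy_device_policy` are hypothetical names, and `fake_queue` is only a stand-in for sycl::queue that throws when told no device is available, so that the example compiles without a SYCL toolchain.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>

// Hypothetical stand-in for sycl::queue (assumption, not a SYCL type):
// its constructor throws when no device is available, mimicking the
// exception seen in the backtrace above.
struct fake_queue {
    explicit fake_queue(bool device_available) {
        if (!device_available) {
            throw std::runtime_error("no SYCL capable device found");
        }
    }
};

// Hypothetical policy type that defers queue creation until first use.
// Constructing the policy itself never touches any device, so a global
// instance of it is harmless on a host without devices.
class lazy_device_policy {
public:
    explicit lazy_device_policy(bool device_available)
        : m_device_available(device_available) {}

    // Creates the queue on the first call; later calls reuse it.
    // Note: this sketch is not thread-safe.
    fake_queue& queue() {
        if (!m_queue) {
            m_queue.emplace(m_device_available);
        }
        return *m_queue;
    }

    // True once the queue has been successfully created.
    bool queue_created() const { return m_queue.has_value(); }

private:
    bool m_device_available;
    std::optional<fake_queue> m_queue;
};
```

With this shape, a global `lazy_device_policy` can be constructed on any host, and the exception only surfaces if code actually asks for the queue. A real implementation would also need to make the first-use initialization thread-safe, for example with a function-local static or std::call_once.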

Pinging @ivorobts.

krasznaa, Aug 17 '23 09:08