ze_tracer/onetrace: Assertion `call->command != command' failed with simple SYCL Graph application
Trying to trace a simple application which uses SYCL Graphs with ze_tracer or onetrace triggers an internal failed assertion.
$ clang++ -fsycl -g test_graph_zetrace.cpp -o test_graph_zetrace
$ ONEAPI_DEVICE_SELECTOR=level_zero:0 ~/pti-gpu/tools/ze_tracer/build/ze_tracer ./test_graph_zetrace
Intel(R) Arc(TM) A770 Graphics : native
Done!
test_graph_zetrace: /home/aland/pti-gpu/tools/ze_tracer/ze_kernel_collector.h:1041: void ZeKernelCollector::RemoveKernelCommands(ze_command_list_handle_t): Assertion `call->command != command' failed.
Aborted (core dumped)
Ubuntu Linux 22.04 (6.2.0-36-generic), Intel Compute Runtime 23.30.26918.9, recent Intel LLVM built from source (a2f02214200ef71d3a8ec6cae1b84a16508513c4), PTI-GPU 90b9230c8bd9c00211934ec5e36183edc3aa8c1d.
Source code:
#include <sycl/sycl.hpp>
namespace syclex = sycl::ext::oneapi::experimental;
int main() {
for (const auto &dev : sycl::device::get_devices()) {
using graph_support = syclex::info::device::graph_support;
using gsl = syclex::graph_support_level;
const auto gs = dev.get_info<graph_support>();
std::cout << dev.get_info<sycl::info::device::name>() << " : "
<< (gs == gsl::unsupported
? "unsupported"
: (gs == gsl::emulated ? "emulated" : "native"))
<< std::endl;
if (gs != gsl::unsupported) {
sycl::context ctx{dev};
sycl::queue q1{ctx, dev, {sycl::property::queue::in_order()}};
std::vector<sycl::queue> queuesToRecord{q1};
const sycl::property_list propList{syclex::property::graph::no_cycle_check()};
syclex::command_graph<syclex::graph_state::modifiable> graph(ctx, dev, propList);
int *value_h = sycl::malloc_host<int>(1, ctx);
int *value_i = sycl::malloc_device<int>(1, dev, ctx);
int *value_o = sycl::malloc_device<int>(1, dev, ctx);
value_h[0] = 1;
q1.memcpy(value_i, value_h, 1 * sizeof(int)).wait_and_throw();
bool result = graph.begin_recording(queuesToRecord);
if (!result) {
std::cout << " Could not start the recording" << std::endl;
}
q1.submit([&](sycl::handler &cgh) {
cgh.single_task<class Memset>([=]() { value_o[0] = 0; });
});
q1.submit([&](sycl::handler &cgh) {
cgh.single_task<class Memcpy>([=]() { value_i[0] = value_o[0]; });
});
graph.end_recording();
auto instance = graph.finalize();
q1.ext_oneapi_graph(instance).wait_and_throw();
std::cout << " Done!" << std::endl;
q1.wait_and_throw();
} // Here it dies when destroying `instance`
}
std::cout << "Done!" << std::endl;
return 0;
}
Stack trace:
(gdb) bt
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737352309824) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=140737352309824) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=140737352309824, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x00007ffff2642476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x00007ffff26287f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x00007ffff262871b in __assert_fail_base (fmt=0x7ffff27dd150 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x7ffff7fababd "call->command != command", file=0x7ffff7f9dc20 "/home/aland/pti-gpu/tools/ze_tracer/ze_kernel_collector.h", line=1041, function=<optimized out>) at ./assert/assert.c:92
#6 0x00007ffff2639e96 in __GI___assert_fail (assertion=0x7ffff7fababd "call->command != command", file=0x7ffff7f9dc20 "/home/aland/pti-gpu/tools/ze_tracer/ze_kernel_collector.h", line=1041, function=0x7ffff7faba38 "void ZeKernelCollector::RemoveKernelCommands(ze_command_list_handle_t)") at ./assert/assert.c:101
#7 0x00007ffff7f9c2df in ZeKernelCollector::OnExitCommandListDestroy(_ze_command_list_destroy_params_t*, _ze_result_t, void*, void**) () from /home/aland/pti-gpu/tools/ze_tracer/build/libzet_tracer.so
#8 0x00007ffff7ac8dc5 in tracing_layer::zeCommandListDestroy(_ze_command_list_handle_t*) () from /home/aland/intel-sycl/llvm/build/install//lib/libze_tracing_layer.so.1
#9 0x00007ffff00229e2 in ur_exp_command_buffer_handle_t_::~ur_exp_command_buffer_handle_t_() () from /home/aland/intel-sycl/llvm/build/install/lib/libpi_level_zero.so
#10 0x00007ffff0023302 in urCommandBufferReleaseExp () from /home/aland/intel-sycl/llvm/build/install/lib/libpi_level_zero.so
#11 0x00007ffff008805d in piextCommandBufferRelease () from /home/aland/intel-sycl/llvm/build/install/lib/libpi_level_zero.so
#12 0x00007ffff2f2ac87 in sycl::_V1::ext::oneapi::experimental::detail::exec_graph_impl::~exec_graph_impl() () from /home/aland/intel-sycl/llvm/build/install//lib/libsycl.so.7
#13 0x0000000000406b8e in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x18ca3b0) at /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr_base.h:346
#14 0x0000000000406b0a in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fffffffd4a0) at /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr_base.h:1071
#15 0x0000000000407729 in std::__shared_ptr<sycl::_V1::ext::oneapi::experimental::detail::exec_graph_impl, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7fffffffd498) at /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr_base.h:1524
#16 0x0000000000407705 in std::shared_ptr<sycl::_V1::ext::oneapi::experimental::detail::exec_graph_impl>::~shared_ptr (this=0x7fffffffd498) at /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr.h:175
#17 0x00000000004076e5 in sycl::_V1::ext::oneapi::experimental::detail::executable_command_graph::~executable_command_graph (this=0x7fffffffd498) at /home/aland/intel-sycl/llvm/build/install/bin/../include/sycl/ext/oneapi/experimental/graph.hpp:289
#18 0x0000000000406755 in sycl::_V1::ext::oneapi::experimental::command_graph<(sycl::_V1::ext::oneapi::experimental::graph_state)1>::~command_graph (this=0x7fffffffd498) at /home/aland/intel-sycl/llvm/build/install/bin/../include/sycl/ext/oneapi/experimental/graph.hpp:336
#19 0x0000000000403fe1 in main () at test_graph_zetrace.cpp:49
Output with SYCL_PI_TRACE=-1:
sycl_pi_trace.log
@al42and thank you for reporting! we will look into it in few days. Any chances to check it with recent oneAPI release?
@jfedorov: I was using open-source IntelLLVM above. But the problem can be reproduced with oneAPI 2024.0 (Intel(R) oneAPI DPC++/C++ Compiler 2024.0.0 (2024.0.0.20231017)).
Note: when compiling with icpx, one would have to change line 8 to using gsl = syclex::info::graph_support_level;, because the SYCL Graph API is unstable.
@al42and Thank you. will look into it.
@al42and This issue is indeed reproduced (~three weeks ago) and hopefully will be fixed soon. thank you.
As of ze_tracer 1b7929b8139b09b03127c92211a9be2be9fb900e, the issue still reproduces with ICPX 2024.1.2, but not with the open-source IntelLLVM 2838f40382bedddbda0a5f20ebeeba86310044da. So, looks like some nasty behavior in the IntelLLVM/UR?
Works fine with ICPX 2024.2.0 and latest open-source IntelLLVM build. :+1: