LevelZero UR Adapter Release Fails

Open sommerlukas opened this issue 7 months ago • 6 comments

Describe the bug

We have started to observe flaky behavior in a workflow on the CUTLASS SYCL repository.

The workflow in question uses nightly builds of DPC++. The first failure was observed on 2025-05-14, with the nightly release from the same date.

The workflow tests the CUTLASS Python interface. In the SYCL implementation, the DPCTL framework is used to interface with the SYCL runtime from Python code.

To aid investigation, we added SYCL_UR_TRACE=2 to the workflow. With tracing enabled, we could trace the latest failure back to a failed release of the LevelZero UR adapter:

terminate called after throwing an instance of 'sycl::_V1::exception'
   ---> urAdapterRelease
  what():  Native API failed. Native API returns: 37 (UR_RESULT_ERROR_UNINITIALIZED)
   <--- urAdapterRelease(.hAdapter = 0x560130490760) -> UR_RESULT_ERROR_UNINITIALIZED;
/home/runner/actions-runner/_work/_temp/e6ff9a6b-8d76-49d8-8aee-ab852c8279b9.sh: line 5: 68971 Aborted

Full log is available here.

To reproduce

See the full log above or contact issue reporter.

Environment

  • OS: Linux
  • Target device and vendor: Intel Data Center GPU Max 1100
  • DPC++ version: Nightly release 2025-05-14 (first version with which the behavior was observed).

Additional context

No response

sommerlukas avatar May 22 '25 11:05 sommerlukas

@kbenzie mentioned that the UMF tag bump in https://github.com/intel/llvm/pull/18378 could help.

sommerlukas avatar May 22 '25 12:05 sommerlukas

Which CPU are you using? @sommerlukas

deadpipe avatar May 25 '25 12:05 deadpipe

@kbenzie mentioned that the UMF tag bump in #18378 could help.

This has just merged so should be part of the next nightly build.

kbenzie avatar May 26 '25 09:05 kbenzie

#18378 has not resolved this issue.

kbenzie avatar May 29 '25 15:05 kbenzie

I think this has the hallmarks of attempting to release an already released adapter, hence UR_RESULT_ERROR_UNINITIALIZED coming from the loader, since the adapter function pointer being called is nullptr.
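
To illustrate the hypothesis, here is a minimal sketch of how a double release could surface as an "uninitialized" error from a loader layer. Every type and function name below is a hypothetical stand-in, not the actual Unified Runtime loader or LevelZero adapter code, and the assumption that teardown clears the dispatch table entry is only a model of the suspected behavior.

```cpp
#include <utility>

// Hypothetical stand-ins; these are NOT the real Unified Runtime types.
enum Result { RESULT_SUCCESS = 0, RESULT_ERROR_UNINITIALIZED = 1 };

struct Adapter {
  int refCount;
};

using ReleaseFn = Result (*)(Adapter *);

// Dispatch table that the (modelled) loader fills in when an adapter is loaded.
struct DispatchTable {
  ReleaseFn pfnAdapterRelease = nullptr;
};

static DispatchTable g_table;

// Adapter-side release. Assumption: dropping the last reference tears the
// adapter down, which leaves the loader's table entry empty.
static Result adapterReleaseImpl(Adapter *adapter) {
  if (--adapter->refCount == 0) {
    g_table.pfnAdapterRelease = nullptr;
  }
  return RESULT_SUCCESS;
}

// Loader-side entry point: forwards to the adapter only if the function
// pointer is still populated, otherwise reports "uninitialized".
Result loaderAdapterRelease(Adapter *adapter) {
  if (g_table.pfnAdapterRelease == nullptr) {
    return RESULT_ERROR_UNINITIALIZED;
  }
  return g_table.pfnAdapterRelease(adapter);
}

// Releases the same adapter twice and returns both results.
std::pair<Result, Result> simulateDoubleRelease() {
  Adapter adapter{1};
  g_table.pfnAdapterRelease = adapterReleaseImpl;
  Result first = loaderAdapterRelease(&adapter);
  Result second = loaderAdapterRelease(&adapter);
  return {first, second};
}
```

In this model the first release succeeds and the second one never reaches the adapter: the loader sees a null function pointer and returns the error itself, which would match the trace in the issue description.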

kbenzie avatar Jun 02 '25 12:06 kbenzie

This is potentially related to URT-931 and/or URT-939.

kbenzie avatar Jun 17 '25 15:06 kbenzie