Deadlock event test failure

Open bl4ckb0ne opened this issue 1 year ago • 2 comments

Hi,

I'm packaging level-zero for Alpine Linux, and with the latest release (1.19.2) both tests_event_deadlock and tests_event_deadlock_reset fail on all architectures.

Internal ctest changing into directory: /builds/alpine/aports/testing/level-zero/src/level-zero-1.19.2/build
Test project /builds/alpine/aports/testing/level-zero/src/level-zero-1.19.2/build
      Start  1: tests_api
      Start  2: tests_init_gpu_all
      Start  3: tests_init_npu_all
      Start  4: tests_any
      Start  5: tests_both_init_all
      Start  6: tests_both_init_gpu
      Start  7: tests_both_init_npu
      Start  8: tests_both_succeed
      Start  9: tests_both_gpu
      Start 10: tests_both_npu
      Start 11: tests_missing_api
      Start 12: tests_multi_call_failure
      Start 13: tests_event_deadlock
      Start 14: tests_event_deadlock_reset
      Start 15: tests_event_reset_reuse
 1/15 Test  #1: tests_api ........................   Passed    0.03 sec
 2/15 Test  #2: tests_init_gpu_all ...............   Passed    0.03 sec
 3/15 Test  #3: tests_init_npu_all ...............   Passed    0.03 sec
 4/15 Test  #4: tests_any ........................   Passed    0.02 sec
 5/15 Test  #5: tests_both_init_all ..............   Passed    0.02 sec
 6/15 Test  #6: tests_both_init_gpu ..............   Passed    0.02 sec
 7/15 Test  #7: tests_both_init_npu ..............   Passed    0.02 sec
 8/15 Test  #8: tests_both_succeed ...............   Passed    0.02 sec
 9/15 Test  #9: tests_both_gpu ...................   Passed    0.01 sec
10/15 Test #10: tests_both_npu ...................   Passed    0.01 sec
11/15 Test #11: tests_missing_api ................   Passed    0.01 sec
12/15 Test #12: tests_multi_call_failure .........   Passed    0.01 sec
13/15 Test #13: tests_event_deadlock .............Subprocess aborted***Exception:   0.01 sec
Running main() from /builds/alpine/aports/testing/level-zero/src/level-zero-1.19.2/build/_deps/googletest-src/googletest/src/gtest_main.cc
Note: Google Test filter = *GivenLevelZeroLoaderPresentWhenCallingzeCommandListAppendMemoryCopyWithCircularDependencyOnEventsThenValidationLayerPrintsWarningOfDeadlock*
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from LoaderValidation
[ RUN      ] LoaderValidation.GivenLevelZeroLoaderPresentWhenCallingzeCommandListAppendMemoryCopyWithCircularDependencyOnEventsThenValidationLayerPrintsWarningOfDeadlock
/usr/include/c++/14.2.0/bits/stl_vector.h:1130: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](size_type) [with _Tp = int; _Alloc = std::allocator<int>; reference = int&; size_type = long unsigned int]: Assertion '__n < this->size()' failed.
14/15 Test #14: tests_event_deadlock_reset .......Subprocess aborted***Exception:   0.01 sec
Running main() from /builds/alpine/aports/testing/level-zero/src/level-zero-1.19.2/build/_deps/googletest-src/googletest/src/gtest_main.cc
Note: Google Test filter = *GivenLevelZeroLoaderPresentWhenCallingzeCommandListAppendMemoryCopyWithCircularDependencyOnEventsAndExplicitCallzeEventHostSignalThenValidationLayerPrintsWarningOfIllegalUsage*
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from LoaderValidation
[ RUN      ] LoaderValidation.GivenLevelZeroLoaderPresentWhenCallingzeCommandListAppendMemoryCopyWithCircularDependencyOnEventsAndExplicitCallzeEventHostSignalThenValidationLayerPrintsWarningOfIllegalUsage
/usr/include/c++/14.2.0/bits/stl_vector.h:1130: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](size_type) [with _Tp = int; _Alloc = std::allocator<int>; reference = int&; size_type = long unsigned int]: Assertion '__n < this->size()' failed.
15/15 Test #15: tests_event_reset_reuse ..........   Passed    0.01 sec
87% tests passed, 2 tests failed out of 15
Total Test time (real) =   0.04 sec
The following tests FAILED:
	 13 - tests_event_deadlock (Subprocess aborted)
	 14 - tests_event_deadlock_reset (Subprocess aborted)
Errors while running CTest
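
Note on the log above: the abort is raised by the bounds check inside libstdc++'s `std::vector::operator[]`, which is only active when the build defines `_GLIBCXX_ASSERTIONS` (or uses debug mode). Without it, the same out-of-range index is undefined behaviour and can pass silently, which may be why the failure does not show up everywhere. That the Alpine build flags enable this macro is an assumption; the log does not say so. A minimal sketch of the difference, using a hypothetical helper that is not taken from the level-zero sources:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper for illustration only (not part of level-zero).
// Reports whether `index` falls outside `values`.
//
// at() always performs a bounds check and throws std::out_of_range.
// operator[] performs no check in a default build; with
// -D_GLIBCXX_ASSERTIONS libstdc++ aborts with
// "Assertion '__n < this->size()' failed", as seen in the log.
bool is_out_of_range(const std::vector<int>& values, std::size_t index) {
    try {
        (void)values.at(index);  // checked access, independent of build flags
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

Calling `is_out_of_range` on an empty vector with index 0 reports an out-of-range access, which is the same condition the hardened `operator[]` turns into an abort.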

bl4ckb0ne avatar Dec 16 '24 16:12 bl4ckb0ne

Hi @bl4ckb0ne

We have not encountered this error in our environment. Can you please provide details of the environment where it occurs, ideally a Docker image we can use to reproduce it? Which GCC are you using, is it 14.2?

How urgent is this for your release? The event_deadlock check is an optional validation-layer feature that inspects event usage at runtime to detect circular dependencies introduced by improper use of the L0 API. In that sense it is not part of the L0 API itself. We can disable event_deadlock so that you can proceed with your build. In the meantime, we can fix the error for your environment so that you can pick it up in your next release. Please let us know.
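
To illustrate what "circular dependency" means here, the sketch below models events and commands as nodes in a directed graph and looks for a cycle with a depth-first search. This is only an assumed model for illustration: the names `Mark`, `visit`, and `has_circular_dependency` are hypothetical, and this is not the validation layer's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: node i is an event or command, and edges[i]
// lists the nodes that node i waits on.
enum class Mark { Unvisited, InProgress, Done };

// Depth-first search; returns true if a cycle is reachable from `node`.
bool visit(const std::vector<std::vector<std::size_t>>& edges,
           std::vector<Mark>& marks, std::size_t node) {
    marks.at(node) = Mark::InProgress;
    for (std::size_t next : edges.at(node)) {
        // Reaching a node that is still in progress means a back edge,
        // i.e. a wait chain that leads back to itself.
        if (marks.at(next) == Mark::InProgress) return true;
        if (marks.at(next) == Mark::Unvisited && visit(edges, marks, next)) {
            return true;
        }
    }
    marks.at(node) = Mark::Done;
    return false;
}

// Returns true if any wait chain in the graph is circular.
bool has_circular_dependency(
    const std::vector<std::vector<std::size_t>>& edges) {
    std::vector<Mark> marks(edges.size(), Mark::Unvisited);
    for (std::size_t node = 0; node < edges.size(); ++node) {
        if (marks[node] == Mark::Unvisited && visit(edges, marks, node)) {
            return true;
        }
    }
    return false;
}
```

With three nodes where 0 waits on 1, 1 waits on 2, and 2 waits on 0, `has_circular_dependency` returns true. The sketch uses `at()` for indexed access so that a bad node index throws instead of reading out of bounds.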

bibrak avatar Dec 16 '24 22:12 bibrak

You can find the official Alpine Linux Dockerfiles here; edge is the closest to my setup. We are indeed using GCC 14.2.

No hurry on my side, there's no release planned anytime soon. I'm just packaging the release for the distribution, and I have disabled both tests in the package for the time being.

bl4ckb0ne avatar Dec 17 '24 00:12 bl4ckb0ne