
Sporadic fail in Level Zero IPC tests

Open bratpiorka opened this issue 11 months ago • 9 comments

Sporadic fail in Level Zero IPC tests

umfLevelZeroProviderTestSuite/umfIpcTest.ConcurrentGetConcurrentPutHandles/0

Environment Information

  • UMF version (hash commit or a tag): latest
  • OS(es) version(s): Ubuntu Release

Please provide a reproduction of the bug:

https://github.com/oneapi-src/unified-memory-framework/actions/runs/13719665034/job/38374211866?pr=1147

35/55 Test #35: test_provider_level_zero ......................***Exception: SegFault  0.17 sec
Running main() from /home/test-user/actions-runner/_work/unified-memory-framework/unified-memory-framework/build/_deps/googletest-src/googletest/src/gtest_main.cc
[==========] Running 46 tests from 4 test suites.
[----------] Global test environment set-up.

...

[----------] 14 tests from umfLevelZeroProviderTestSuite/umfIpcTest
[ RUN      ] umfLevelZeroProviderTestSuite/umfIpcTest.GetIPCHandleSize/0
[       OK ] umfLevelZeroProviderTestSuite/umfIpcTest.GetIPCHandleSize/0 (0 ms)
[ RUN      ] umfLevelZeroProviderTestSuite/umfIpcTest.GetIPCHandleSizeInvalidArgs/0
[       OK ] umfLevelZeroProviderTestSuite/umfIpcTest.GetIPCHandleSizeInvalidArgs/0 (0 ms)
[ RUN      ] umfLevelZeroProviderTestSuite/umfIpcTest.GetIPCHandleInvalidArgs/0
[       OK ] umfLevelZeroProviderTestSuite/umfIpcTest.GetIPCHandleInvalidArgs/0 (0 ms)
[ RUN      ] umfLevelZeroProviderTestSuite/umfIpcTest.CloseIPCHandleInvalidPtr/0
[       OK ] umfLevelZeroProviderTestSuite/umfIpcTest.CloseIPCHandleInvalidPtr/0 (0 ms)
[ RUN      ] umfLevelZeroProviderTestSuite/umfIpcTest.BasicFlow/0
[       OK ] umfLevelZeroProviderTestSuite/umfIpcTest.BasicFlow/0 (4 ms)
[ RUN      ] umfLevelZeroProviderTestSuite/umfIpcTest.GetPoolByOpenedHandle/0
[       OK ] umfLevelZeroProviderTestSuite/umfIpcTest.GetPoolByOpenedHandle/0 (12 ms)
[ RUN      ] umfLevelZeroProviderTestSuite/umfIpcTest.AllocFreeAllocTest/0
[       OK ] umfLevelZeroProviderTestSuite/umfIpcTest.AllocFreeAllocTest/0 (1 ms)
[ RUN      ] umfLevelZeroProviderTestSuite/umfIpcTest.openInTwoIpcHandlers/0
[       OK ] umfLevelZeroProviderTestSuite/umfIpcTest.openInTwoIpcHandlers/0 (1 ms)
[ RUN      ] umfLevelZeroProviderTestSuite/umfIpcTest.ConcurrentGetConcurrentPutHandles/0

CRASH

How often bug is revealed:

rare

bratpiorka Mar 07 '25 12:03

Recent failure: https://github.com/oneapi-src/unified-memory-framework/actions/runs/14079051874/job/39429048160?pr=1223

bratpiorka Mar 26 '25 09:03

@vinser52

ldorau Mar 27 '25 08:03

Maybe this ASAN error has something to do with it? https://github.com/ldorau/unified-memory-framework/actions/runs/14083824988/job/39443119450

==12506==ERROR: AddressSanitizer: use-after-poison on address 0x7fa3f2a69188 at pc 0x55fdfc47fde2 bp 0x7fa3e93feb30 sp 0x7fa3e93feb20
READ of size 8 at 0x7fa3f2a69188 thread T17
    #0 0x55fdfc47fde1 in utils_atomic_load_acquire_u64 /home/runner/work/unified-memory-framework/unified-memory-framework/src/utils/utils_concurrency.h:165
    #1 0x55fdfc4814e6 in umfMemoryTrackerAdd /home/runner/work/unified-memory-framework/unified-memory-framework/src/provider/provider_tracking.c:202
    #2 0x55fdfc48407a in trackingAlloc /home/runner/work/unified-memory-framework/unified-memory-framework/src/provider/provider_tracking.c:481
    #3 0x55fdfc47ccbe in umfMemoryProviderAlloc /home/runner/work/unified-memory-framework/unified-memory-framework/src/memory_provider.c:245
    #4 0x55fdfc49f34a in proxy_aligned_malloc /home/runner/work/unified-memory-framework/unified-memory-framework/src/pool/pool_proxy.c:51
    #5 0x55fdfc49f470 in proxy_malloc /home/runner/work/unified-memory-framework/unified-memory-framework/src/pool/pool_proxy.c:64
    #6 0x55fdfc47a010 in umfPoolMalloc /home/runner/work/unified-memory-framework/unified-memory-framework/src/memory_pool.c:189
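
For context on the report type: ASan prints use-after-poison when code touches memory that the program itself marked off-limits through ASan's manual poisoning interface, which is typically done inside a custom allocator. The sketch below is not UMF code. It is a minimal, hypothetical slot pool (`slot_pool_t`, `slot_t`, and the `SKETCH_*` macros are invented for illustration) that poisons slots on free, assuming only the real `__asan_poison_memory_region` / `__asan_unpoison_memory_region` runtime entry points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Detect ASan: GCC defines __SANITIZE_ADDRESS__, Clang exposes __has_feature. */
#if defined(__SANITIZE_ADDRESS__)
#define SKETCH_HAVE_ASAN 1
#elif defined(__has_feature)
#if __has_feature(address_sanitizer)
#define SKETCH_HAVE_ASAN 1
#endif
#endif

#ifdef SKETCH_HAVE_ASAN
/* ASan runtime entry points (also declared in <sanitizer/asan_interface.h>). */
void __asan_poison_memory_region(void const volatile *addr, size_t size);
void __asan_unpoison_memory_region(void const volatile *addr, size_t size);
#define SKETCH_POISON(p, n) __asan_poison_memory_region((p), (n))
#define SKETCH_UNPOISON(p, n) __asan_unpoison_memory_region((p), (n))
#else
/* Without ASan the annotations compile to nothing. */
#define SKETCH_POISON(p, n) ((void)(p), (void)(n))
#define SKETCH_UNPOISON(p, n) ((void)(p), (void)(n))
#endif

#define SLOT_COUNT 4

/* Hypothetical node: one 64-bit field, matching the 8-byte read in the report. */
typedef struct slot {
    uint64_t value;
} slot_t;

typedef struct slot_pool {
    slot_t slots[SLOT_COUNT];
    int in_use[SLOT_COUNT];
} slot_pool_t;

/* Mark every slot free and poison the whole backing array. */
static void slot_pool_init(slot_pool_t *pool) {
    for (int i = 0; i < SLOT_COUNT; i++) {
        pool->in_use[i] = 0;
    }
    SKETCH_POISON(pool->slots, sizeof(pool->slots));
}

/* Hand out the first free slot, unpoisoning it so the caller may touch it. */
static slot_t *slot_pool_alloc(slot_pool_t *pool) {
    for (int i = 0; i < SLOT_COUNT; i++) {
        if (!pool->in_use[i]) {
            pool->in_use[i] = 1;
            SKETCH_UNPOISON(&pool->slots[i], sizeof(slot_t));
            return &pool->slots[i];
        }
    }
    return NULL; /* pool exhausted */
}

/* Return a slot to the pool and poison it again. */
static void slot_pool_free(slot_pool_t *pool, slot_t *s) {
    ptrdiff_t idx = s - pool->slots;
    assert(idx >= 0 && idx < SLOT_COUNT && pool->in_use[idx]);
    pool->in_use[idx] = 0;
    SKETCH_POISON(s, sizeof(slot_t));
}
```

When built with `-fsanitize=address`, reading `s->value` after `slot_pool_free(&pool, s)` is reported as use-after-poison. If one thread still holds such a pointer while another thread frees the slot, the same report can appear only sporadically, which would be consistent with a rare failure in a concurrent test, though nothing here confirms that this is what happens in UMF.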

ldorau Mar 27 '25 08:03

See: https://github.com/oneapi-src/unified-memory-framework/pull/1224

ldorau Mar 27 '25 12:03

> Maybe this ASAN error has something to do with it? https://github.com/ldorau/unified-memory-framework/actions/runs/14083824988/job/39443119450
>
> ==12506==ERROR: AddressSanitizer: use-after-poison on address 0x7fa3f2a69188 at pc 0x55fdfc47fde2 bp 0x7fa3e93feb30 sp 0x7fa3e93feb20
> READ of size 8 at 0x7fa3f2a69188 thread T17
>     #0 0x55fdfc47fde1 in utils_atomic_load_acquire_u64 /home/runner/work/unified-memory-framework/unified-memory-framework/src/utils/utils_concurrency.h:165
>     #1 0x55fdfc4814e6 in umfMemoryTrackerAdd /home/runner/work/unified-memory-framework/unified-memory-framework/src/provider/provider_tracking.c:202
>     #2 0x55fdfc48407a in trackingAlloc /home/runner/work/unified-memory-framework/unified-memory-framework/src/provider/provider_tracking.c:481
>     #3 0x55fdfc47ccbe in umfMemoryProviderAlloc /home/runner/work/unified-memory-framework/unified-memory-framework/src/memory_provider.c:245
>     #4 0x55fdfc49f34a in proxy_aligned_malloc /home/runner/work/unified-memory-framework/unified-memory-framework/src/pool/pool_proxy.c:51
>     #5 0x55fdfc49f470 in proxy_malloc /home/runner/work/unified-memory-framework/unified-memory-framework/src/pool/pool_proxy.c:64
>     #6 0x55fdfc47a010 in umfPoolMalloc /home/runner/work/unified-memory-framework/unified-memory-framework/src/memory_pool.c:189

I am not sure because the issue above is related to the allocation flow, right?

vinser52 Mar 27 '25 13:03

> I am not sure because the issue above is related to the allocation flow, right?

Most probably, or the free() path.

ldorau Mar 27 '25 14:03

Latest occurrence: https://github.com/oneapi-src/unified-memory-framework/actions/runs/14998378565/job/42144669214

lukaszstolarczuk May 14 '25 11:05

The latest occurrence: https://github.com/oneapi-src/unified-memory-framework/actions/runs/16470636284/job/46558694579 and https://github.com/oneapi-src/unified-memory-framework/actions/runs/16470636284/job/46558694507

ldorau Jul 23 '25 13:07

The latest occurrence: https://github.com/oneapi-src/unified-memory-framework/actions/runs/16617013654/job/47013068561

ldorau Jul 30 '25 08:07