syclcompat/memory/memory_async.cpp failed in post-commit

Open KornevNikita opened this issue 1 year ago • 8 comments

Describe the bug

See: https://github.com/intel/llvm/actions/runs/8921355161/job/24501752365?pr=13601

FAIL: SYCL :: syclcompat/memory/memory_async.cpp (2007 of 2020)
******************** TEST 'SYCL :: syclcompat/memory/memory_async.cpp' FAILED ********************
Exit Code: -11

Command Output (stdout):
--
# RUN: at line 33
/__w/llvm/llvm/toolchain/bin//clang++   -std=c++20 -fsycl -fsycl-targets=spir64 /__w/llvm/llvm/llvm/sycl/test-e2e/syclcompat/memory/memory_async.cpp -o /__w/llvm/llvm/build-e2e/syclcompat/memory/Output/memory_async.cpp.tmp.out
# executed command: /__w/llvm/llvm/toolchain/bin//clang++ -std=c++20 -fsycl -fsycl-targets=spir64 /__w/llvm/llvm/llvm/sycl/test-e2e/syclcompat/memory/memory_async.cpp -o /__w/llvm/llvm/build-e2e/syclcompat/memory/Output/memory_async.cpp.tmp.out
# .---command stderr------------
# | In file included from /__w/llvm/llvm/llvm/sycl/test-e2e/syclcompat/memory/memory_async.cpp:42:
# | In file included from /__w/llvm/llvm/toolchain/bin/../include/syclcompat/memory.hpp:53:
# | /__w/llvm/llvm/toolchain/bin/../include/syclcompat/device.hpp:348:2: warning: "Querying the number of bytes of free memory is not supported" [-W#warnings]
# |   348 | #warning "Querying the number of bytes of free memory is not supported"
# |       |  ^
# | 1 warning generated.
# | In file included from /__w/llvm/llvm/llvm/sycl/test-e2e/syclcompat/memory/memory_async.cpp:42:
# | In file included from /__w/llvm/llvm/toolchain/bin/../include/syclcompat/memory.hpp:53:
# | /__w/llvm/llvm/toolchain/bin/../include/syclcompat/device.hpp:348:2: warning: "Querying the number of bytes of free memory is not supported" [-W#warnings]
# |   348 | #warning "Querying the number of bytes of free memory is not supported"
# |       |  ^
# | /__w/llvm/llvm/toolchain/bin/../include/syclcompat/device.hpp:406:2: warning: "get_device_info: querying memory_clock_rate and memory_bus_width are not supported by the compiler used. Use 3200000 kHz as memory_clock_rate default value. Use 64 bits as memory_bus_width default value." [-W#warnings]
# |   406 | #warning "get_device_info: querying memory_clock_rate and \
# |       |  ^
# | 2 warnings generated.
# `-----------------------------
# RUN: at line 34
env ONEAPI_DEVICE_SELECTOR=level_zero:gpu  /__w/llvm/llvm/build-e2e/syclcompat/memory/Output/memory_async.cpp.tmp.out
# executed command: env ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/syclcompat/memory/Output/memory_async.cpp.tmp.out
# .---command stdout------------
# | void test_free_async()
# | void test_memcpy_async1()
# | void test_memcpy_async2()
# | void test_memcpy_async3()
# | void test_memset_async1()
# `-----------------------------
# error: command failed with exit status: -11
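The `-11` status above is a signal rather than an ordinary exit code. lit is written in Python and runs test commands through the `subprocess` module, which on POSIX systems reports a child killed by signal N as return code -N. So the test binary was most likely terminated by signal 11, which is SIGSEGV on Linux. A minimal sketch of that decoding, assuming a POSIX host; `describe_exit_status` is a hypothetical helper, not part of lit:

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a subprocess-style return code (hypothetical helper).

    Assumes the POSIX convention used by Python's subprocess module:
    a child terminated by signal N is reported as return code -N.
    """
    if status < 0:
        # Map the signal number back to its symbolic name, e.g. 11 -> SIGSEGV.
        return f"killed by {signal.Signals(-status).name}"
    return f"exited with code {status}"


print(describe_exit_status(-11))  # prints "killed by SIGSEGV" on Linux
```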

To reproduce

No response

Environment

No response

Additional context

No response

KornevNikita, May 02 '24 10:05

I also saw a flaky level_zero failure a day ago, though in a different test. So far I haven't been able to reproduce this issue. We will take a deeper look.

@joeatodd ping for visibility.

Alcpz, May 02 '24 14:05

We've been investigating this. None of these tests have been touched recently, so I suspect we are exposing an issue in UR (Unified Runtime) or L0 (Level Zero). I've been unable to reproduce the failure locally on our Arc A770.

It also seems that the failure has stopped occurring :crossed_fingers:, though since it's intermittent we can't be sure yet.

For now I'd suggest we wait and see if it continues to occur. If so we will need to look deeper.
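Because the crash is intermittent, one way to hunt for it locally is to run the same binary many times under the device selector CI used (`ONEAPI_DEVICE_SELECTOR=level_zero:gpu`) and tally the non-zero exit statuses. A minimal sketch, assuming a POSIX host and an already-built test binary; `count_failures` is a hypothetical helper, not part of lit or the SYCL test suite:

```python
import os
import subprocess
import sys


def count_failures(cmd, runs=100, env_overrides=None):
    """Run `cmd` `runs` times and count each non-zero return code.

    Returns a dict mapping return code -> number of runs that ended with it.
    Negative codes mean the process was killed by a signal (e.g. -11 = SIGSEGV).
    """
    env = dict(os.environ)
    if env_overrides:
        env.update(env_overrides)  # e.g. {"ONEAPI_DEVICE_SELECTOR": "level_zero:gpu"}
    failures = {}
    for _ in range(runs):
        result = subprocess.run(
            cmd, env=env, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            failures[result.returncode] = failures.get(result.returncode, 0) + 1
    return failures


# Quick self-check with a trivially successful command.
print(count_failures([sys.executable, "-c", "pass"], runs=3))
```

For example, `count_failures(["./memory_async.cpp.tmp.out"], runs=200, env_overrides={"ONEAPI_DEVICE_SELECTOR": "level_zero:gpu"})` (with the binary path adjusted to a local build) would return an empty dict if no run failed; a `-11` key would point to the same segfault seen in CI.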

joeatodd, May 03 '24 10:05

I have been periodically checking recent SYCL Post Commit runs, and this failure hasn't recurred for a couple of weeks now. We believe we were probably exposing an underlying L0 bug that has likely since been fixed. Closing this for now.

joeatodd, May 13 '24 08:05

@joeatodd This is failing again https://github.com/intel/llvm/actions/runs/10150284793/job/28067414470 Can someone take a look and/or disable the test? Thanks

sarnex, Jul 29 '24 19:07

@intel/syclcompat-lib-reviewers FYI

sarnex, Jul 29 '24 19:07

Sorry for the delay @sarnex, here's a PR that disables these tests for now, until we can get to the root cause.

joeatodd, Jul 31 '24 10:07

Tests disabled for now. (#14855 merged)

Alcpz, Jul 31 '24 13:07

Re-enabled in this PR (awaiting review)

joeatodd, Oct 16 '24 09:10

PR merged, tests passing.

joeatodd, Oct 23 '24 08:10