[E2E] Basic/event_profiling_info.cpp seems flaky
Describe the bug
Failed run: https://github.com/intel/llvm/actions/runs/8886566095/job/24401423571?pr=13588
Successful run: https://github.com/intel/llvm/actions/runs/8886566095/job/24406513670
I observed this behavior on an L0 GPU on Windows, but I'm not sure whether the flaky behavior also reproduces on Linux or on other devices.
FAIL: SYCL :: Basic/event_profiling_info.cpp (220 of 2017)
******************** TEST 'SYCL :: Basic/event_profiling_info.cpp' FAILED ********************
Exit Code: 3221226505
Command Output (stdout):
--
# RUN: at line 2
D:/github/actions-runner/_work/llvm/llvm/install/bin/clang++.exe -fsycl -fsycl-targets=spir64 D:\github\actions-runner\_work\llvm\llvm\llvm\sycl\test-e2e\Basic\event_profiling_info.cpp -o D:\github\actions-runner\_work\llvm\llvm\build-e2e\Basic\Output\event_profiling_info.cpp.tmp.out
# executed command: D:/github/actions-runner/_work/llvm/llvm/install/bin/clang++.exe -fsycl -fsycl-targets=spir64 'D:\github\actions-runner\_work\llvm\llvm\llvm\sycl\test-e2e\Basic\event_profiling_info.cpp' -o 'D:\github\actions-runner\_work\llvm\llvm\build-e2e\Basic\Output\event_profiling_info.cpp.tmp.out'
# RUN: at line 4
env ONEAPI_DEVICE_SELECTOR=level_zero:gpu D:\github\actions-runner\_work\llvm\llvm\build-e2e\Basic\Output\event_profiling_info.cpp.tmp.out
# executed command: env ONEAPI_DEVICE_SELECTOR=level_zero:gpu 'D:\github\actions-runner\_work\llvm\llvm\build-e2e\Basic\Output\event_profiling_info.cpp.tmp.out'
# .---command stderr------------
# | Assertion failed: Submit <= Start, file D:/github/actions-runner/_work/llvm/llvm/llvm/sycl/test-e2e/Basic/event_profiling_info.cpp, line 30
# `-----------------------------
# error: command failed with exit status: 0xc0000409
To reproduce
DPC++ commit: c2cc3a1327f668795881a7b157388ad516bdd472
Environment
OS: Windows
Device: L0 Gen12
sycl-ls --verbose
Platform [#2]:
Version : 1.3
Name : Intel(R) Level-Zero
Vendor : Intel(R) Corporation
Devices : 1
Device [#0]:
Type : gpu
Version : 1.3
Name : Intel(R) Iris(R) Xe Graphics
Vendor : Intel(R) Corporation
Driver : 1.3.28044
Aspects : gpu fp16 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations ext_intel_pci_address ext_intel_gpu_eu_count ext_intel_gpu_eu_simd_width ext_intel_gpu_slices ext_intel_gpu_subslices_per_slice ext_intel_gpu_eu_count_per_subslice atomic64 ext_intel_device_info_uuid ext_intel_gpu_hw_threads_per_eu ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_width ext_intel_legacy_image ext_oneapi_bindless_images ext_oneapi_bindless_images_shared_usm ext_oneapi_bindless_images_2d_usm ext_oneapi_mipmap ext_oneapi_mipmap_anisotropy ext_intel_esimd ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_tangle_group ext_oneapi_limited_graph ext_oneapi_private_alloca
info::device::sub_group_sizes: 8 16 32
Additional context
No response
Tag @againull for awareness. Could this be due to the known timing approximation issues?
I'm observing a similar problem with Basic/submit_time.cpp on Linux/CL. I've found that you need a bit of system load and a lot of runs to reproduce it, but it's consistently doable within 20 or so iterations. An interesting data point would be whether this reproduces on CUDA/HIP.
On L0 and CL this could be explained by discrepancies between two time sources: the timers used by the common DeviceAndHostTimer implementation that both adapters share, which is used to cache the event's submit time, and the separate mechanisms each adapter has for retrospectively querying an event's start time.
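To make that hypothesis concrete, here is a minimal sketch in plain C++ of how two time sources could disagree enough to break `Submit <= Start`. It is not the DPC++ runtime code. The `SyncPoint` struct, both functions, and the 50 ppm drift figure are hypothetical, and the model assumes the submit time is extrapolated from a cached host/device timestamp pair while the start time is read from the device clock itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: a host/device timestamp pair captured at the same moment.
struct SyncPoint {
  std::uint64_t host_ns;
  std::uint64_t device_ns;
};

// Assumed submit-time estimate: cached device time plus elapsed host time.
inline std::uint64_t estimateSubmit(SyncPoint sync, std::uint64_t host_now_ns) {
  return sync.device_ns + (host_now_ns - sync.host_ns);
}

// Assumed device clock: runs drift_ppm parts per million slower than the host.
inline std::uint64_t deviceClockAt(SyncPoint sync, std::uint64_t host_now_ns,
                                   std::uint64_t drift_ppm) {
  const std::uint64_t elapsed = host_now_ns - sync.host_ns;
  return sync.device_ns + elapsed - elapsed * drift_ppm / 1000000ULL;
}

// Scenario: submit 1 s after the sync point, kernel starts 10 us later.
inline void demoSubmitAfterStart() {
  const SyncPoint sync{0, 0};
  const std::uint64_t submit = estimateSubmit(sync, 1000000000ULL);
  // With no drift the ordering holds.
  assert(submit <= deviceClockAt(sync, 1000010000ULL, 0));
  // With 50 ppm drift the device-side start lands before the estimated submit.
  assert(submit > deviceClockAt(sync, 1000010000ULL, 50));
}
```

In this model a 10 us gap between submit and start is swallowed by 50 us of accumulated drift, so the reported start precedes the reported submit even though the real ordering was correct. Whether the adapters actually behave this way would need to be confirmed against the runtime source.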
Yes, I observed a similar flaky failure in Basic/submit_time.cpp: https://github.com/intel/llvm/actions/runs/9406901188/job/25911860208?pr=14002