
Many AddressSanitizer test failures on OCL CPU in Nightly

Open sarnex opened this issue 1 year ago • 8 comments

Describe the bug

https://github.com/intel/llvm/actions/runs/10952456651/job/30411378740

********************
Expectedly Failed Tests (1):
  SYCL :: AddressSanitizer/nullpointer/private_nullptr.cpp

********************
Failed Tests (37):
  SYCL :: AddressSanitizer/common/config-red-zone-size.cpp
  SYCL :: AddressSanitizer/common/demangle-kernel-name.cpp
  SYCL :: AddressSanitizer/common/kernel-debug.cpp
  SYCL :: AddressSanitizer/invalid-argument/host-pointer.cpp
  SYCL :: AddressSanitizer/memory-leak/memory-leak.cpp
  SYCL :: AddressSanitizer/misaligned/misalign-int.cpp
  SYCL :: AddressSanitizer/misaligned/misalign-long.cpp
  SYCL :: AddressSanitizer/misaligned/misalign-short.cpp
  SYCL :: AddressSanitizer/multiple-reports/multiple_kernels.cpp
  SYCL :: AddressSanitizer/multiple-reports/one_kernel.cpp
  SYCL :: AddressSanitizer/nullpointer/global_nullptr.cpp
  SYCL :: AddressSanitizer/out-of-bounds/DeviceGlobal/device_global.cpp
  SYCL :: AddressSanitizer/out-of-bounds/DeviceGlobal/device_global_image_scope.cpp
  SYCL :: AddressSanitizer/out-of-bounds/DeviceGlobal/device_global_image_scope_unaligned.cpp
  SYCL :: AddressSanitizer/out-of-bounds/DeviceGlobal/multi_device_images.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/large_group_size.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_for_char.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_for_double.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_for_func.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_for_int.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_for_short.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_no_local_size.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/unaligned_shadow_memory.cpp
  SYCL :: AddressSanitizer/out-of-bounds/buffer/buffer.cpp
  SYCL :: AddressSanitizer/out-of-bounds/buffer/buffer_2d.cpp
  SYCL :: AddressSanitizer/out-of-bounds/buffer/buffer_3d.cpp
  SYCL :: AddressSanitizer/out-of-bounds/buffer/buffer_copy_fill.cpp
  SYCL :: AddressSanitizer/out-of-bounds/buffer/subbuffer.cpp
  SYCL :: AddressSanitizer/out-of-bounds/local/group_local_memory.cpp
  SYCL :: AddressSanitizer/out-of-bounds/local/local_accessor_basic.cpp
  SYCL :: AddressSanitizer/out-of-bounds/local/local_accessor_function.cpp
  SYCL :: AddressSanitizer/out-of-bounds/local/local_accessor_multiargs.cpp
  SYCL :: AddressSanitizer/out-of-bounds/local/multiple_source.cpp
  SYCL :: AddressSanitizer/out-of-bounds/private/multiple_private.cpp
  SYCL :: AddressSanitizer/out-of-bounds/private/single_private.cpp
  SYCL :: AddressSanitizer/use-after-free/quarantine-no-free.cpp
  SYCL :: AddressSanitizer/use-after-free/use-after-free.cpp

To reproduce

No response

Environment

No response

Additional context

No response

sarnex avatar Sep 20 '24 14:09 sarnex

@zhaomaosu @AllanZyne Can someone take a look at this? Thanks

sarnex avatar Sep 20 '24 14:09 sarnex

This is expected, since the OCL CPU driver hasn't been upgraded to the version we need. But the real issue is why this wasn't detected in pre-commit CI. This OCL requirement was introduced by https://github.com/intel/llvm/pull/14891, but its tests didn't fail on CPU. Do the nightly tests use a different OCL version?

AllanZyne avatar Sep 23 '24 02:09 AllanZyne

@AllanZyne I think it's because in precommit we run OCL CPU testing at the same time as OCL GPU and Level Zero on gen12, e.g. with -DSYCL_TEST_E2E_TARGETS="level_zero:gpu;opencl:gpu;opencl:cpu", and in the DeviceSanitizer lit.local.cfg I see

# FIXME: Skip some of gpu devices, waiting for gfx driver uplifting
config.unsupported_features += ['gpu-intel-gen9', 'gpu-intel-gen11', 'gpu-intel-gen12', 'gpu-intel-pvc']

and since we were running gen12 gpu testing at the same time, I think the unsupported line above kicked in, resulting in all tests being marked unsupported in the precommit run.

Unsupported Tests (689):
...
  SYCL :: AddressSanitizer/bad-free/bad-free-host.cpp
  SYCL :: AddressSanitizer/bad-free/bad-free-minus1.cpp
  SYCL :: AddressSanitizer/bad-free/bad-free-plus1.cpp
  SYCL :: AddressSanitizer/common/config-red-zone-size.cpp
  SYCL :: AddressSanitizer/common/demangle-kernel-name.cpp
  SYCL :: AddressSanitizer/common/kernel-debug.cpp
  SYCL :: AddressSanitizer/double-free/double-free.cpp
  SYCL :: AddressSanitizer/invalid-argument/bad-context.cpp
  SYCL :: AddressSanitizer/invalid-argument/host-pointer.cpp
  SYCL :: AddressSanitizer/invalid-argument/out-of-bounds.cpp
  SYCL :: AddressSanitizer/invalid-argument/released-pointer.cpp
  SYCL :: AddressSanitizer/misaligned/misalign-int.cpp
  SYCL :: AddressSanitizer/misaligned/misalign-long.cpp
  SYCL :: AddressSanitizer/misaligned/misalign-short.cpp
  SYCL :: AddressSanitizer/multiple-reports/multiple_kernels.cpp
  SYCL :: AddressSanitizer/multiple-reports/one_kernel.cpp
  SYCL :: AddressSanitizer/nullpointer/global_nullptr.cpp
  SYCL :: AddressSanitizer/nullpointer/private_nullptr.cpp
  SYCL :: AddressSanitizer/out-of-bounds/DeviceGlobal/device_global.cpp
  SYCL :: AddressSanitizer/out-of-bounds/DeviceGlobal/device_global_image_scope.cpp
  SYCL :: AddressSanitizer/out-of-bounds/DeviceGlobal/device_global_image_scope_unaligned.cpp
  SYCL :: AddressSanitizer/out-of-bounds/DeviceGlobal/multi_device_images.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/large_group_size.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_for_char.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_for_double.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_for_func.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_for_int.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_for_short.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_no_local_size.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/unaligned_shadow_memory.cpp
  SYCL :: AddressSanitizer/out-of-bounds/buffer/buffer.cpp
  SYCL :: AddressSanitizer/out-of-bounds/buffer/buffer_2d.cpp
  SYCL :: AddressSanitizer/out-of-bounds/buffer/buffer_3d.cpp
  SYCL :: AddressSanitizer/out-of-bounds/buffer/buffer_copy_fill.cpp
  SYCL :: AddressSanitizer/out-of-bounds/buffer/subbuffer.cpp
  SYCL :: AddressSanitizer/out-of-bounds/local/group_local_memory.cpp
  SYCL :: AddressSanitizer/out-of-bounds/local/local_accessor_basic.cpp
  SYCL :: AddressSanitizer/out-of-bounds/local/local_accessor_function.cpp
  SYCL :: AddressSanitizer/out-of-bounds/local/local_accessor_multiargs.cpp
  SYCL :: AddressSanitizer/out-of-bounds/local/multiple_source.cpp
  SYCL :: AddressSanitizer/out-of-bounds/private/multiple_private.cpp
  SYCL :: AddressSanitizer/out-of-bounds/private/single_private.cpp
  SYCL :: AddressSanitizer/use-after-free/quarantine-free.cpp
  SYCL :: AddressSanitizer/use-after-free/quarantine-no-free.cpp
  SYCL :: AddressSanitizer/use-after-free/use-after-free.cpp
...

But in the nightly, we run OCL CPU testing individually, so the tests actually ran.
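The difference between the two runs can be sketched with a toy model. This is only an illustration of the explanation above, not lit's actual implementation: the per-device feature sets and the helper names (`DEVICE_FEATURES`, `static_features`, `is_unsupported`) are assumptions made up for the example.

```python
# Toy model (an assumption, not lit's real code): a test is reported as
# unsupported when any entry of unsupported_features is present in the
# feature set the condition is evaluated against.

UNSUPPORTED = {"gpu-intel-gen9", "gpu-intel-gen11", "gpu-intel-gen12", "gpu-intel-pvc"}

# Hypothetical feature sets for each target in SYCL_TEST_E2E_TARGETS.
DEVICE_FEATURES = {
    "level_zero:gpu": {"gpu", "level_zero", "gpu-intel-gen12"},
    "opencl:gpu": {"gpu", "opencl", "gpu-intel-gen12"},
    "opencl:cpu": {"cpu", "opencl"},
}


def static_features(targets):
    """Union of the features of every target in the run (the "static" view)."""
    merged = set()
    for target in targets:
        merged |= DEVICE_FEATURES[target]
    return merged


def is_unsupported(features, unsupported=UNSUPPORTED):
    """True if at least one unsupported feature is present."""
    return bool(set(features) & set(unsupported))


precommit_targets = ["level_zero:gpu", "opencl:gpu", "opencl:cpu"]
nightly_targets = ["opencl:cpu"]

# Combined run: the gen12 feature leaks into the shared set, so everything is skipped.
precommit_skipped = is_unsupported(static_features(precommit_targets))
# CPU-only run: no GPU feature is present, so the tests actually run.
nightly_skipped = is_unsupported(static_features(nightly_targets))
# Evaluating the same condition per device would keep the CPU tests alive.
per_device_skipped = {t: is_unsupported(DEVICE_FEATURES[t]) for t in precommit_targets}
```

Under this model the combined precommit run skips every test, the CPU-only nightly run skips none, and a per-device evaluation would skip only the two GPU targets.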

I'm not sure if it's possible to skip GPU testing but keep CPU testing if we run them all at once, fyi @aelovikov-intel

sarnex avatar Sep 23 '24 14:09 sarnex

Thank you for your analysis, @sarnex. @aelovikov-intel, can you tell me how to avoid skipping the CPU tests?

AllanZyne avatar Sep 26 '24 03:09 AllanZyne

Don't use "static" features in your conditions; use device-specific ones instead, like arch-intel_gpu_pvc.

aelovikov-intel avatar Sep 26 '24 16:09 aelovikov-intel

BTW, can we specify the OCL CPU version in LIT tests? That would be very helpful in this case as well.

AllanZyne avatar Oct 10 '24 03:10 AllanZyne

I don't think we support that; we only have it for the GPU driver. FYI @aelovikov-intel

sarnex avatar Oct 10 '24 14:10 sarnex

@AllanZyne Do you know when we plan to update the OCLCPU driver to one where these tests work? Is it public yet?

sarnex avatar Oct 18 '24 19:10 sarnex

The new OCL CPU driver will be released with oneAPI 2025.0. I've heard from others that it will be released at the end of this month.

AllanZyne avatar Oct 21 '24 03:10 AllanZyne

Got it, thanks

sarnex avatar Oct 21 '24 13:10 sarnex

Hi, if I understand correctly: I want my tests to run on CPU or GPU (DG2 only). I can't configure this in lit.local.cfg because it's tedious to list all the unsupported devices, but I can write

// REQUIRES: linux, cpu || (gpu && level_zero && gpu-intel-dg2)

right?

and in the future, we'll enable pvc as well:

// REQUIRES: linux, cpu || (gpu && level_zero && (gpu-intel-dg2 || gpu-intel-pvc))
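To check how such a REQUIRES line behaves for different devices, here is a minimal standalone sketch of the evaluation. It assumes the usual lit semantics, where comma-separated clauses must all hold and each clause is a boolean expression over feature names. lit ships its own evaluator, so the `evaluate` and `requires_met` helpers below are hypothetical stand-ins written for illustration, and the feature sets are made up.

```python
import re

# Operators, parentheses, negation, or a feature name such as gpu-intel-dg2.
TOKEN = re.compile(r"\s*(\|\||&&|[()!]|[A-Za-z0-9_.:+-]+)")


def tokenize(expr):
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"unexpected input: {expr[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(expr, features):
    """Evaluate one clause; a feature name is true if it is in `features`."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def parse_or():
        value = parse_and()
        while peek() == "||":
            take()
            rhs = parse_and()
            value = value or rhs
        return value

    def parse_and():
        value = parse_atom()
        while peek() == "&&":
            take()
            rhs = parse_atom()
            value = value and rhs
        return value

    def parse_atom():
        if peek() == "!":
            take()
            return not parse_atom()
        if peek() == "(":
            take()
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        tok = take()
        if tok is None or tok in ("||", "&&", ")"):
            raise ValueError(f"unexpected token: {tok!r}")
        return tok in features

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"trailing tokens: {tokens[pos:]}")
    return result


def requires_met(line, features):
    """All comma-separated clauses of a REQUIRES line must be true."""
    return all(evaluate(clause, features) for clause in line.split(","))


REQ = "linux, cpu || (gpu && level_zero && gpu-intel-dg2)"
cpu_ok = requires_met(REQ, {"linux", "cpu", "opencl"})
dg2_ok = requires_met(REQ, {"linux", "gpu", "level_zero", "gpu-intel-dg2"})
gen12_ok = requires_met(REQ, {"linux", "gpu", "level_zero", "gpu-intel-gen12"})
```

With these made-up feature sets, the CPU and DG2 devices satisfy the line and the gen12 device does not. Whether that holds in a combined run depends on the features being evaluated per device, as discussed above.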

AllanZyne avatar Oct 25 '24 03:10 AllanZyne

I don't think it's possible to write unsupported_features using "arch-*" (https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_device_architecture.asciidoc), because we can't predict which GPU devices will be added to CI.

AllanZyne avatar Oct 25 '24 03:10 AllanZyne

If that's what you really need, you can always improve lit.cfg.py/lit.local.cfg infrastructure to make that possible...

aelovikov-intel avatar Oct 25 '24 16:10 aelovikov-intel

Replied on the PR but I think what you want should be possible

sarnex avatar Oct 25 '24 17:10 sarnex

These are all passing now with the driver bump, closing.

sarnex avatar Nov 22 '24 15:11 sarnex

Hi @AllanZyne, should this XFAIL be removed, or is there some unrelated issue?

KornevNikita avatar Dec 13 '24 12:12 KornevNikita

Hi @AllanZyne, should this XFAIL be removed, or is there some unrelated issue?

As I remember, it would fail in post-commit CI. Let me reproduce it locally first.

AllanZyne avatar Dec 16 '24 03:12 AllanZyne