[CI] Run E2E tests on PVC in Linux pre-commit

Open uditagarwal97 opened this issue 1 year ago • 6 comments

We got new PVC machines for intel/llvm CI testing 😍. This PR enables running E2E tests on PVC runners in Linux pre-commit.

Example: https://github.com/intel/llvm/actions/runs/10061378191/job/27811302273

uditagarwal97, Jul 23 '24 15:07

UR CI has been running SYCL e2e tests with PVC for some time now, and we had to resort to disabling/xfailing tests: https://github.com/oneapi-src/unified-runtime/blob/main/.github/workflows/e2e_level_zero.yml#L24

So I'm looking forward to not having to do that anymore :)

pbalcer, Jul 23 '24 15:07

@kbenzie I see a lot of AddressSanitizer/* E2E tests failing on PVC with the error "what(): Native API failed. Native API returns: -995 (The plugin or device does not support the called function)": https://github.com/intel/llvm/actions/runs/10081030441/job/27872830672?pr=14720#step:22:5203

  SYCL :: AddressSanitizer/bad-free/bad-free-host.cpp
  SYCL :: AddressSanitizer/bad-free/bad-free-minus1.cpp
  SYCL :: AddressSanitizer/bad-free/bad-free-plus1.cpp
  SYCL :: AddressSanitizer/common/config-red-zone-size.cpp
  SYCL :: AddressSanitizer/common/demangle-kernel-name.cpp
  SYCL :: AddressSanitizer/common/kernel-debug.cpp
  SYCL :: AddressSanitizer/double-free/double-free.cpp
  SYCL :: AddressSanitizer/misaligned/misalign-int.cpp
  SYCL :: AddressSanitizer/misaligned/misalign-long.cpp
  SYCL :: AddressSanitizer/misaligned/misalign-short.cpp
  SYCL :: AddressSanitizer/multiple-reports/multiple_kernels.cpp
  SYCL :: AddressSanitizer/multiple-reports/one_kernel.cpp
  SYCL :: AddressSanitizer/out-of-bounds/DeviceGlobal/device_global.cpp
  SYCL :: AddressSanitizer/out-of-bounds/DeviceGlobal/device_global_image_scope.cpp
  SYCL :: AddressSanitizer/out-of-bounds/DeviceGlobal/device_global_image_scope_unaligned.cpp
  SYCL :: AddressSanitizer/out-of-bounds/DeviceGlobal/multi_device_images.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_for_char.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_for_double.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_for_func.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_for_int.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_for_short.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_no_local_size.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/unaligned_shadow_memory.cpp
  SYCL :: AddressSanitizer/out-of-bounds/buffer/buffer.cpp
  SYCL :: AddressSanitizer/out-of-bounds/buffer/buffer_2d.cpp
  SYCL :: AddressSanitizer/out-of-bounds/buffer/buffer_3d.cpp
  SYCL :: AddressSanitizer/out-of-bounds/buffer/buffer_copy_fill.cpp
  SYCL :: AddressSanitizer/out-of-bounds/buffer/subbuffer.cpp
  SYCL :: AddressSanitizer/out-of-bounds/local/group_local_memory.cpp
  SYCL :: AddressSanitizer/out-of-bounds/local/local_accessor_basic.cpp
  SYCL :: AddressSanitizer/out-of-bounds/local/local_accessor_function.cpp
  SYCL :: AddressSanitizer/out-of-bounds/local/local_accessor_multiargs.cpp
  SYCL :: AddressSanitizer/out-of-bounds/local/multiple_source.cpp
  SYCL :: AddressSanitizer/out-of-bounds/private/multiple_private.cpp
  SYCL :: AddressSanitizer/out-of-bounds/private/single_private.cpp
  SYCL :: AddressSanitizer/use-after-free/quarantine-free.cpp
  SYCL :: AddressSanitizer/use-after-free/quarantine-no-free.cpp
  SYCL :: AddressSanitizer/use-after-free/use-after-free.cpp

Are these failures expected on PVC?

uditagarwal97, Jul 24 '24 18:07

@AllanZyne I think this may have started failing after this was merged: https://github.com/intel/llvm/pull/13450 Can you take a look?

@uditagarwal97 We see the same issue in our CI, and it started yesterday. No, it's not expected. Except for a few matrix tests and one plugin test (see the workflow file), everything should be passing.

pbalcer, Jul 25 '24 06:07

The error "what(): Native API failed. Native API returns: -995 (The plugin or device does not support the called function)" is caused by running on an "opencl:gpu" device, which we don't plan to support.

"level_zero:gpu" does not work either because, as with gen12 etc., we need to wait for a gfx-driver upgrade.

Could you help modify llvm/sycl/test-e2e/AddressSanitizer/lit.local.cfg to disable those ASan tests on PVC?

config.unsupported_features += ['gpu-intel-gen9', 'gpu-intel-gen11', 'gpu-intel-gen12', 'gpu-intel-pvc']
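
As a rough illustration of what that line is meant to achieve, the sketch below models the skip decision outside of lit. It is not the actual test-suite logic: the `config` object here is a stand-in for the one lit passes to lit.local.cfg, and `is_unsupported` and the device feature sets are hypothetical names made up for this example.

```python
from types import SimpleNamespace

# Stand-in for the lit `config` object (assumption: real lit provides this
# object to lit.local.cfg; here we fake it so the sketch runs on its own).
config = SimpleNamespace(unsupported_features=[])

# The line proposed for AddressSanitizer/lit.local.cfg.
config.unsupported_features += ['gpu-intel-gen9', 'gpu-intel-gen11',
                                'gpu-intel-gen12', 'gpu-intel-pvc']


def is_unsupported(device_features, cfg):
    """Hypothetical helper: True if the device advertises any feature
    that the directory-level config lists as unsupported."""
    return any(f in cfg.unsupported_features for f in device_features)


# Hypothetical feature sets for two devices.
pvc_features = {'gpu', 'level_zero', 'gpu-intel-pvc'}
cpu_features = {'cpu', 'opencl'}

print(is_unsupported(pvc_features, config))  # PVC: ASan tests would be skipped
print(is_unsupported(cpu_features, config))  # CPU: ASan tests would still run
```

Because the list lives in the directory-local config, the intent is that only tests under AddressSanitizer/ are affected and the rest of the suite keeps running on PVC.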

Thank you!

AllanZyne, Jul 25 '24 08:07

GH Issue to track disabled tests: https://github.com/intel/llvm/issues/14826

uditagarwal97, Jul 29 '24 14:07

@intel/unified-runtime-reviewers ping!

uditagarwal97, Aug 07 '24 20:08

@intel/llvm-gatekeepers The PR is ready to be merged. The following unexpected passes (XPASS) on Arc are unrelated to this change; I also see them in post-commit (https://github.com/intel/llvm/actions/runs/10692983484/job/29642595868).

********************
Unexpectedly Passed Tests (14):
  SYCL :: Matrix/SPVCooperativeMatrix/element_wise_abc.cpp
  SYCL :: Matrix/SPVCooperativeMatrix/element_wise_all_ops.cpp
  SYCL :: Matrix/SPVCooperativeMatrix/element_wise_all_ops_1d.cpp
  SYCL :: Matrix/SPVCooperativeMatrix/element_wise_all_ops_1d_cont.cpp
  SYCL :: Matrix/SPVCooperativeMatrix/element_wise_all_ops_half.cpp
  SYCL :: Matrix/SPVCooperativeMatrix/element_wise_all_ops_int8.cpp
  SYCL :: Matrix/SPVCooperativeMatrix/element_wise_all_ops_int8_packed.cpp
  SYCL :: Matrix/SPVCooperativeMatrix/element_wise_all_ops_scalar.cpp
  SYCL :: Matrix/SPVCooperativeMatrix/element_wise_all_sizes.cpp
  SYCL :: Matrix/SPVCooperativeMatrix/element_wise_ops.cpp
  SYCL :: Matrix/SPVCooperativeMatrix/get_coord_float_matC.cpp
  SYCL :: Matrix/SPVCooperativeMatrix/get_coord_int8_matA.cpp
  SYCL :: Matrix/SPVCooperativeMatrix/get_coord_int8_matB.cpp
  SYCL :: Matrix/SPVCooperativeMatrix/joint_matrix_apply_bf16.cpp

I've opened a new issue for these: https://github.com/intel/llvm/issues/15278
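
For filing such lists in a tracking issue, a small script can pull the test names out of a saved CI log. This is only a sketch: it assumes the summary layout shown above (a header line with a count, followed by indented "SUITE :: path" lines), and `parse_lit_section` is a hypothetical helper, not part of lit.

```python
import re


def parse_lit_section(log_text, header):
    """Return the test paths listed under a summary header such as
    'Unexpectedly Passed Tests' (layout assumed from the log above)."""
    header_re = re.compile(r'^' + re.escape(header) + r' \((\d+)\):\s*$')
    tests = []
    in_section = False
    for line in log_text.splitlines():
        stripped = line.strip()
        if not in_section:
            # Look for the section header, e.g. "Unexpectedly Passed Tests (14):".
            if header_re.match(stripped):
                in_section = True
            continue
        if ' :: ' in stripped:
            # Keep only the path after the "SYCL :: " suite prefix.
            tests.append(stripped.split(' :: ', 1)[1])
        else:
            # The first line that is not a test entry ends the section.
            break
    return tests


sample = """\
********************
Unexpectedly Passed Tests (2):
  SYCL :: Matrix/SPVCooperativeMatrix/element_wise_abc.cpp
  SYCL :: Matrix/SPVCooperativeMatrix/element_wise_ops.cpp

"""

for name in parse_lit_section(sample, 'Unexpectedly Passed Tests'):
    print(name)
```

The same helper would work for other summary headers in the same layout, provided the header text is passed in exactly as it appears in the log.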

uditagarwal97, Sep 04 '24 06:09

@uditagarwal97 I see that you've removed adding a PVC workflow from this patch. Will you be creating another PR that adds it? If not, it's likely that new failures will pop up on PVC. In UR we are already seeing new failures on our PVC e2e tests job.

pbalcer, Sep 05 '24 17:09

@pbalcer Yes, I'll make another PR to enable PVC GH workflow. Will do it in a couple of hours :)

uditagarwal97, Sep 05 '24 17:09

Awesome! I was worried there for a second :D

pbalcer, Sep 05 '24 17:09