llvm icon indicating copy to clipboard operation
llvm copied to clipboard

PI port: post merge work

Open aarongreig opened this issue 1 year ago • 9 comments

I'm creating this issue to capture any work that will need to be done post-merge, inevitably we've had to leave some stuff unfinished to make the ABI breaking window.

For reference, this is the PI merge change: https://github.com/intel/llvm/pull/14145

Tasks from review

  • [x] #14922
  • [x] #14926
  • [x] #14927
  • [x] #14924
  • [ ] #14928
  • [ ] #14923
  • [x] #14932
  • [x] #14919
  • [x] https://github.com/oneapi-src/unified-runtime/issues/1890
  • [x] #15109
  • [x] This PR changes seems to might have been reverted after PI removal, so we need to investigate this. This was first mentioned in this comment
  • [x] #15149

Regressions

This is the list of known regressions that have had XFAIL added to them in anticipation of post-merge fixes.

From sycl/test/:

  • [x] #14726
  • [x] native_cpu/call_host_func.cpp
  • [x] native_cpu/check-pi-output.cpp
  • [x] native_cpu/driver-fsycl.cpp
  • [x] #14660
  • [x] native_cpu/global-id-range.cpp
  • [x] native_cpu/globaloffsetchecks.cpp
  • [x] #14745
  • [x] #14746
  • [x] native_cpu/local_basic.cpp
  • [x] native_cpu/multi-devices-swap.cpp
  • [x] native_cpu/multi-devices.cpp
  • [x] native_cpu/multiple_tu.cpp
  • [x] native_cpu/no-dead-arg.cpp
  • [x] native_cpu/no-opt.cpp
  • [x] native_cpu/readwrite_rectops.cpp
  • [x] native_cpu/scalar_args.cpp
  • [x] native_cpu/sycl-external-static.cpp
  • [x] native_cpu/sycl-external.cpp
  • [x] native_cpu/unnamed.cpp
  • [x] native_cpu/unused-regression.cpp
  • [x] native_cpu/user-defined-private-type.cpp
  • [x] native_cpu/user-defined-type.cpp
  • [x] native_cpu/usm_basic.cpp
  • [x] native_cpu/vector-add.cpp

(all but a handful of these will be fixed by the inclusion of https://github.com/oneapi-src/unified-runtime/pull/1871)

From sycl/test-e2e

  • [x] #14658
  • [x] AddressSanitizer/common/kernel-debug.cpp
  • [x] AddressSanitizer/multiple-reports/multiple_kernels.cpp
  • [x] AddressSanitizer/multiple-reports/one_kernel.cpp
  • [x] AddressSanitizer/use-after-free/quarantine-free.cpp
  • [x] Basic/aspects.cpp
  • [x] #14663
  • [x] #14665
  • [x] #14675
  • [x] #14676
  • [x] #14679
  • [x] #14680
  • [x] #14681
  • [x] #14682
  • [x] #14702
  • [x] #14763
  • [x] #14709
  • [x] #14712
  • [x] #14662
  • [x] #14711
  • [x] #14706
  • [x] #14708
  • [x] #14661
  • [x] #14704
  • [x] #14721
  • [x] #14738
  • [x] #14741
  • [x] #14722
  • [x] #14723
  • [x] #14724
  • [x] #14742
  • [x] #14714
  • [x] #14744
  • [x] #14703
  • [x] #14659

Windows only:

  • [x] #14768
    • [x] Basic/queue/release.cpp
    • [x] Scheduler/ReleaseResourcesTest.cpp
    • [x] #14950
    • [x] Regression/context_is_destroyed_after_exception.cpp
  • [x] #14775
  • [x] KernelAndProgram/cache_env_vars_win.cpp - #14709
  • [x] #14968
  • [x] Plugin/dll-detach-order.cpp - #14767
  • [x] #15006
  • [x] #15011

Cuda only:

  • [x] HostInteropTask/interop-task-cuda-buffer-migrate.cpp
  • [x] #14666
  • [x] #14664

New regressions as of merge on 24/07

  • [x] #14749
  • [x] #14795
  • [x] Matrix/element_wise_all_ops_1d.cpp (CL/L0 arc only)
  • [x] Matrix/element_wise_all_ops_1d_cont.cpp (CL/L0 arc only)
  • [x] Matrix/element_wise_all_ops_scalar.cpp (CL/L0 arc only)
  • [x] Matrix/element_wise_all_sizes.cpp (CL/L0 arc only)
  • [x] #14764
  • [x] #14804
  • [x] #14805
  • [x] #14765

unittests:

  • [x] SYCL2020/KernelBundleStateFiltering.cpp

New regressions as of merge on 26/07

  • [x] #14806
  • [x] #14841
  • [x] DeviceGlobal/device_global_arrow.cpp
  • [x] DeviceGlobal/device_global_device_only.cpp
  • [x] DeviceGlobal/device_global_operator_passthrough.cpp
  • [x] DeviceGlobal/device_global_subscript.cpp

It seems we have also introduced some regressions in the sycl cts, so:

  • [x] #14819

aarongreig avatar Jul 17 '24 11:07 aarongreig

Basic/aspects.cpp is working normally and passing with the PI2UR PR, it also does not have an XFAIL on the PR, it only had a comment directing to this issue which should be deleted.

omarahmed1111 avatar Jul 19 '24 14:07 omarahmed1111

@omarahmed1111 could you please clarify the status of these tests?

 AddressSanitizer/common/kernel-debug.cpp
 AddressSanitizer/multiple-reports/multiple_kernels.cpp
 AddressSanitizer/multiple-reports/one_kernel.cpp
 AddressSanitizer/use-after-free/quarantine-free.cpp

They're failing in Nightly, but there is no issue for them. Should we create one?

KornevNikita avatar Jul 29 '24 10:07 KornevNikita

Ah it seems we missed xfailing those, they are affected by the same issue as https://github.com/intel/llvm/issues/14658. We do have a fix https://github.com/oneapi-src/unified-runtime/pull/1883 but I think we're prioritizing merging feature PRs for now so I'll open a PR to xfail them to unblock the nightly.

aarongreig avatar Jul 29 '24 11:07 aarongreig

Ah it seems we missed xfailing those, they are affected by the same issue as #14658. We do have a fix oneapi-src/unified-runtime#1883 but I think we're prioritizing merging feature PRs for now so I'll open a PR to xfail them to unblock the nightly.

Ok, good.

After port there're also failures in SYCL-CTS: https://github.com/intel/llvm/actions/runs/10137365848/job/28027872635 All like this:

  /__w/llvm/llvm/khronos_sycl_cts/tests/queue/queue_shortcuts_explicit_core.cpp:30: FAILED:
    {Unknown expression after the reported line}
  due to unexpected exception with messages:
    for type "char": 
    SYCL exception
    with category name: 'sycl'
    with code: 'runtime'
    with code raw value: '1'
    with code message: 'SYCL Error'
    with what: 'Native API failed. Native API returns: 45
    (UR_RESULT_ERROR_INVALID_ARGUMENT)'

Do you know if it's some known issue?

KornevNikita avatar Jul 29 '24 11:07 KornevNikita

It isn't a known problem, I've added an issue to the OP to investigate. I have seen fails like that before so it's very likely that a fix for one of the remaining e2e regressions will also fix the cts.

aarongreig avatar Jul 29 '24 11:07 aarongreig

FYI, while enabling E2E tests on PVC (https://github.com/intel/llvm/pull/14720), I see a failure in Matrix/SG32/element_wise_all_ops.cpp, which could be related to PI2UR work. Similar to the failure in Matrix/element_wise_all_ops.cpp (https://github.com/intel/llvm/issues/14795).

uditagarwal97 avatar Jul 30 '24 20:07 uditagarwal97

Ah it seems we missed xfailing those, they are affected by the same issue as #14658. We do have a fix oneapi-src/unified-runtime#1883 but I think we're prioritizing merging feature PRs for now so I'll open a PR to xfail them to unblock the nightly.

Ok, good.

After port there're also failures in SYCL-CTS: https://github.com/intel/llvm/actions/runs/10137365848/job/28027872635 All like this:

  /__w/llvm/llvm/khronos_sycl_cts/tests/queue/queue_shortcuts_explicit_core.cpp:30: FAILED:
    {Unknown expression after the reported line}
  due to unexpected exception with messages:
    for type "char": 
    SYCL exception
    with category name: 'sycl'
    with code: 'runtime'
    with code raw value: '1'
    with code message: 'SYCL Error'
    with what: 'Native API failed. Native API returns: 45
    (UR_RESULT_ERROR_INVALID_ARGUMENT)'

Do you know if it's some known issue?

@KornevNikita https://github.com/intel/llvm/pull/14873 should fix the SYCL CTS, it has been confirmed to fix 2 of the 3 failures at the time of writing but the run has not completed.

kbenzie avatar Jul 31 '24 16:07 kbenzie

@kbenzie thanks, I'll check it manually

KornevNikita avatar Jul 31 '24 16:07 KornevNikita

@kbenzie thanks, I'll check it manually

yep, that helps, thanks.

KornevNikita avatar Jul 31 '24 17:07 KornevNikita