llvm icon indicating copy to clipboard operation
llvm copied to clipboard

[SYCL] Fix marray math function impls

Open JackAKirk opened this issue 3 years ago • 36 comments

This PR aims to fix issue : https://github.com/intel/llvm/issues/5991 and provide efficient working marray math function implementations for all backends.

marray math function support is currently switched on for {n} ({n} defined in #5991) but the implementations are currently broken and untested. There is also very limited test coverage for sycl::vec cases. The sycl 2020 specification states that the set {N} ({N} defined in #5991) should be supported for marray math function cases.

All SYCL 2020 math, native math, and half_precision math functions now have marray support when the function's arguments are of type genfloat and have the same argument type for all arguments.

Tests: https://github.com/intel/llvm-test-suite/pull/1002

Signed-off-by: jack.kirk [email protected]

JackAKirk avatar Apr 21 '22 17:04 JackAKirk

/verify with https://github.com/intel/llvm-test-suite/pull/1002

JackAKirk avatar Apr 27 '22 15:04 JackAKirk

I've added scalar_vector_* lists in this PR that omit marray types, so that math functions can distinguish the marray implementations I added. The type lists including marrays, used in e.g. is_genfloat, are used in the has_known_identity trait class described in section 4.9.2. of the SYCL 2020 spec. The current marray lists include marrays of size from the set {n} (defined/discussed in https://github.com/intel/llvm/issues/5991) which limits the spans used in array reductions to the set {n}. If we have array reductions then 4.9.2 does not state that they should be limited to the set {n}, although it does not specify what the admissible set of spans are.

I think that it makes more sense to allow array reductions with any span (or at least a larger range than {n}) which would mean updating the marray type lists.

@aobolensk @steffenlarsen what do you think?

JackAKirk avatar May 11 '22 16:05 JackAKirk

/verify with https://github.com/intel/llvm-test-suite/pull/1002

JackAKirk avatar May 11 '22 16:05 JackAKirk

/verify with intel/llvm-test-suite#1002

FYI I don't have access to see the failures from this. The tests are passing locally for cuda.

JackAKirk avatar May 13 '22 09:05 JackAKirk

I've added scalar_vector_* lists in this PR that omit marray types, so that math functions can distinguish the marray implementations I added. The type lists including marrays, used in e.g. is_genfloat, are used in the has_known_identity trait class described in section 4.9.2. of the SYCL 2020 spec. The current marray lists include marrays of size from the set {n} (defined/discussed in #5991) which limits the spans used in array reductions to the set {n}. If we have array reductions then 4.9.2 does not state that they should be limited to the set {n}, although it does not specify what the admissible set of spans are.

I think that it makes more sense to allow array reductions with any span (or at least a larger range than {n}) which would mean updating the marray type lists.

@aobolensk @steffenlarsen what do you think?

I agree, genfloat is currently too restrictive on marray and we should loosen it. If we did, would this patch be obsolete or would these separate definitions still be required?

steffenlarsen avatar May 17 '22 18:05 steffenlarsen

Can we extend existing tests to capture the new sizes?

I did not find any existing tests for marray math builtins: this makes sense since the existing implementation was broken because the implementation that was written for scalars/vectors cannot be used for marray cases.

JackAKirk avatar May 18 '22 16:05 JackAKirk

I've added scalar_vector_* lists in this PR that omit marray types, so that math functions can distinguish the marray implementations I added. The type lists including marrays, used in e.g. is_genfloat, are used in the has_known_identity trait class described in section 4.9.2. of the SYCL 2020 spec. The current marray lists include marrays of size from the set {n} (defined/discussed in #5991) which limits the spans used in array reductions to the set {n}. If we have array reductions then 4.9.2 does not state that they should be limited to the set {n}, although it does not specify what the admissible set of spans are. I think that it makes more sense to allow array reductions with any span (or at least a larger range than {n}) which would mean updating the marray type lists. @aobolensk @steffenlarsen what do you think?

I agree, genfloat is currently too restrictive on marray and we should loosen it. If we did, would this patch be obsolete or would these separate definitions still be required?

Loosening genfloat marray restrictions would not make this patch obsolete because the scalar/vector implementations of these math functions cannot be used for marray cases.

JackAKirk avatar May 18 '22 16:05 JackAKirk

/verify with https://github.com/intel/llvm-test-suite/pull/1002

steffenlarsen avatar Jun 20 '22 19:06 steffenlarsen

/verify with https://github.com/intel/llvm-test-suite/pull/1002

pvchupin avatar Jun 28 '22 00:06 pvchupin

[2022-06-28T02:21:02.113Z] Failed Tests (2): [2022-06-28T02:21:02.113Z] SYCL :: DeviceLib/half_precision_math_test_marray_vec.cpp [2022-06-28T02:21:02.113Z] SYCL :: DeviceLib/native_math_test_marray_vec.cpp

pvchupin avatar Jun 28 '22 23:06 pvchupin

[2022-06-28T02:21:02.113Z] Failed Tests (2): [2022-06-28T02:21:02.113Z] SYCL :: DeviceLib/half_precision_math_test_marray_vec.cpp [2022-06-28T02:21:02.113Z] SYCL :: DeviceLib/native_math_test_marray_vec.cpp

Interesting. I can't reproduce these failures on my local machine but I can reproduce the half_precision_math_test_marray_vec.cpp on the A100 and the behaviour depends on the cudatoolkit version used: the test gets incorrect values for a single case "half_precision_math_test_2_powr". I have not been able to reproduce a failure from native_math_test_marray_vec.cpp. It would be useful to see the errors from the CI failure but I don't have access to http://llvm-ci2.intel.com:8080/job/SYCL_CI/job/intel/job/LLVM_Test_Suite_Associate_CI/214/display/redirect. Would it be possible to provide the redirected link that I think I should have access to? Thanks.

JackAKirk avatar Jun 29 '22 11:06 JackAKirk

/verify with https://github.com/intel/llvm-test-suite/pull/1002

JackAKirk avatar Jun 30 '22 10:06 JackAKirk

[2022-06-28T02:21:02.113Z] Failed Tests (2): [2022-06-28T02:21:02.113Z] SYCL :: DeviceLib/half_precision_math_test_marray_vec.cpp [2022-06-28T02:21:02.113Z] SYCL :: DeviceLib/native_math_test_marray_vec.cpp

Interesting. I can't reproduce these failures on my local machine but I can reproduce the half_precision_math_test_marray_vec.cpp on the A100 and the behaviour depends on the cudatoolkit version used: the test gets incorrect values for a single case "half_precision_math_test_2_powr". I have not been able to reproduce a failure from native_math_test_marray_vec.cpp. It would be useful to see the errors from the CI failure but I don't have access to http://llvm-ci2.intel.com:8080/job/SYCL_CI/job/intel/job/LLVM_Test_Suite_Associate_CI/214/display/redirect. Would it be possible to provide the redirected link that I think I should have access to? Thanks.

@pvchupin I identified the problem with "half_precision_math_test_2_powr" to be an existing bug (that we will investigate) that was there before this PR and was uncovered by the increased test coverage for float3 introduced in https://github.com/intel/llvm-test-suite/pull/1002. I've removed this test coverage but the CI is still failing and I can't reproduce the failure. Could you please post the redirected link from http://llvm-ci-test2.intel.com:8080/job/SYCL_CI/job/intel/job/LLVM_Test_Suite_Associate_CI/184/display/redirect so that I can see the failure? I don't have access to this link ending "redirect". Thanks

JackAKirk avatar Jun 30 '22 13:06 JackAKirk

@JackAKirk, we use to have public dumps for these I think. @tfzhu can you confirm/check please if these are shutdown today? We need to move this stuff to Github Actions...

Here is the output: half_precision_math_test_marray_vec.cpp: Fails on both OpenCL and Level Zero

[2022-06-28T02:24:58.807Z] # command stderr:
[2022-06-28T02:24:58.807Z] half_precision_math_test_marray_vec.cpp.tmp.out: /run/jenkins-dir/workspace/SYCL_CI/intel/Lin/LLVM_Test_Suite/llvm-test-suite/SYCL/DeviceLib/half_precision_math_test_marray_vec.cpp:57: void half_precision_math_test_exp10(sycl::queue &, T, T, size_t) [T = sycl::vec<float, 3>]: Assertion `checkEqual(result, ref)' failed.

native_math_test_marray_vec.cpp: This one fails on OpenCL only

[2022-06-28T02:11:51.727Z] # command stderr:
[2022-06-28T02:11:51.728Z] native_math_test_marray_vec.cpp.tmp.out: /run/jenkins-dir/workspace/SYCL_CI/intel/Lin/LLVM_Test_Suite/llvm-test-suite/SYCL/DeviceLib/native_math_test_marray_vec.cpp:55: void native_math_test_exp2(sycl::queue &, T, T, size_t) [T = sycl::vec<float, 3>]: Assertion `checkEqual(result, ref)' failed.
[2022-06-28T02:11:51.728Z] 
[2022-06-28T02:11:51.728Z] error: command failed with exit status: -6

We have also SYCL :: USM/depends_on.cpp failing on Windows only, but not sure if it's related. Nothing useful though in the output...

[2022-06-30T11:14:48.285Z] $ "env" "SYCL_DEVICE_FILTER=opencl:cpu,host" "W:\jenkins-dir\workspace\SYCL_CI\intel\Win\Test_Suite\build\SYCL\USM\Output\depends_on.cpp.tmp1.out"
[2022-06-30T11:14:48.285Z] note: command had no output on stdout or stderr
[2022-06-30T11:14:48.285Z] error: command failed with exit status: 3221225477

pvchupin avatar Jun 30 '22 22:06 pvchupin

/verify with https://github.com/intel/llvm-test-suite/pull/1002

JackAKirk avatar Jul 05 '22 14:07 JackAKirk

/verify with https://github.com/intel/llvm-test-suite/pull/1002

JackAKirk avatar Jul 12 '22 09:07 JackAKirk

/verify with https://github.com/intel/llvm-test-suite/pull/1002

JackAKirk avatar Jul 12 '22 12:07 JackAKirk

@JackAKirk, we use to have public dumps for these I think. @tfzhu can you confirm/check please if these are shutdown today? We need to move this stuff to Github Actions...

Here is the output: half_precision_math_test_marray_vec.cpp: Fails on both OpenCL and Level Zero

[2022-06-28T02:24:58.807Z] # command stderr:
[2022-06-28T02:24:58.807Z] half_precision_math_test_marray_vec.cpp.tmp.out: /run/jenkins-dir/workspace/SYCL_CI/intel/Lin/LLVM_Test_Suite/llvm-test-suite/SYCL/DeviceLib/half_precision_math_test_marray_vec.cpp:57: void half_precision_math_test_exp10(sycl::queue &, T, T, size_t) [T = sycl::vec<float, 3>]: Assertion `checkEqual(result, ref)' failed.

native_math_test_marray_vec.cpp: This one fails on OpenCL only

[2022-06-28T02:11:51.727Z] # command stderr:
[2022-06-28T02:11:51.728Z] native_math_test_marray_vec.cpp.tmp.out: /run/jenkins-dir/workspace/SYCL_CI/intel/Lin/LLVM_Test_Suite/llvm-test-suite/SYCL/DeviceLib/native_math_test_marray_vec.cpp:55: void native_math_test_exp2(sycl::queue &, T, T, size_t) [T = sycl::vec<float, 3>]: Assertion `checkEqual(result, ref)' failed.
[2022-06-28T02:11:51.728Z] 
[2022-06-28T02:11:51.728Z] error: command failed with exit status: -6

We have also SYCL :: USM/depends_on.cpp failing on Windows only, but not sure if it's related. Nothing useful though in the output...

[2022-06-30T11:14:48.285Z] $ "env" "SYCL_DEVICE_FILTER=opencl:cpu,host" "W:\jenkins-dir\workspace\SYCL_CI\intel\Win\Test_Suite\build\SYCL\USM\Output\depends_on.cpp.tmp1.out"
[2022-06-30T11:14:48.285Z] note: command had no output on stdout or stderr
[2022-06-30T11:14:48.285Z] error: command failed with exit status: 3221225477

It seems that there are some problems with some half_precision and native math functions with generic implementations when using float3 (these are all existing problems in sycl tip that the increased test coverage is revealing), although I can't reproduce the particular failures that you listed above locally. I've removed native_math_test_exp2 and half_precision_math_test_exp10 float3 test cases from https://github.com/intel/llvm-test-suite/pull/1002 but Jenkins/llvm-test-suite is still failing (I still can't see the results). Is it now only failing for USM/depends_on.cpp? USM/depends_on.cpp should have no dependence on PR changes here.

JackAKirk avatar Jul 12 '22 15:07 JackAKirk

/verify with https://github.com/intel/llvm-test-suite/pull/1002

JackAKirk avatar Sep 05 '22 15:09 JackAKirk

/verify with https://github.com/intel/llvm-test-suite/pull/1002

JackAKirk avatar Sep 06 '22 12:09 JackAKirk

/verify with https://github.com/intel/llvm-test-suite/pull/1002

JackAKirk avatar Sep 06 '22 15:09 JackAKirk

/verify with https://github.com/intel/llvm-test-suite/pull/1002

JackAKirk avatar Sep 07 '22 09:09 JackAKirk

/verify with https://github.com/intel/llvm-test-suite/pull/1002

JackAKirk avatar Sep 13 '22 15:09 JackAKirk

/verify with https://github.com/intel/llvm-test-suite/pull/1002

JackAKirk avatar Sep 14 '22 09:09 JackAKirk

/verify with https://github.com/intel/llvm-test-suite/pull/1002

JackAKirk avatar Sep 14 '22 20:09 JackAKirk

I have now also added the -ffast-math flag support for the new marray cases, in line with the scalar/vector cases : https://github.com/intel/llvm/pull/5801 . This also affects/motivates the macro usage in builtins.hpp. The corresponding fast-math-flag.cpp test has been updated in https://github.com/intel/llvm-test-suite/pull/1002 to cover the new cases. I've also added a device-code test here via native-math-cuda.cpp. This test checks that the correct native math instructions are called for the cuda backend. The precision of these native math functions is detailed in table 10 here: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#intrinsic-functions. I thought that it would be sensible to simply check that these functions are being correctly called rather than making fresh tests that check the quoted native precision. I have removed the more basic and general native (and half_precision) math function tests (a la https://github.com/intel/llvm-test-suite/blob/intel/SYCL/DeviceLib/math_test.cpp) that I originally added in https://github.com/intel/llvm-test-suite/pull/1002 because they are inappropriate for the opencl backend because the precision of the opencl backend is not precise enough to pass the tests. I think that at some point there should be added some more rigorous tests such as exist in the opencl CTS (https://github.com/KhronosGroup/OpenCL-CTS/blob/2d93b122c3078cd67a0528ad9e791dbcadaf03d6/test_common/harness/errorHelpers.cpp).

Anyway, the marray cases added here have passing test coverage at least matching the existing scalar/vector cases, plus the CUDA backend now has the native math builtins device code check.

JackAKirk avatar Sep 15 '22 09:09 JackAKirk

I've just managed to access the full llvm-test-suite results and seen the windows failure due to the change to marray.hpp:324. Looking into this now...

JackAKirk avatar Sep 16 '22 09:09 JackAKirk

/verify with https://github.com/intel/llvm-test-suite/pull/1002

JackAKirk avatar Sep 16 '22 15:09 JackAKirk

I've just managed to access the full llvm-test-suite results and seen the windows failure due to the change to marray.hpp:324. Looking into this now...

I've partially dealt with this but there are still failures on windows that I need to fix.... marking as draft again..

JackAKirk avatar Sep 16 '22 16:09 JackAKirk

/verify with https://github.com/intel/llvm-test-suite/pull/1002

JackAKirk avatar Sep 21 '22 16:09 JackAKirk