llvm
llvm copied to clipboard
[SYCL] Fix marray math function impls
This PR aims to fix issue : https://github.com/intel/llvm/issues/5991 and provide efficient working marray math function implementations for all backends.
marray math function support is currently switched on for {n} ({n} defined in #5991) but the implementations are currently broken and untested. There is also very limited test coverage for sycl::vec cases. The sycl 2020 specification states that the set {N} ({N} defined in #5991) should be supported for marray math function cases.
All SYCL 2020 math, native math, and half_precision math functions now have marray support when the function's arguments are of type genfloat and have the same argument type for all arguments.
Tests: https://github.com/intel/llvm-test-suite/pull/1002
Signed-off-by: jack.kirk [email protected]
/verify with https://github.com/intel/llvm-test-suite/pull/1002
I've added scalar_vector_* lists in this PR that omit marray types, so that math functions can distinguish the marray implementations I added.
The type lists including marrays, used in e.g. is_genfloat, are used in the has_known_identity trait class described in section 4.9.2. of the SYCL 2020 spec. The current marray lists include marrays of size from the set {n} (defined/discussed in https://github.com/intel/llvm/issues/5991) which limits the spans used in array reductions to the set {n}. If we have array reductions then 4.9.2 does not state that they should be limited to the set {n}, although it does not specify what the admissible set of spans are.
I think that it makes more sense to allow array reductions with any span (or at least a larger range than {n}) which would mean updating the marray type lists.
@aobolensk @steffenlarsen what do you think?
/verify with https://github.com/intel/llvm-test-suite/pull/1002
/verify with intel/llvm-test-suite#1002
FYI I don't have access to see the failures from this. The tests are passing locally for cuda.
I've added scalar_vector_* lists in this PR that omit marray types, so that math functions can distinguish the marray implementations I added. The type lists including marrays, used in e.g.
is_genfloat, are used in thehas_known_identitytrait class described in section 4.9.2. of the SYCL 2020 spec. The current marray lists include marrays of size from the set {n} (defined/discussed in #5991) which limits the spans used in array reductions to the set {n}. If we have array reductions then 4.9.2 does not state that they should be limited to the set {n}, although it does not specify what the admissible set of spans are.I think that it makes more sense to allow array reductions with any span (or at least a larger range than {n}) which would mean updating the marray type lists.
@aobolensk @steffenlarsen what do you think?
I agree, genfloat is currently too restrictive on marray and we should loosen it. If we did, would this patch be obsolete or would these separate definitions still be required?
Can we extend existing tests to capture the new sizes?
I did not find any existing tests for marray math builtins: this makes sense since the existing implementation was broken because the implementation that was written for scalars/vectors cannot be used for marray cases.
I've added scalar_vector_* lists in this PR that omit marray types, so that math functions can distinguish the marray implementations I added. The type lists including marrays, used in e.g.
is_genfloat, are used in thehas_known_identitytrait class described in section 4.9.2. of the SYCL 2020 spec. The current marray lists include marrays of size from the set {n} (defined/discussed in #5991) which limits the spans used in array reductions to the set {n}. If we have array reductions then 4.9.2 does not state that they should be limited to the set {n}, although it does not specify what the admissible set of spans are. I think that it makes more sense to allow array reductions with any span (or at least a larger range than {n}) which would mean updating the marray type lists. @aobolensk @steffenlarsen what do you think?I agree,
genfloatis currently too restrictive onmarrayand we should loosen it. If we did, would this patch be obsolete or would these separate definitions still be required?
Loosening genfloat marray restrictions would not make this patch obsolete because the scalar/vector implementations of these math functions cannot be used for marray cases.
/verify with https://github.com/intel/llvm-test-suite/pull/1002
/verify with https://github.com/intel/llvm-test-suite/pull/1002
[2022-06-28T02:21:02.113Z] Failed Tests (2): [2022-06-28T02:21:02.113Z] SYCL :: DeviceLib/half_precision_math_test_marray_vec.cpp [2022-06-28T02:21:02.113Z] SYCL :: DeviceLib/native_math_test_marray_vec.cpp
[2022-06-28T02:21:02.113Z] Failed Tests (2): [2022-06-28T02:21:02.113Z] SYCL :: DeviceLib/half_precision_math_test_marray_vec.cpp [2022-06-28T02:21:02.113Z] SYCL :: DeviceLib/native_math_test_marray_vec.cpp
Interesting. I can't reproduce these failures on my local machine but I can reproduce the half_precision_math_test_marray_vec.cpp on the A100 and the behaviour depends on the cudatoolkit version used: the test gets incorrect values for a single case "half_precision_math_test_2_powr". I have not been able to reproduce a failure from native_math_test_marray_vec.cpp. It would be useful to see the errors from the CI failure but I don't have access to http://llvm-ci2.intel.com:8080/job/SYCL_CI/job/intel/job/LLVM_Test_Suite_Associate_CI/214/display/redirect. Would it be possible to provide the redirected link that I think I should have access to? Thanks.
/verify with https://github.com/intel/llvm-test-suite/pull/1002
[2022-06-28T02:21:02.113Z] Failed Tests (2): [2022-06-28T02:21:02.113Z] SYCL :: DeviceLib/half_precision_math_test_marray_vec.cpp [2022-06-28T02:21:02.113Z] SYCL :: DeviceLib/native_math_test_marray_vec.cpp
Interesting. I can't reproduce these failures on my local machine but I can reproduce the half_precision_math_test_marray_vec.cpp on the A100 and the behaviour depends on the cudatoolkit version used: the test gets incorrect values for a single case "half_precision_math_test_2_powr". I have not been able to reproduce a failure from native_math_test_marray_vec.cpp. It would be useful to see the errors from the CI failure but I don't have access to http://llvm-ci2.intel.com:8080/job/SYCL_CI/job/intel/job/LLVM_Test_Suite_Associate_CI/214/display/redirect. Would it be possible to provide the redirected link that I think I should have access to? Thanks.
@pvchupin I identified the problem with "half_precision_math_test_2_powr" to be an existing bug (that we will investigate) that was there before this PR and was uncovered by the increased test coverage for float3 introduced in https://github.com/intel/llvm-test-suite/pull/1002. I've removed this test coverage but the CI is still failing and I can't reproduce the failure. Could you please post the redirected link from http://llvm-ci-test2.intel.com:8080/job/SYCL_CI/job/intel/job/LLVM_Test_Suite_Associate_CI/184/display/redirect so that I can see the failure? I don't have access to this link ending "redirect". Thanks
@JackAKirk, we use to have public dumps for these I think. @tfzhu can you confirm/check please if these are shutdown today? We need to move this stuff to Github Actions...
Here is the output: half_precision_math_test_marray_vec.cpp: Fails on both OpenCL and Level Zero
[2022-06-28T02:24:58.807Z] # command stderr:
[2022-06-28T02:24:58.807Z] half_precision_math_test_marray_vec.cpp.tmp.out: /run/jenkins-dir/workspace/SYCL_CI/intel/Lin/LLVM_Test_Suite/llvm-test-suite/SYCL/DeviceLib/half_precision_math_test_marray_vec.cpp:57: void half_precision_math_test_exp10(sycl::queue &, T, T, size_t) [T = sycl::vec<float, 3>]: Assertion `checkEqual(result, ref)' failed.
native_math_test_marray_vec.cpp: This one fails on OpenCL only
[2022-06-28T02:11:51.727Z] # command stderr:
[2022-06-28T02:11:51.728Z] native_math_test_marray_vec.cpp.tmp.out: /run/jenkins-dir/workspace/SYCL_CI/intel/Lin/LLVM_Test_Suite/llvm-test-suite/SYCL/DeviceLib/native_math_test_marray_vec.cpp:55: void native_math_test_exp2(sycl::queue &, T, T, size_t) [T = sycl::vec<float, 3>]: Assertion `checkEqual(result, ref)' failed.
[2022-06-28T02:11:51.728Z]
[2022-06-28T02:11:51.728Z] error: command failed with exit status: -6
We have also SYCL :: USM/depends_on.cpp failing on Windows only, but not sure if it's related. Nothing useful though in the output...
[2022-06-30T11:14:48.285Z] $ "env" "SYCL_DEVICE_FILTER=opencl:cpu,host" "W:\jenkins-dir\workspace\SYCL_CI\intel\Win\Test_Suite\build\SYCL\USM\Output\depends_on.cpp.tmp1.out"
[2022-06-30T11:14:48.285Z] note: command had no output on stdout or stderr
[2022-06-30T11:14:48.285Z] error: command failed with exit status: 3221225477
/verify with https://github.com/intel/llvm-test-suite/pull/1002
/verify with https://github.com/intel/llvm-test-suite/pull/1002
/verify with https://github.com/intel/llvm-test-suite/pull/1002
@JackAKirk, we use to have public dumps for these I think. @tfzhu can you confirm/check please if these are shutdown today? We need to move this stuff to Github Actions...
Here is the output: half_precision_math_test_marray_vec.cpp: Fails on both OpenCL and Level Zero
[2022-06-28T02:24:58.807Z] # command stderr: [2022-06-28T02:24:58.807Z] half_precision_math_test_marray_vec.cpp.tmp.out: /run/jenkins-dir/workspace/SYCL_CI/intel/Lin/LLVM_Test_Suite/llvm-test-suite/SYCL/DeviceLib/half_precision_math_test_marray_vec.cpp:57: void half_precision_math_test_exp10(sycl::queue &, T, T, size_t) [T = sycl::vec<float, 3>]: Assertion `checkEqual(result, ref)' failed.native_math_test_marray_vec.cpp: This one fails on OpenCL only
[2022-06-28T02:11:51.727Z] # command stderr: [2022-06-28T02:11:51.728Z] native_math_test_marray_vec.cpp.tmp.out: /run/jenkins-dir/workspace/SYCL_CI/intel/Lin/LLVM_Test_Suite/llvm-test-suite/SYCL/DeviceLib/native_math_test_marray_vec.cpp:55: void native_math_test_exp2(sycl::queue &, T, T, size_t) [T = sycl::vec<float, 3>]: Assertion `checkEqual(result, ref)' failed. [2022-06-28T02:11:51.728Z] [2022-06-28T02:11:51.728Z] error: command failed with exit status: -6We have also SYCL :: USM/depends_on.cpp failing on Windows only, but not sure if it's related. Nothing useful though in the output...
[2022-06-30T11:14:48.285Z] $ "env" "SYCL_DEVICE_FILTER=opencl:cpu,host" "W:\jenkins-dir\workspace\SYCL_CI\intel\Win\Test_Suite\build\SYCL\USM\Output\depends_on.cpp.tmp1.out" [2022-06-30T11:14:48.285Z] note: command had no output on stdout or stderr [2022-06-30T11:14:48.285Z] error: command failed with exit status: 3221225477
It seems that there are some problems with some half_precision and native math functions with generic implementations when using float3 (these are all existing problems in sycl tip that the increased test coverage is revealing), although I can't reproduce the particular failures that you listed above locally. I've removed native_math_test_exp2 and half_precision_math_test_exp10 float3 test cases from https://github.com/intel/llvm-test-suite/pull/1002 but Jenkins/llvm-test-suite is still failing (I still can't see the results). Is it now only failing for USM/depends_on.cpp? USM/depends_on.cpp should have no dependence on PR changes here.
/verify with https://github.com/intel/llvm-test-suite/pull/1002
/verify with https://github.com/intel/llvm-test-suite/pull/1002
/verify with https://github.com/intel/llvm-test-suite/pull/1002
/verify with https://github.com/intel/llvm-test-suite/pull/1002
/verify with https://github.com/intel/llvm-test-suite/pull/1002
/verify with https://github.com/intel/llvm-test-suite/pull/1002
/verify with https://github.com/intel/llvm-test-suite/pull/1002
I have now also added the -ffast-math flag support for the new marray cases, in line with the scalar/vector cases : https://github.com/intel/llvm/pull/5801 . This also affects/motivates the macro usage in builtins.hpp. The corresponding fast-math-flag.cpp test has been updated in https://github.com/intel/llvm-test-suite/pull/1002 to cover the new cases. I've also added a device-code test here via native-math-cuda.cpp. This test checks that the correct native math instructions are called for the cuda backend. The precision of these native math functions is detailed in table 10 here: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#intrinsic-functions. I thought that it would be sensible to simply check that these functions are being correctly called rather than making fresh tests that check the quoted native precision. I have removed the more basic and general native (and half_precision) math function tests (a la https://github.com/intel/llvm-test-suite/blob/intel/SYCL/DeviceLib/math_test.cpp) that I originally added in https://github.com/intel/llvm-test-suite/pull/1002 because they are inappropriate for the opencl backend because the precision of the opencl backend is not precise enough to pass the tests. I think that at some point there should be added some more rigorous tests such as exist in the opencl CTS (https://github.com/KhronosGroup/OpenCL-CTS/blob/2d93b122c3078cd67a0528ad9e791dbcadaf03d6/test_common/harness/errorHelpers.cpp).
Anyway, the marray cases added here have passing test coverage at least matching the existing scalar/vector cases, plus the CUDA backend now has the native math builtins device code check.
I've just managed to access the full llvm-test-suite results and seen the windows failure due to the change to marray.hpp:324. Looking into this now...
/verify with https://github.com/intel/llvm-test-suite/pull/1002
I've just managed to access the full llvm-test-suite results and seen the windows failure due to the change to marray.hpp:324. Looking into this now...
I've partially dealt with this but there are still failures on windows that I need to fix.... marking as draft again..
/verify with https://github.com/intel/llvm-test-suite/pull/1002