`RVV` target test failures

Open lsrcz opened this issue 10 months ago • 1 comments

Hello, I compiled the HEAD version of highway (commit: 3cb5c1ae3f9adbf69226f2231791895765a5869b) with a recent snapshot of clang-19 (commit: 44af53b), and two tests failed.

The tests were run on qemu-riscv. The test output follows:

$ ctest --rerun-failed --output-on-failure -v
Test project /home/siruilu/highway/build
    Start 565: MatVecTestGroup/MatVecTest.TestAllMatVec/RVV  # GetParam() = 137438953472
1/2 Test #565: MatVecTestGroup/MatVecTest.TestAllMatVec/RVV  # GetParam() = 137438953472 ...***Failed    0.21 sec
Running main() from /home/siruilu/highway/build/googletest-src/googletest/src/gtest_main.cc
Note: Google Test filter = MatVecTestGroup/MatVecTest.TestAllMatVec/RVV
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from MatVecTestGroup/MatVecTest
[ RUN      ] MatVecTestGroup/MatVecTest.TestAllMatVec/RVV
f64/f64 6 x 8, with add: mismatch at 0 0.000000 0.000000; tol 0.000000
Abort at matvec_test.cc:178: Assert 0

    Start 643: SortTestGroup/SortTest.TestAllPartition/RVV  # GetParam() = 137438953472
2/2 Test #643: SortTestGroup/SortTest.TestAllPartition/RVV  # GetParam() = 137438953472 ....***Failed    0.04 sec
Running main() from /home/siruilu/highway/build/googletest-src/googletest/src/gtest_main.cc
Note: Google Test filter = SortTestGroup/SortTest.TestAllPartition/RVV
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from SortTestGroup/SortTest
[ RUN      ] SortTestGroup/SortTest.TestAllPartition/RVV
Abort at sort_test.cc:342: U128: asc 1 left[0] piv 0 2 compares before 2 1 border 8


0% tests passed, 2 tests failed out of 2

Total Test time (real) =   0.25 sec

The following tests FAILED:
        565 - MatVecTestGroup/MatVecTest.TestAllMatVec/RVV  # GetParam() = 137438953472 (Failed)
        643 - SortTestGroup/SortTest.TestAllPartition/RVV  # GetParam() = 137438953472 (Failed)
Errors while running CTest

lsrcz avatar Apr 10 '24 22:04 lsrcz

Thanks for letting us know. Unfortunately, RVV tests in our CI are currently disabled because the toolchain is crashing. We've filed an LLVM bug :)

jan-wassenberg avatar Apr 11 '24 05:04 jan-wassenberg

Meanwhile, LLVM has rolled back the patch that caused the compiler crash. Our RVV tests are again usable. Would you like to file an LLVM issue for the test failure on clang-19?

jan-wassenberg avatar May 27 '24 14:05 jan-wassenberg

Hi @jan-wassenberg, I have just rerun the tests with clang-17.0.6 and a recent version of clang (2ace7bd), and both fail the SortTestGroup/SortTest.TestAllPartition/RVV test.

The MatVecTestGroup/MatVecTest.TestAllMatVec/RVV test is no longer failing, so I believe that the bug causing it may have been fixed.

lsrcz avatar May 28 '24 18:05 lsrcz

Interesting, thanks for checking. Our toolchain doesn't come with an exact version, but it is relatively close to LLVM HEAD. It seems to succeed with -march=rv64gcv1p0. What build flags are you using?

jan-wassenberg avatar May 29 '24 13:05 jan-wassenberg

I am using the default flags. After checking the generated build.ninja, I believe that means -march=rv64gcv1p0.

However, I found that the test only fails when VLEN is 128. What VLEN are you using for your CI?

lsrcz avatar May 29 '24 17:05 lsrcz

Oh, good catch! That's likely it. We seem to have 512-bit VLEN. Note that SortTag uses LMUL=1/2.

The problem is that the base case is meant to handle at least two vectors (and Partition relies upon that), but the base case is also VL-dependent and we only handle up to 16 elements. Hopefully we can cap the vector size.
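
To make the VLEN dependence concrete, here is a minimal sketch of the lane arithmetic, using the RVV relation VLMAX = VLEN * LMUL / SEW. The helper name `Vlmax` is hypothetical and is not part of Highway; the sketch assumes 64-bit lanes and the LMUL=1/2 mentioned above, and it does not model the sort's actual base case.

```cpp
#include <cstddef>

// Hypothetical helper, for illustration only (not Highway's API).
// Computes VLMAX = VLEN * LMUL / SEW for a fractional LMUL written as
// 1 / lmul_denominator. All sizes are in bits.
constexpr size_t Vlmax(size_t vlen_bits, size_t sew_bits,
                       size_t lmul_denominator) {
  return vlen_bits / lmul_denominator / sew_bits;
}

// 64-bit lanes per vector at LMUL=1/2 for the VLENs discussed in this thread.
static_assert(Vlmax(128, 64, 2) == 1, "VLEN=128: one u64 lane");
static_assert(Vlmax(512, 64, 2) == 4, "VLEN=512: four u64 lanes");
static_assert(Vlmax(1024, 64, 2) == 8, "VLEN=1024: eight u64 lanes");
```

Under these assumptions, a vector holds 1, 4, or 8 64-bit lanes at VLEN=128, 512, or 1024, so the number of elements covered by "two vectors" differs by 8x across these configurations.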

jan-wassenberg avatar May 30 '24 15:05 jan-wassenberg

The sort itself does check for the problem, but TestAllPartition did not, and soon will. Unfortunately our CI doesn't work with VLEN=1024, so I am not entirely certain this fixes it.

jan-wassenberg avatar May 30 '24 16:05 jan-wassenberg

Thanks. I tried commit fa61572, and it seems to work with both VLEN=128 and VLEN=1024 on qemu.

lsrcz avatar May 30 '24 16:05 lsrcz

Thanks for confirming!

jan-wassenberg avatar May 30 '24 17:05 jan-wassenberg