highway
highway copied to clipboard
Implement runtime dispatch on riscv64
This is a ticket to track progress on runtime dispatch for riscv64 (followup to issue #818).
For now, unit tests are failing, see Debian/riscv64 buildd:
- https://buildd.debian.org/status/fetch.php?pkg=highway&arch=riscv64&ver=0.17.1%7Egit20220707.b0108ff-1&stamp=1657192132&raw=0
The CPU/emulator running the test likely doesn't support V extension 1.0 yet, but we are building with flags that allow the compiler to assume that the CPU does support it.
Would anyone like to send a patch to targets.cc (analogous to the AT_HWCAP there), to check whether the V extension is supported?
Independently of that, I can help fix the warnings from your build log, thanks for sharing it.
This may take a bit of back-n-forth as Debian does not offer riscv64 porterbox:
- https://wiki.debian.org/RISC-V#Porterboxes
So I kindly requested on debian-riscv mailing list to do it for me:
- https://lists.debian.org/debian-riscv/2022/07/msg00020.html
will update the progress here.
Sounds good, thanks!
I was granted access to the following hardware (*), it seems that the instructions is not supported:
$ ctest -R TestAllMulHigh/RVV -V
[...]
147: Note: Google Test filter = HwyMulTestGroup/HwyMulTest.TestAllMulHigh/RVV
147: [==========] Running 1 test from 1 test suite.
147: [----------] Global test environment set-up.
147: [----------] 1 test from HwyMulTestGroup/HwyMulTest
147: [ RUN ] HwyMulTestGroup/HwyMulTest.TestAllMulHigh/RVV
1/1 Test #147: HwyMulTestGroup/HwyMulTest.TestAllMulHigh/RVV # GetParam() = 268435456 ...***Exception: Illegal 0.13 sec
0% tests passed, 1 tests failed out of 1
Total Test time (real) = 0.18 sec
The following tests FAILED:
147 - HwyMulTestGroup/HwyMulTest.TestAllMulHigh/RVV # GetParam() = 268435456 (ILLEGAL)
Errors while running CTest
Output from these tests are in: /home/malaterre/highway-0.17.1~git20220711.f0a396a/obj-riscv64-linux-gnu/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
(*)
$ cat /proc/cpuinfo
processor : 0
hart : 2
isa : rv64imafdc
mmu : sv39
uarch : sifive,bullet0
processor : 1
hart : 1
isa : rv64imafdc
mmu : sv39
uarch : sifive,bullet0
processor : 2
hart : 3
isa : rv64imafdc
mmu : sv39
uarch : sifive,bullet0
processor : 3
hart : 4
isa : rv64imafdc
mmu : sv39
uarch : sifive,bullet0
For reference:
$ cat /usr/include/riscv64-linux-gnu/asm/hwcap.h
/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
/*
* Copied from arch/arm64/include/asm/hwcap.h
*
* Copyright (C) 2012 ARM Ltd.
* Copyright (C) 2017 SiFive
*/
#ifndef _ASM_RISCV_HWCAP_H
#define _ASM_RISCV_HWCAP_H
/*
* Linux saves the floating-point registers according to the ISA Linux is
* executing on, as opposed to the ISA the user program is compiled for. This
* is necessary for a handful of esoteric use cases: for example, userspace
* threading libraries must be able to examine the actual machine state in
* order to fully reconstruct the state of a thread.
*/
#define COMPAT_HWCAP_ISA_I (1 << ('I' - 'A'))
#define COMPAT_HWCAP_ISA_M (1 << ('M' - 'A'))
#define COMPAT_HWCAP_ISA_A (1 << ('A' - 'A'))
#define COMPAT_HWCAP_ISA_F (1 << ('F' - 'A'))
#define COMPAT_HWCAP_ISA_D (1 << ('D' - 'A'))
#define COMPAT_HWCAP_ISA_C (1 << ('C' - 'A'))
#endif /* _ASM_RISCV_HWCAP_H */
That's right, we'd have to see v in the list of isa extensions. I see mention of Sifive bullet from 2020, whereas the V spec was ratified only a few months ago. We're testing using Spike or QEMU.
Just for reference, current debian patch is:
- https://salsa.debian.org/debian-phototools-team/highway/-/blob/debian/0.17.1_git20220711.f0a396a-1/debian/patches/riscv.patch
Nice, I think the bit we want to test is 1 << ('v' - 'a').
1 << ('v' - 'a')
Lower-case ? Anyway I'll integrate the patch as-is in next upload.
@jan-wassenberg could you comment on my patch see above. it seems I am missing something:
[...]
[28/103] /usr/bin/clang++-14 -DHWY_SHARED_DEFINE -Dhwy_contrib_EXPORTS -I"/<<PKGBUILDDIR>>" -g -O2 -ffile-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -DHWY_BROKEN_EMU128=0 -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -fvisibility=hidden -fvisibility-inlines-hidden -Wno-builtin-macro-redefined -D__DATE__=\"redacted\" -D__TIMESTAMP__=\"redacted\" -D__TIME__=\"redacted\" -fmerge-all-constants -Wall -Wextra -Wconversion -Wsign-conversion -Wvla -Wnon-virtual-dtor -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wfor-loop-analysis -Wgnu-redeclared-enum -Winfinite-recursion -Wself-assign -Wstring-conversion -Wtautological-overlap-compare -Wthread-safety-analysis -Wundefined-func-template -fno-cxx-exceptions -fno-slp-vectorize -fno-vectorize -fdiagnostics-show-option -fcolor-diagnostics -Wc++2a-extensions -fmath-errno -fno-exceptions -march=rv64gcv1p0 -menable-experimental-extensions -MD -MT CMakeFiles/hwy_contrib.dir/hwy/contrib/image/image.cc.o -MF CMakeFiles/hwy_contrib.dir/hwy/contrib/image/image.cc.o.d -o CMakeFiles/hwy_contrib.dir/hwy/contrib/image/image.cc.o -c '/<<PKGBUILDDIR>>/hwy/contrib/image/image.cc'
[...]
thje above seems to puts v-extensions in shared logic:
- https://github.com/google/highway/issues/856#issuecomment-1186174188
Yes indeed, you could also write 'V' - 'A' (it is the same).
For the patch, we want to also do what was done for Arm (commit). This is a bit harder to see because the patch also included other required fixes, but the key parts we haven't yet added are:
-
In detect_targets.h:405
#define HWY_ATTAINABLE_TARGETS
should also be done ifARCH_RVV
in addition toARCH_ARM
. -
In set_macros-inl.h we want to
#define HWY_TARGET_STR
to whatever the compiler requires. This is usually xx if the compiler flag were -mxx, but on RVV we previously used -march=rv64gcv1p0 so I'm not sure what the syntax is.
Actually I see in the LLVM headers that RVV currently has the same issue as NEON/SVE: they do not yet support runtime dispatch. An issue has been filed and discussions are ongoing at least for Arm.
It's possible that GCC already supports this for RVV like they do for NEON/SVE. If so, you would see in its riscv_vector.h some target attribute in the function definitions, and that is what we want to define HWY_TARGET_STR to.
It's possible that GCC already supports this for RVV like they do for NEON/SVE. If so, you would see in its riscv_vector.h some target attribute in the function definitions, and that is what we want to define HWY_TARGET_STR to.
I cannot find a file riscv_vector.h
in my gcc-12 install tree:
- https://packages.debian.org/sid/riscv64/libgcc-12-dev/filelist
clang seems to offer it:
- https://packages.debian.org/sid/riscv64/libclang-common-14-dev/filelist
But as you guessed the include file is messed up:
% head -20 /usr/lib/llvm-14/lib/clang/14.0.6/include/riscv_vector.h | tail -5
#ifndef __riscv_vector
#error "Vector intrinsics require the vector extension."
#endif
For reference:
- https://github.com/llvm/llvm-project/issues/56592
Yes indeed, you could also write 'V' - 'A' (it is the same).
Right, silly me...
In any case I do not see this integrated upstream:
- https://github.com/torvalds/linux/blob/v5.19-rc7/arch/riscv/include/asm/hwcap.h#L27-L35
where did you come up with the value ? should we report an issue in linux upstream (getauxval
may need this definition) ?
Nice, thanks for filing the LLVM issue. For the AT_* values, I simply extrapolated from the fact that RISC-V extensions are (or were mostly) identified with a one-letter name, and the convention seems to be a dense bit array in alphabetical order.
For HWY_TARGET_STR, this comment suggests that arch=rv64gcv1p0 might be exactly what we want.
Current WIP in Debian is at:
- https://salsa.debian.org/debian-phototools-team/highway/-/blob/debian/experimental/debian/patches/riscv.patch
Current gcc status:
- https://github.com/riscv-collab/riscv-gnu-toolchain/issues/1106#issuecomment-1233901350
See:
- https://github.com/riscv-collab/riscv-gcc/blob/riscv-gcc-rvv-next/gcc/config/riscv/riscv-vector.h
rdcycle
is currently producing SIGILL on Debian env, so no progress on riscv64 for me until issue is solved:
- https://lists.debian.org/debian-riscv/2022/08/msg00049.html
Thanks for making us aware. This situation is regrettable: an important feature (cycle counter or even timer) has been demoted from the base spec (where it was when I last checked) to an extension, which is not yet ratified. I fail to see how it makes sense to ship a board without something as basic as a timer (nor how this was considered 'optional' in the spec), but both seem to have happened.
We will disable it for now and fall back to clock_gettime
.
an important feature (cycle counter or even timer) has been demoted from the base spec (where it was when I last checked) to an extension
wait, what ? SIGILL is produced on the same actual physical board. I fail to understand why an instruction would suddenly fail to execute depending on the running linux kernel.
Are we sure about it being the same board? I saw this in the Debian discussion:
- Hifive Unleashed running a 5.10.28 kernel
- Polarfire Icicle running kernel 5.18.14-1 However it produces a SIGILL on
- Hifive Unmatched running a 5.18.14-1 kernel
- Hifive Unmatched running a 5.18.16-1 kernel
Are we sure about it being the same board?
mailing list archive are hard to read from one month to the other :(
Here is my post from this morning:
- https://lists.debian.org/debian-riscv/2022/09/msg00000.html
If you want the full summary, I also posted here:
- https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/REWcwYnzsKE/m/gdKCQn-dBwAJ
Repeated here for clarity:
It works fine on
- QEMU running a 5.18.16-1 kernel
- Hifive Unleashed running a 5.10.28 kernel
- Hifive Unmatched running a 5.16.14-1 kernel
- Polarfire Icicle running kernel 5.18.14-1
However it produces a SIGILL on
- Hifive Unmatched running a 5.18.14-1 kernel
- Hifive Unmatched running a 5.18.16-1 kernel
Thanks for making us aware. This situation is regrettable: an important feature (cycle counter or even timer) has been demoted from the base spec (where it was when I last checked) to an extension, which is not yet ratified. I fail to see how it makes sense to ship a board without something as basic as a timer (nor how this was considered 'optional' in the spec), but both seem to have happened.
We will disable it for now and fall back to
clock_gettime
.
Yes, moving things out of the base spec since 2.0 has created a mess, but this is not the issue. The issue is the Linux kernel deliberately disables access to the cycle counter in userspace since 5.18. It does permit access to the timer in userspace. Pre-5.18 the relevant configuration was left untouched so the cycle counter was enabled, but 5.18 landed a commit (https://github.com/torvalds/linux/commit/e9991434596f5373dfd75857b445eb92a9253c56) that added a perf driver, with the side-effect of blocking everything but the timer in userspace (even instret is disabled...), which should still work.
Also note checking for __riscv_zicntr
isn't that good an idea; if you use an older toolchain, or any version of Clang, that doesn't exist because it's treated as part of RV64I, not because it thinks you shouldn't use the counters... (and they wouldn't accept -march=..._zicntr either).
For reference:
The breakage happened because the new pmu driver only enabled TM bit by default instead of all three. The change was intentional due to security reasons. One rogue process can have access to cycle & instructions of the entire kernel always which can lead to some sort of side channel attacks.
However, I agree that we can't break userspace. I was not aware of the fact that there are already users of rdcycle in the userspace. I will send a patch to restore the original behavior by enabling CY, IR bits in scounteren."
- https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/REWcwYnzsKE/m/Lq7G6jXSBwAJ
For reference:
- https://github.com/google/highway/pull/961
Thanks @malaterre for CCing me on the email, and thanks @jrtc27 for clearing up my misunderstanding. (The fact that Zicntr is an extension looks like a more theoretical problem, then.)
Let's discuss via email. @jrtc27 , would you also like to join that discussion?
For the AT_* values, I simply extrapolated from the fact that RISC-V extensions are (or were mostly) identified with a one-letter name, and the convention seems to be a dense bit array in alphabetical order.
Seems like this is still not merged upstream:
- https://www.spinics.net/lists/kernel/msg4354764.html
An update here for the record, related work ongoing in #1017. @malaterre has updated our timer to use rdtime instead of rdcycle, though apparently newer Linux will once again allow rdcycle.
Do we want to integrate your patch already to allow runtime dispatch on riscv64 + clang?
@jan-wassenberg there is no-point in trying to merge any of my code for runtime dispatch on riscv64, see above:
- https://github.com/google/highway/issues/838#issuecomment-1238243830
we need to have getauxval (ideally) or at least minimal kernel support from user-space. So far only kernel mode can query the hardware support.
Do I understand correctly that the issue is that we do have getauxval
, but not the bit definitions macro for the V extension?
If so, it seems that the patch would remain correct in future when a macro is added, if we used 1 << ('V' - 'A')
which is what the macro will almost certainly evaluate to? I see COMPAT_HWCAP_ISA_V
is starting to be introduced. Is that what we should be checking for?
Separately, #1017 mentioned that applying your patch helped them get further. It seems that the patch is only adding something for clang, whereas they are using GCC, except for the targets.cc change which causes them to not set the HWY_RVV
bit.
Something seems to be off about the way they disable V, but still: detecting V support at runtime would indeed help fix their problem, and the patch doesn't hurt in any way, right?