highway icon indicating copy to clipboard operation
highway copied to clipboard

Implement runtime dispatch on riscv64

Open malaterre opened this issue 2 years ago • 38 comments

This is a ticket to track progress on runtime dispatch for riscv64 (followup to issue #818).

For now, unit tests are failing, see Debian/riscv64 buildd:

  • https://buildd.debian.org/status/fetch.php?pkg=highway&arch=riscv64&ver=0.17.1%7Egit20220707.b0108ff-1&stamp=1657192132&raw=0

malaterre avatar Jul 07 '22 11:07 malaterre

The CPU/emulator running the test likely doesn't support V extension 1.0 yet, but we are building with flags that allow the compiler to assume that the CPU does support it.

Would anyone like to send a patch to targets.cc (analogous to the AT_HWCAP there), to check whether the V extension is supported?

Independently of that, I can help fix the warnings from your build log, thanks for sharing it.

jan-wassenberg avatar Jul 07 '22 14:07 jan-wassenberg

This may take a bit of back-n-forth as Debian does not offer riscv64 porterbox:

  • https://wiki.debian.org/RISC-V#Porterboxes

So I kindly requested on debian-riscv mailing list to do it for me:

  • https://lists.debian.org/debian-riscv/2022/07/msg00020.html

will update the progress here.

malaterre avatar Jul 07 '22 14:07 malaterre

Sounds good, thanks!

jan-wassenberg avatar Jul 07 '22 15:07 jan-wassenberg

I was granted access to the following hardware (*), it seems that the instructions is not supported:

$ ctest -R TestAllMulHigh/RVV -V
[...]
147: Note: Google Test filter = HwyMulTestGroup/HwyMulTest.TestAllMulHigh/RVV
147: [==========] Running 1 test from 1 test suite.
147: [----------] Global test environment set-up.
147: [----------] 1 test from HwyMulTestGroup/HwyMulTest
147: [ RUN      ] HwyMulTestGroup/HwyMulTest.TestAllMulHigh/RVV
1/1 Test #147: HwyMulTestGroup/HwyMulTest.TestAllMulHigh/RVV  # GetParam() = 268435456 ...***Exception: Illegal  0.13 sec

0% tests passed, 1 tests failed out of 1

Total Test time (real) =   0.18 sec

The following tests FAILED:
	147 - HwyMulTestGroup/HwyMulTest.TestAllMulHigh/RVV  # GetParam() = 268435456 (ILLEGAL)
Errors while running CTest
Output from these tests are in: /home/malaterre/highway-0.17.1~git20220711.f0a396a/obj-riscv64-linux-gnu/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.

(*)

$ cat /proc/cpuinfo 
processor	: 0
hart		: 2
isa		: rv64imafdc
mmu		: sv39
uarch		: sifive,bullet0

processor	: 1
hart		: 1
isa		: rv64imafdc
mmu		: sv39
uarch		: sifive,bullet0

processor	: 2
hart		: 3
isa		: rv64imafdc
mmu		: sv39
uarch		: sifive,bullet0

processor	: 3
hart		: 4
isa		: rv64imafdc
mmu		: sv39
uarch		: sifive,bullet0

For reference:

$ cat /usr/include/riscv64-linux-gnu/asm/hwcap.h
/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
/*
 * Copied from arch/arm64/include/asm/hwcap.h
 *
 * Copyright (C) 2012 ARM Ltd.
 * Copyright (C) 2017 SiFive
 */
#ifndef _ASM_RISCV_HWCAP_H
#define _ASM_RISCV_HWCAP_H

/*
 * Linux saves the floating-point registers according to the ISA Linux is
 * executing on, as opposed to the ISA the user program is compiled for.  This
 * is necessary for a handful of esoteric use cases: for example, userspace
 * threading libraries must be able to examine the actual machine state in
 * order to fully reconstruct the state of a thread.
 */
#define COMPAT_HWCAP_ISA_I	(1 << ('I' - 'A'))
#define COMPAT_HWCAP_ISA_M	(1 << ('M' - 'A'))
#define COMPAT_HWCAP_ISA_A	(1 << ('A' - 'A'))
#define COMPAT_HWCAP_ISA_F	(1 << ('F' - 'A'))
#define COMPAT_HWCAP_ISA_D	(1 << ('D' - 'A'))
#define COMPAT_HWCAP_ISA_C	(1 << ('C' - 'A'))

#endif /* _ASM_RISCV_HWCAP_H */

malaterre avatar Jul 14 '22 14:07 malaterre

That's right, we'd have to see v in the list of isa extensions. I see mention of Sifive bullet from 2020, whereas the V spec was ratified only a few months ago. We're testing using Spike or QEMU.

jan-wassenberg avatar Jul 15 '22 07:07 jan-wassenberg

Just for reference, current debian patch is:

  • https://salsa.debian.org/debian-phototools-team/highway/-/blob/debian/0.17.1_git20220711.f0a396a-1/debian/patches/riscv.patch

malaterre avatar Jul 16 '22 12:07 malaterre

Nice, I think the bit we want to test is 1 << ('v' - 'a').

jan-wassenberg avatar Jul 18 '22 07:07 jan-wassenberg

1 << ('v' - 'a')

Lower-case ? Anyway I'll integrate the patch as-is in next upload.

malaterre avatar Jul 18 '22 07:07 malaterre

@jan-wassenberg could you comment on my patch see above. it seems I am missing something:

[...]
[28/103] /usr/bin/clang++-14 -DHWY_SHARED_DEFINE -Dhwy_contrib_EXPORTS -I"/<<PKGBUILDDIR>>" -g -O2 -ffile-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -DHWY_BROKEN_EMU128=0 -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -fvisibility=hidden -fvisibility-inlines-hidden -Wno-builtin-macro-redefined -D__DATE__=\"redacted\" -D__TIMESTAMP__=\"redacted\" -D__TIME__=\"redacted\" -fmerge-all-constants -Wall -Wextra -Wconversion -Wsign-conversion -Wvla -Wnon-virtual-dtor -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wfor-loop-analysis -Wgnu-redeclared-enum -Winfinite-recursion -Wself-assign -Wstring-conversion -Wtautological-overlap-compare -Wthread-safety-analysis -Wundefined-func-template -fno-cxx-exceptions -fno-slp-vectorize -fno-vectorize -fdiagnostics-show-option -fcolor-diagnostics -Wc++2a-extensions -fmath-errno -fno-exceptions -march=rv64gcv1p0 -menable-experimental-extensions -MD -MT CMakeFiles/hwy_contrib.dir/hwy/contrib/image/image.cc.o -MF CMakeFiles/hwy_contrib.dir/hwy/contrib/image/image.cc.o.d -o CMakeFiles/hwy_contrib.dir/hwy/contrib/image/image.cc.o -c '/<<PKGBUILDDIR>>/hwy/contrib/image/image.cc'
[...]

thje above seems to puts v-extensions in shared logic:

  • https://github.com/google/highway/issues/856#issuecomment-1186174188

malaterre avatar Jul 18 '22 07:07 malaterre

Yes indeed, you could also write 'V' - 'A' (it is the same).

For the patch, we want to also do what was done for Arm (commit). This is a bit harder to see because the patch also included other required fixes, but the key parts we haven't yet added are:

  1. In detect_targets.h:405 #define HWY_ATTAINABLE_TARGETS should also be done if ARCH_RVV in addition to ARCH_ARM.

  2. In set_macros-inl.h we want to #define HWY_TARGET_STR to whatever the compiler requires. This is usually xx if the compiler flag were -mxx, but on RVV we previously used -march=rv64gcv1p0 so I'm not sure what the syntax is.

Actually I see in the LLVM headers that RVV currently has the same issue as NEON/SVE: they do not yet support runtime dispatch. An issue has been filed and discussions are ongoing at least for Arm.

It's possible that GCC already supports this for RVV like they do for NEON/SVE. If so, you would see in its riscv_vector.h some target attribute in the function definitions, and that is what we want to define HWY_TARGET_STR to.

jan-wassenberg avatar Jul 18 '22 08:07 jan-wassenberg

It's possible that GCC already supports this for RVV like they do for NEON/SVE. If so, you would see in its riscv_vector.h some target attribute in the function definitions, and that is what we want to define HWY_TARGET_STR to.

I cannot find a file riscv_vector.h in my gcc-12 install tree:

  • https://packages.debian.org/sid/riscv64/libgcc-12-dev/filelist

clang seems to offer it:

  • https://packages.debian.org/sid/riscv64/libclang-common-14-dev/filelist

But as you guessed the include file is messed up:

% head -20 /usr/lib/llvm-14/lib/clang/14.0.6/include/riscv_vector.h | tail -5

#ifndef __riscv_vector
#error "Vector intrinsics require the vector extension."
#endif

malaterre avatar Jul 18 '22 14:07 malaterre

For reference:

  • https://github.com/llvm/llvm-project/issues/56592

malaterre avatar Jul 18 '22 14:07 malaterre

Yes indeed, you could also write 'V' - 'A' (it is the same).

Right, silly me...

In any case I do not see this integrated upstream:

  • https://github.com/torvalds/linux/blob/v5.19-rc7/arch/riscv/include/asm/hwcap.h#L27-L35

where did you come up with the value ? should we report an issue in linux upstream (getauxval may need this definition) ?

malaterre avatar Jul 18 '22 14:07 malaterre

Nice, thanks for filing the LLVM issue. For the AT_* values, I simply extrapolated from the fact that RISC-V extensions are (or were mostly) identified with a one-letter name, and the convention seems to be a dense bit array in alphabetical order.

jan-wassenberg avatar Jul 18 '22 15:07 jan-wassenberg

For HWY_TARGET_STR, this comment suggests that arch=rv64gcv1p0 might be exactly what we want.

jan-wassenberg avatar Jul 20 '22 13:07 jan-wassenberg

Current WIP in Debian is at:

  • https://salsa.debian.org/debian-phototools-team/highway/-/blob/debian/experimental/debian/patches/riscv.patch

malaterre avatar Aug 05 '22 14:08 malaterre

Current gcc status:

  • https://github.com/riscv-collab/riscv-gnu-toolchain/issues/1106#issuecomment-1233901350

See:

  • https://github.com/riscv-collab/riscv-gcc/blob/riscv-gcc-rvv-next/gcc/config/riscv/riscv-vector.h

malaterre avatar Sep 01 '22 08:09 malaterre

rdcycle is currently producing SIGILL on Debian env, so no progress on riscv64 for me until issue is solved:

  • https://lists.debian.org/debian-riscv/2022/08/msg00049.html

malaterre avatar Sep 01 '22 08:09 malaterre

Thanks for making us aware. This situation is regrettable: an important feature (cycle counter or even timer) has been demoted from the base spec (where it was when I last checked) to an extension, which is not yet ratified. I fail to see how it makes sense to ship a board without something as basic as a timer (nor how this was considered 'optional' in the spec), but both seem to have happened.

We will disable it for now and fall back to clock_gettime.

jan-wassenberg avatar Sep 01 '22 11:09 jan-wassenberg

an important feature (cycle counter or even timer) has been demoted from the base spec (where it was when I last checked) to an extension

wait, what ? SIGILL is produced on the same actual physical board. I fail to understand why an instruction would suddenly fail to execute depending on the running linux kernel.

malaterre avatar Sep 01 '22 11:09 malaterre

Are we sure about it being the same board? I saw this in the Debian discussion:

  • Hifive Unleashed running a 5.10.28 kernel
  • Polarfire Icicle running kernel 5.18.14-1 However it produces a SIGILL on
  • Hifive Unmatched running a 5.18.14-1 kernel
  • Hifive Unmatched running a 5.18.16-1 kernel

jan-wassenberg avatar Sep 01 '22 13:09 jan-wassenberg

Are we sure about it being the same board?

mailing list archive are hard to read from one month to the other :(

Here is my post from this morning:

  • https://lists.debian.org/debian-riscv/2022/09/msg00000.html

If you want the full summary, I also posted here:

  • https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/REWcwYnzsKE/m/gdKCQn-dBwAJ

Repeated here for clarity:

It works fine on

  • QEMU running a 5.18.16-1 kernel
  • Hifive Unleashed running a 5.10.28 kernel
  • Hifive Unmatched running a 5.16.14-1 kernel
  • Polarfire Icicle running kernel 5.18.14-1

However it produces a SIGILL on

  • Hifive Unmatched running a 5.18.14-1 kernel
  • Hifive Unmatched running a 5.18.16-1 kernel

malaterre avatar Sep 01 '22 13:09 malaterre

Thanks for making us aware. This situation is regrettable: an important feature (cycle counter or even timer) has been demoted from the base spec (where it was when I last checked) to an extension, which is not yet ratified. I fail to see how it makes sense to ship a board without something as basic as a timer (nor how this was considered 'optional' in the spec), but both seem to have happened.

We will disable it for now and fall back to clock_gettime.

Yes, moving things out of the base spec since 2.0 has created a mess, but this is not the issue. The issue is the Linux kernel deliberately disables access to the cycle counter in userspace since 5.18. It does permit access to the timer in userspace. Pre-5.18 the relevant configuration was left untouched so the cycle counter was enabled, but 5.18 landed a commit (https://github.com/torvalds/linux/commit/e9991434596f5373dfd75857b445eb92a9253c56) that added a perf driver, with the side-effect of blocking everything but the timer in userspace (even instret is disabled...), which should still work.

Also note checking for __riscv_zicntr isn't that good an idea; if you use an older toolchain, or any version of Clang, that doesn't exist because it's treated as part of RV64I, not because it thinks you shouldn't use the counters... (and they wouldn't accept -march=..._zicntr either).

jrtc27 avatar Sep 01 '22 17:09 jrtc27

For reference:

The breakage happened because the new pmu driver only enabled TM bit by default instead of all three. The change was intentional due to security reasons. One rogue process can have access to cycle & instructions of the entire kernel always which can lead to some sort of side channel attacks.

However, I agree that we can't break userspace. I was not aware of the fact that there are already users of rdcycle in the userspace. I will send a patch to restore the original behavior by enabling CY, IR bits in scounteren."

  • https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/REWcwYnzsKE/m/Lq7G6jXSBwAJ

malaterre avatar Sep 02 '22 06:09 malaterre

For reference:

  • https://github.com/google/highway/pull/961

malaterre avatar Sep 02 '22 07:09 malaterre

Thanks @malaterre for CCing me on the email, and thanks @jrtc27 for clearing up my misunderstanding. (The fact that Zicntr is an extension looks like a more theoretical problem, then.)

Let's discuss via email. @jrtc27 , would you also like to join that discussion?

jan-wassenberg avatar Sep 02 '22 08:09 jan-wassenberg

For the AT_* values, I simply extrapolated from the fact that RISC-V extensions are (or were mostly) identified with a one-letter name, and the convention seems to be a dense bit array in alphabetical order.

Seems like this is still not merged upstream:

  • https://www.spinics.net/lists/kernel/msg4354764.html

malaterre avatar Sep 06 '22 14:09 malaterre

An update here for the record, related work ongoing in #1017. @malaterre has updated our timer to use rdtime instead of rdcycle, though apparently newer Linux will once again allow rdcycle.

Do we want to integrate your patch already to allow runtime dispatch on riscv64 + clang?

jan-wassenberg avatar Oct 17 '22 10:10 jan-wassenberg

@jan-wassenberg there is no-point in trying to merge any of my code for runtime dispatch on riscv64, see above:

  • https://github.com/google/highway/issues/838#issuecomment-1238243830

we need to have getauxval (ideally) or at least minimal kernel support from user-space. So far only kernel mode can query the hardware support.

malaterre avatar Oct 17 '22 11:10 malaterre

Do I understand correctly that the issue is that we do have getauxval, but not the bit definitions macro for the V extension?

If so, it seems that the patch would remain correct in future when a macro is added, if we used 1 << ('V' - 'A') which is what the macro will almost certainly evaluate to? I see COMPAT_HWCAP_ISA_V is starting to be introduced. Is that what we should be checking for?

Separately, #1017 mentioned that applying your patch helped them get further. It seems that the patch is only adding something for clang, whereas they are using GCC, except for the targets.cc change which causes them to not set the HWY_RVV bit.

Something seems to be off about the way they disable V, but still: detecting V support at runtime would indeed help fix their problem, and the patch doesn't hurt in any way, right?

jan-wassenberg avatar Oct 17 '22 13:10 jan-wassenberg