android-riscv64
android-riscv64 copied to clipboard
what's the ifunc story? hwcap.h
how are we going to do ifuncs if hwcap is limited/lagging?
there's still not even V in https://github.com/torvalds/linux/blob/master/arch/riscv/include/uapi/asm/hwcap.h and what does B look like in a world where it's been split into Zb[a-z]? those specific cases won't matter, because they'll be in the base Android riscv64 ABI, but if major mainstream extensions like that aren't getting the software support to enable ifuncs, what will the long tail look like?
See https://lore.kernel.org/all/[email protected]/ for the current proposed interface here
interesting... is that in addition to HWCAPs for the common stuff like "V", or is the syscall meant to be the only option?
otherwise i imagine -- mentally extending the glibc example there to all the string/memory routines (like https://android.googlesource.com/platform/bionic/+/master/libc/arch-arm64/dynamic_function_dispatch.cpp) -- that each libc ends up reinventing its own HWCAPs so we can just pay for the syscall once and then pass that to each ifunc (somewhere like https://android.googlesource.com/platform/bionic/+/master/libc/bionic/bionic_call_ifunc_resolver.cpp#39).
which is perhaps fine?
trying to imagine myself on the app developer side of the fence, having the myriad confusing options boiled down to "this is the subset that's actually common" is maybe a value-add from the OSes?
the kernel does seem to have admitted defeat here, with this in the 6.3 kernel's uapi header:
/*
* Provides information on the availiable set of ISA extensions to userspace,
* via a bitmap that coorespends to each single-letter ISA extension. This is
* essentially defunct, but will remain for compatibility with userspace.
*/
#define ELF_HWCAP (elf_hwcap & ((1UL << RISCV_ISA_EXT_BASE) - 1))
and then https://elixir.bootlin.com/linux/v6.2.1/source/arch/riscv/kernel/cpufeature.c#L134 parses the strings into an internal private equivalent of what a useful hwcap would be :-(
For single letter extensions like v, hwcap can detect that without extra syscall, but not so lucky for other extensions.
The problem of (traditional) hwcap is that only lower 32 bit can be used if we consider RV32 compatibility, and even count hwcap2 we only have 64 bits, but number of RISC-V extension is ... kind of huge.
And actually it already not enough: hwcap has reserved for single letter extensions, so we left 32 bit for hwcap2, however if we just count crypto scalar, bit manipulation, and vector extensions, it already exceed 32 extensions, and ... there are also more extensions not count here yet.
- Crypto Scalar has 13 sub-extensions
- Bit-Manipulation has 4 sub-extensions
- Vector extension has 6 sub-extensions
- Vector length (zvl*b) has 12 sub-extensions
So the only way for the full hwcap discover/probe is going to syscall...
@paul-walmsley-sifive already point out the kernel part, and here is the glibc part for the interface: https://patchwork.ozlabs.org/project/glibc/list/?series=343050
i still don't get how this isn't strictly worse than just having hwcap/hwcap2/hwcap3 etc? for me on the userspace side i don't feel like this solves any of my problems. (and it makes ifunc resolvers a lot more expensive unless i basically reinvent hwcap/hwcap2/hwcap3 on my side.)
do you have a link to the kernel side of the vdso part? how does that work?
(my other latent, orthogonal, fear is that we're also going to have a situation like the bad old days of arm32 where you can't actually trust what the kernel says anyway, and end up basically using a SIGILL handler to check yourself... but i don't think there's anything i can do about that, short of persuade you that you need an equivalent of https://www.felixcloutier.com/x86/cpuid for rv64!)
it's possible it won't really matter if we can get a decent set of extensions into the base Android/riscv64 ABI. i don't plan to ifunc the V string/memory functions, for example, because we'll just require V. similar for B. the interesting/worrying part is after we ship and app developers have to worry about this. which, realistically they mostly won't... they'll just live with the base ABI for a decade, as they mostly did with arm/arm64/x86/x86-64. which is why the base ABI is my real concern.
but i would still like to have a workable option here, for those developers that do care. at the moment i'm wondering if we're going to end up in more of a Java situation, where -- similar to how they have whole different classes for different API levels -- C/C++ developers will have whole different .so files that they select at runtime? (both amortizing the cost of ifunc resolvers needing to make syscalls, but also allowing them to not just pick key routines, but recompile everything for the higher ABI. but it would be a rare developer who cares that much about performance -- given that they need to ship on the "lowest common denominator" device anyway -- that they'd go to all that trouble and, more importantly, extra testing cost. function multi-versioning via ifuncs still strikes me as the least-unlikely way to have app developers actually use stuff that's not in the base ABI.)
going back to
For single letter extensions like v, hwcap can detect that without extra syscall, but not so lucky for other extensions.
i still haven't understood why not? https://cs.android.com/android/platform/superproject/+/master:bionic/libc/kernel/uapi/asm-arm64/asm/hwcap.h seems like a counterexample? there's some small-scale "caps" in there, and that's working just fine :-) (they're also more than half way to needing a hwcap3, which is a minor abi-breaking PITA for us, so my only suggestion there would be to just plan ahead and start with 4 or 8 or something!)
an equivalent of https://www.felixcloutier.com/x86/cpuid for rv64!
We do have mimpid (II:3.1.4); SiFive does use it in our bootloaders. Not that it addresses your larger concerns.
yeah, i meant all the uses of cpuid apart from that one :-)
As an IFUNC-using RISC-V library developer, I'm really not concerned about this. Rivos's proposed hwprobe mechanism doesn't seem too punishing. It's a vDSO libcall at best and a syscall at worst, and Rivos's caching ideas seem like they could avoid the syscall in common cases, except scenarios like heterogeneous migration, where we probably have bigger fish to fry anyway.
I acknowledge I'm assuming that the vDSO libcall won't be much worse than getauxval; since the overhead of the current (getauxval-based) IFUNC resolver is already hundreds of cycles, I'm assuming the relative overhead increase won't be catastrophic. And, for what it's worth, the overhead is only incurred once per static libcall.
So my suggestion is to lean into RVA23 to get the best "base profile" for Android and to avoid runtime discovery as much as possible, and to help shepherd hwprobe into the kernel to make runtime discovery less painful when it's necessary.
It's a vDSO libcall at best
does anyone have a link to that? i get how getpid() or whatever works, but i'm struggling a bit to see how this syscall could be in vdso?
since the overhead of the current (getauxval-based) IFUNC resolver is already hundreds of cycles
you shouldn't be paying for a getauxval() call though --- that should happen once in the dynamic loader. so you're not even calling into libc, let alone making a syscall. (and even if rv64 syscalls aren't expensive today, the last decade suggests they'll get more expensive over time as you have more security bugs to work around!)
So my suggestion is to lean into RVA23 to get the best "base profile" for Android and to avoid runtime discovery as much as possible
indeed. if anyone's ever wondered why i'm constantly banging on about this (and even in this thread talking about requiring V in bionic) ... this is why :-)
i'm struggling a bit to see how this syscall could be in vdso?
I'm afraid I'm not communicating clearly. This is the patch I'm trying to describe:
https://lore.kernel.org/lkml/[email protected]/
I haven't actually confirmed that the patch does what the author claims, I'm just going off of the description.
you shouldn't be paying for a getauxval() call though --- that should happen once in the dynamic loader.
Agreed --- I think? --- this is what I meant by "once per static libcall".
I'm afraid I'm not communicating clearly. This is the patch I'm trying to describe:
ah, that helps a bit. so the vdso basically ignores the heterogenous cores case (which seems unlikely to be a usable configuration for Android anyway), and i'd misunderstood the syscall --- it effectively is the "arbitrary number of hwcaps" solution, just in an array rather than in the ELF auxval.
that still seems like a bit of a mistake to me, because every process will need to query these anyway, so why not just include them? (i'm guessing they're optimizing for the heterogenous case?) but, yeah, this is probably fine.
part of my confusion is that the various pieces that have been mentioned on this bug don't actually match. the syscall implementation (https://lore.kernel.org/all/[email protected]/) linked much earlier said:
- Changed the interface to look more like poll(). Rather than supplying
key_offset and getting back an array of values with numerically
contiguous keys, have the user pre-fill the key members of the array,
and the kernel will fill in the corresponding values. For any key it
doesn't recognize, it will set the key of that element to -1. This
allows usermode to quickly ask for exactly the elements it cares
about, and not get bogged down in a back and forth about newer keys
that older kernels might not recognize. In other words, the kernel
can communicate that it doesn't recognize some of the keys while
still providing the data for the keys it does know.
which is why i couldn't see how you could have a vdso for that. but if this is actually just being canonicalized to effectively just "hwcap(\d?) via a different mechanism" (which is what the vdso's array seems to be), it's a lot more obvious how this would work.
and it looks like the kernel will just give you what fits in your array if your userspace code is older than your kernel, so app compat seems fine. (given that an old app is unlikely to need hwcaps from the future!)
https://android-review.googlesource.com/c/platform/bionic/+/2684726 passes a nullptr first argument to riscv64 ifunc resolvers to allow us to pass something more interesting in future... personally i like the idea of passing a struct riscv_hwprobe* and a size_t saying how many keys are in it (and everyone agreeing that they should be in the same order, or callees can assume rather than search!), but i haven't done that yet because it's not clear to me that that's super useful without other libcs doing the same...