Work around incompatibility of aes-gcm-avx512 with ancient GNU as (GNU binutils)
This is just like issue #2463, except for AVX-512 version of the code in PR #2444, instead of the AVX2 version that already landed.
There are two pending changes from Eric Biggers that aren't yet merged upstream:
- https://boringssl-review.googlesource.com/c/boringssl/+/77168
- https://boringssl-review.googlesource.com/c/boringssl/+/77167
I don't know if those will land in BoringSSL, or when, or if they would help in the creation of a workaround or not.
PR #2444 is currently blocked on testing issues; I can't get QEMU to report the AVX-512 features are available, even in QEMU 9.2.2. Once I get those sorted out then PR #2444 will land.
It probably doesn't make sense to make a workaround for issue #2463 without (later, separately) making a workaround for the AVX-512 version, since the build would probably break again with the AVX-512 version incorporated.
So, basically, there is a potential race between shipping a release with the AVX-512 version included, and creating a workaround to keep it working with the ancient binutils.
Also note that the code for the AVX-512 version is in aes-gcm-avx10-x86_64.pl, and that aes-gcm-avx10-x86_64.pl already exists in the main branch; just the Rust code isn't wired up to use it, until PR #2444 lands.
In PR #2471 there is a very nice solution by @arielb1 that can easily be generalized to also handle the ZMM case.
In the future, there are likely to be additional files that make use of vaes and/or vpclmul instructions. Rather than copy/pasting the logic from PR #2471 into each file, we should try to put it in a single place.
The natural place to put it is x86_64-xlate.pl because then it would work for all files automatically, and it would be easy to extend to other instruction sets in the future. @arielb1 expressed some reservations about moving the logic there, but I'm not quite sure I'm understanding why it would be bad.
In the discussion of PR #2471, @arielb1 suggested we could create a separate script that is like x86_64-xlate.pl that is used just by the files that need this post-processing. That makes sense, however I know of some orgs who have custom build systems for building ring where introducing a new perlasm script dependency on a file would be a breaking change for them, so I think it would be better to avoid that.
@arielb1 expressed some reservations about moving the logic there, but I'm not quite sure I'm understanding why it would be bad.
Just a fairly complicated file that I don't like to have a big diff from upstream to. Maybe it's not as bad as it sounds.
I'll try doing it tomorrow, but I still find it hard to convince myself that I won't be introducing unexpected changes.
When we adapt what we did for avx2 to avx512, maybe we should also preserve the whitespace preceding the opcode on the line, prefixing the .byte directives with the same whitespace.
x86_64-xlate.pl will un-indent .byte directives it sees when translating to gas syntax, and it will convert them to DB and indent them when translating to nasm syntax. So, it seems like it doesn't matter whether we indent them or not, but indenting them would be more consistent with hand-written .byte directives.
I also noticed that aes-gcm-avx*-x86_64.pl use spaces for indention whereas most other source files uses tabs. This is another thing that x86_64-xplate.pl handles for us; it converts the spaces used for indentation into tabs.
So, definitely we do have to be careful, if/when modifying x86_64-xlate.pl, that we do our translation early enough in the pipeline so these other transformations are done afterware.
Well....I mentioned the changes we made for the AVX2 version to the BoringSSL developers and they immediately started removing all of the analogous encoding hacks from BoringSSL about 2 minutes later. And in fact one of them was already merged to the main branch:
- https://boringssl.googlesource.com/boringssl/+/43c279f17c03a44ee97ab5cd3e9e129e3e00609e
- https://boringssl.googlesource.com/boringssl/+/9bed47613c2cc5a42ea060ef5b69ee174571206e
IDK why they did that so immediately but I think we should rethink what we're doing here.
So what is their opinion about what we are doing here in ring?
I obviously believe that encoding hacks should be removed once they are no longer relevant - and similarly this hack should be removed when AL2 and its friends dies.
I think that the way I did things involving patching a single perl file with obvious search-and-replace is much easier to reason about than their more general functionality, or us patching x86_64-xlate.pl. Which is why I believe the best thing to do for avx-512 would be to copy my code to the other file, but I'm fine with putting it in the xlate file if you believe that's better.
I think with the avx512 code, we'll need to be careful about the difference between avx10-256, avx10-512, and avx512. Will they have the same mnemonics but assemble to different encodings?
FWIW, this workaround doesn't just impact Amazon Linux 2 but also probably RHEL/CentOS 7 and other things.
I do agree that the way we've implemented our version is about the safest we can do it while avoiding merge conflicts with merges from upstream.
As I mentioned in my latest comment in the pull request, I did verify that the AVX2 version of the workaround does result in the same .o/.obj files for Linux/Windows, before/after your changes, modulo the symbol prefixing changing from 0_17_13 to _0_17_14 as expected due to my Cargo.toml changes.
I also checked that there are no differences in the .o files, tho I only checked on Linux.
I believe that I can maintain the same .o files using the same method for the avx512 as well. I'll probably have a go at that tomorrow on top of your branch.
FWIW, this workaround doesn't just impact Amazon Linux 2 but also probably RHEL/CentOS 7 and other things.
For some reason I thought that version is dead outside of Amazon. Tho maybe not.
BTW, I also agree with you now that it is OK to put the workaround in the aes-gcm-avx10-* file instead of x86_64-xlate.pl. I think as part of that we need to also add some comments to the commented-out parts of the file that talk about avx10-256 (and avx10-512?) and note that the workaround was designed for avx512 and we're not sure if it needs updating if/when enabling the other variants.
I'll take a look at your PR. I get that the version that lands will be of the same shape [the AVX-512 code will be in crypto/fipsmodule/aes/asm/aes-gcm-avx10-x86_64.pl and that file is not supposed to really change between your current branch and what lands?]
I will merge https://boringssl.googlesource.com/boringssl/+/46683a56625c60319d45370f74930a5d15a000d4 and https://boringssl.googlesource.com/boringssl/+/b803ed04706c772335abce3fb8b2c1cc43ca2cd4 into main and rebase my PR on top of those, but I expect it will be otherwise the same.
since aes-gcm-avx10-x86_64.pl is already in the tree, you can submit your PR even before I do anything.
Will take a look tomorrow.
Well....I mentioned the changes we made for the AVX2 version to the BoringSSL developers and they immediately started removing all of the analogous encoding hacks from BoringSSL about 2 minutes later. And in fact one of them was already merged to the main branch:
https://boringssl.googlesource.com/boringssl/+/43c279f17c03a44ee97ab5cd3e9e129e3e00609e mentions it just cleaned up even older encoding hacks, from binutils versions older than 2.24. Do you actually need to support versions that old? It looks like the reported issues against ring were all for binutils 2.27 through 2.29, which suggests that an encoding hack would only be needed for VAES and VPCLMULQDQ.
I think with the avx512 code, we'll need to be careful about the difference between avx10-256, avx10-512, and avx512. Will they have the same mnemonics but assemble to different encodings?
While certain instructions (those that operate on xmm0-xmm15 or ymm0-ymm15) do have a choice of VEX or EVEX encoding, both encodings are valid to use in aes-gcm-avx10-x86_64.pl due to the CPU feature dependencies of that file. Assemblers do prefer VEX though, and using the same solution as https://github.com/briansmith/ring/pull/2471 you can match that behavior to produce the exact same object file.
I intend to merge all the changes from BoringSSL upstream to remove the Perl instruction encoders. I think the way we did it for VAES+VPCLMUL is less risky than what those other encoders are doing. I also hope we can get rid of the hand-encoded mulx and .byte 0x66 and .byte 0x67 and other stuff, both in BoringSSL and in ring.
I don't think we need to support anything older than 2.26 or 2.27 and maybe not even that. People struggle with this project even using binutils in the first place so the less they have to deal with it, the better, I guess.
I got some more info on why our workaround could be potentially more hazardous than I expected; see https://github.com/openssl/openssl/issues/27049#issuecomment-2722605271, which I'll quote here:
Since this would block -Wa,-msse2avx when building for x86-64-v3 baseline, I think this could also affect future APX baseline builds.
Why? In the ring workaround, we only convert the instructions that use YMM and ZMM to
.byteencoding; we don't touch the ones that use XMM registers. So I would expectmsse2avxto be a no-op for this? Unless it is also removingvzeroupper, which would be extremely dangerous for it to do.
In general, we need to see what postprocessing steps are happening over the assembly and which ones are broken by .byte encoding instructions.
I also see now that GNU as has a syntax for teaching it new instructions; see https://sourceware.org/binutils/docs/as.html#index-insn-directive. I wonder if that could be made to work.
I also see now that GNU as has a syntax for teaching it new instructions; see https://sourceware.org/binutils/docs/as.html#index-insn-directive. I wonder if that could be made to work.
https://sourceware.org/binutils/docs-2.40/as/i386_002dDirectives.html doesn't have it; it seems it was added in 2.41: https://sourceware.org/binutils/docs-2.41/as/i386_002dDirectives.html. Indeed, the release annoucement for 2.41 mentions it: https://lists.gnu.org/archive/html/info-gnu/2023-07/msg00009.html
maybe we can have the build script set an ASSEMBLER_IS_OLD parameter/env var/whatever if the assembler is old?
if the assembler is new, we can use the "objdump-checking" method should make sure we don't introduce any changes in the encoding. Tho this will cause build failures if the encoding ever changes.