WIP: Implement OpenSSL's AES GCM acceleration for armv8
This is WIP still (but it works), and if you agree I'd like to split this PR up into smaller PRs next before merging:
- [ ] Reenable
kfpu_allowed()for newer kernels to make use of NEON registers - [ ] Add the single-block AES implementation and key expansion from OpenSSL
- [ ] Restructure multi-block AVX AES GCM codepath slightly to allow for sharing code with ARM codepath
- [ ] (This PR): Implement OpenSSL's AES-GCM acceleration
Important TODO (putting this high up here so I don't forget):
- [ ] Either fix or drop 128bit and 192bit kernels, they are untouched currently and will break when used.
I'll post some benchmarks on #12171 next.
Motivation and Context
Encrypted ZFS is currently very slow on aarch64 devices. This due to missing hardware acceleration support: #12171 Also the use of NEON is disabled on newer kernels: #14555
Description
Assembly generated from these files in the OpenSSL source:
-
crypto/modes/asm/ghashv8-armx.pl
- Diff: https://gist.github.com/w0xel/21fceb89826f9cc8bca47b1bfc87e4bd
-
crypto/modes/asm/aes-gcm-armv8_64.pl
- Diff: https://gist.github.com/w0xel/0417fabff8d24bcea95964b71f0ff2e9
-
crypto/aes/asm/aesv8-armx.pl
- Diff: https://gist.github.com/w0xel/fa9d4d5cd11883a17395b731d169257f
(ASM diffs at time of creation, will try to update those)
with slight modifications to support the ICP interface.
For the chunked AES-GCM codepath the AVX code was reused by renaming avx -> hardware.
- This is a lot of changes right now, but I've preferred this over copy-pasting the code, as it was resuable basically 1-to-1
- If you have a better idea of how to abstract/structure this let me know :)
Kernel FPU support was re-added by naively storing/restoring vector registers.
- This is basically the same as the x86_64 code, but a bit easier
- AFAIK armv8 has no alignment requirements here, am I wrong?
- We should IMHO disable this codepath on RT kernels. Same for x86.
And a bunch of support/wiring code around that.
How Has This Been Tested?
Definitely not well enough yet, do not merge yet.
The code was tested on a Raspberry Pi 5. I've created an encrypted fs, benchmarked it, swapped AES implementations, benchmarked it again and vice versa.
I did not test the changes on x86_64 yet, which is also affected by this PR.
Types of changes
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] Performance enhancement (non-breaking change which improves efficiency)
- [ ] Code cleanup (non-breaking change which makes code smaller or more readable)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
- [ ] Documentation (a change to man pages or other documentation)
Checklist:
- [ ] My code follows the OpenZFS code style requirements.
- [ ] I have updated the documentation accordingly.
- [ ] I have read the contributing document.
- [ ] I have added tests to cover my changes.
- [ ] I have run the ZFS Test Suite with this change applied.
- [ ] All commit messages are properly formatted and contain
Signed-off-by.
Some more info in this comment: https://github.com/openzfs/zfs/issues/12171#issuecomment-2391541274
awesome, this make encryption usable on rk3588 :-)
In a first view it looks really good. I will review it deeper on one or two weeks. Thanks a lot for this PR.
It would be clearer if you could split into separate commits, e.g.:
- one that implements & enables kfpu on aarch64 (so that current neon kernels for hashing and encryption would work again)
- one that port some kernel files from openssl project (that would be ~8000 LOC)
- one that enables these kernels and adds the dispatching
Edit: Sorry I missed your description that says you would split PRs, which exactly matched my thought. Please do!
Hi w0xel, any news on this PR ?