zfs
zfs copied to clipboard
Remove unused Edon-R variants
This commit removes the edonr_byteorder.h file and all unused variants of Edon-R.
Signed-off-by: Tino Reichardt [email protected]
Motivation and Context
Simplify the code and generate little speedups also ;-)
FreeBSD freebsd 13.1-RELEASE with clang 13.0.0:
implementation 1k 4k 16k 64k 256k 1m 4m 16m
edonr-generic-master 1982 2330 2435 2444 2468 2476 2130 2138
edonr-generic-newone 2202 2567 2679 2657 2618 2430 2373 2461
Debian Linux 5.10.0-16-amd64 with gcc version 10.2.1:
implementation 1k 4k 16k 64k 256k 1m 4m 16m
edonr-generic-master 1898 2495 2650 2748 2778 2700 2737 2739
edonr-generic-newone 2144 2750 2946 2987 2973 2805 2856 2978
The size of the compiled edonr.o
with debugging went down from 574,128 to 111,672 bytes.
Description
How Has This Been Tested?
I have run the standard testing procedures on a test system... the CI seems also fine.
Types of changes
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [x] Performance enhancement (non-breaking change which improves efficiency)
- [x] Code cleanup (non-breaking change which makes code smaller or more readable)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
- [ ] Documentation (a change to man pages or other documentation)
Checklist:
- [x] My code follows the OpenZFS code style requirements.
- [ ] I have updated the documentation accordingly.
- [x] I have read the contributing document.
- [ ] I have added tests to cover my changes.
- [x] I have run the ZFS Test Suite with this change applied.
- [x] All commit messages are properly formatted and contain
Signed-off-by
.
I think that ignoring these compiler warnigs may also a reason for memory corruption on i386 and maybe other small stack architectures. I will rework this patch the same way as done with blake3 by pre-allocating a per-cpu space.
Have you seen memory corruption on i386 with this, then?
I've seen plenty of issues with it panicking on i386, but none of them seemed to be from memory corruption so far...
Have you seen memory corruption on i386 with this, then?
I've seen plenty of issues with it panicking on i386, but none of them seemed to be from memory corruption so far...
I want it to fit into default stack size ... and remove the pragma then. Maybe this is some source of memory corruption, but I have not checked this in detail. The result for for overwriting stacks is mostly pure random.
When I get it into the stack space, the result is slower also :/
Just say "I dislike pragmas", if that's why you're doing it.
Checkstyle is fine, the benchmarks show the same values.... and I could minimize the stack size also a bit.
The -Wframe-larger-than=
pragma is needed for gcc: error: the frame size of 1356 bytes is larger than 1024 bytes
Compiling the same file with clang, the stack size is fine - I have no idea why.
I'd look at precisely what flags the kernel is passing to compile with make V=1
, but in particular, I don't think many versions of clang know what-Wframe-larger-than
is. (I see some discussion from 2021 about adding various stack usage warning limits, but that's newer than a number of distros' compiler versions, so I wouldn't imagine the kernel would be happy to rely on that working yet.)
I'd look at precisely what flags the kernel is passing to compile with
make V=1
, but in particular, I don't think many versions of clang know what-Wframe-larger-than
is. (I see some discussion from 2021 about adding various stack usage warning limits, but that's newer than a number of distros' compiler versions, so I wouldn't imagine the kernel would be happy to rely on that working yet.)
I am using gcc version 10.2.1
and clang version 11.0.1
currently. The -Wframe-larger-than
is supported and I used -Wframe-larger-than=1
for getting infos about the stack usage of edonr.c
(userspace, without ZFS headers).
Results:
# gcc 32bit seems stupid:
[mcmilk@latitude ~]$ gcc -c -Os -m32 -W -Wframe-larger-than=256 edonr.c
edonr.c: In function ‘Q512’:
edonr.c:217:1: warning: the frame size of 1328 bytes is larger than 256 bytes [-Wframe-larger-than=]
edonr.c: In function ‘EdonRHash’:
edonr.c:325:1: warning: the frame size of 400 bytes is larger than 256 bytes [-Wframe-larger-than=]
[mcmilk@latitude ~]$ gcc -c -m32 -O0 -W -Wframe-larger-than=256 edonr.c
edonr.c: In function ‘Q512’:
edonr.c:217:1: warning: the frame size of 2128 bytes is larger than 256 bytes [-Wframe-larger-than=]
edonr.c: In function ‘EdonRHash’:
edonr.c:325:1: warning: the frame size of 400 bytes is larger than 256 bytes [-Wframe-larger-than=]
[mcmilk@latitude ~]$ gcc -c -m32 -O1 -W -Wframe-larger-than=256 edonr.c
edonr.c: In function ‘Q512’:
edonr.c:217:1: warning: the frame size of 1304 bytes is larger than 256 bytes [-Wframe-larger-than=]
edonr.c: In function ‘EdonRHash’:
edonr.c:325:1: warning: the frame size of 400 bytes is larger than 256 bytes [-Wframe-larger-than=]
[mcmilk@latitude ~]$ gcc -c -m32 -O2 -W -Wframe-larger-than=256 edonr.c
edonr.c: In function ‘Q512’:
edonr.c:217:1: warning: the frame size of 1304 bytes is larger than 256 bytes [-Wframe-larger-than=]
edonr.c: In function ‘EdonRHash’:
edonr.c:325:1: warning: the frame size of 400 bytes is larger than 256 bytes [-Wframe-larger-than=]
[mcmilk@latitude ~]$ gcc -c -m32 -O3 -W -Wframe-larger-than=256 edonr.c
edonr.c: In function ‘Q512’:
edonr.c:217:1: warning: the frame size of 1304 bytes is larger than 256 bytes [-Wframe-larger-than=]
edonr.c: In function ‘EdonRHash’:
edonr.c:325:1: warning: the frame size of 432 bytes is larger than 256 bytes [-Wframe-larger-than=]
# gcc 64bit ok:
[mcmilk@latitude ~]$ gcc -c -Os -m64 -W -Wframe-larger-than=256 edonr.c
edonr.c: In function ‘EdonRHash’:
edonr.c:325:1: warning: the frame size of 400 bytes is larger than 256 bytes [-Wframe-larger-than=]
# clang, all ok:
[mcmilk@latitude ~]$ clang -c -Os -m32 -W -Wframe-larger-than=256 edonr.c
edonr.c:121:1: warning: stack frame size of 432 bytes in function 'Q512' [-Wframe-larger-than=]
edonr.c:318:1: warning: stack frame size of 412 bytes in function 'EdonRHash' [-Wframe-larger-than=]
[mcmilk@latitude ~]$ clang -c -Os -m64 -W -Wframe-larger-than=256 edonr.c
edonr.c:318:1: warning: stack frame size of 424 bytes in function 'EdonRHash' [-Wframe-larger-than=]
Neat, so it believes in it.
It seems like there's no difference in configuration for that flag between compilers, so my only guess would be if Clang is doing a better job at stack optimization...
I'm curious how gcc is behaving so badly at it, in your pasted example for -m32. Maybe clang is playing loose with what's required to be on the stack? Dunno. Might also be interesting to compare with -O2, since I believe that's what the kernel defaults to...
Neat, so it believes in it.
It seems like there's no difference in configuration for that flag between compilers, so my only guess would be if Clang is doing a better job at stack optimization...
I'm curious how gcc is behaving so badly at it, in your pasted example for -m32. Maybe clang is playing loose with what's required to be on the stack? Dunno. Might also be interesting to compare with -O2, since I believe that's what the kernel defaults to...
The comment above got an update...
Each call within Q512() generates some used bytes on the stack at x32 (for gcc) ... when I remove some rounds in the end ... the stack gets under 1024 ... who cares ;-)
@rincebrain - could you make a Code Review?
I haven't touched much code... mainly removing the unused variants and some re-using of variables by introducing these ones: uint64_t z1, z2, z3, z4, z5, z6, z7, z8;
I'll try to get to actually looking it over in depth later this week, but when I tested it, it didn't demonstrate any obvious incorrectness on a 100 GB edonr dataset generated with the old code...
I'll also try and give it a spin on my BE systems just to be safe.
I rebased again, cause the unused define EDONR_VALID_HASHBITLEN
was left...
Sorry, it turned out to be a busy week (your primary home server dying does that), and I haven't had a chance to look at this.
I am, frankly, more likely to go fix something broken when I get time first, though...
- Rich Ercolani schrieb:
Sorry, it turned out to be a busy week (your primary home server dying does that), and I haven't had a chance to look at this.
No problem.
I am, frankly, more likely to go fix something broken when I get time first, though...
I am working on this.... but it isn't so easy.
Oh? You almost certainly have more state on this than me, but since you never answered why you invented your own complex system for it, I was going to just go copy the fletcher4 toggle and see what broke.
Oh? You almost certainly have more state on this than me, but since you never answered why you invented your own complex system for it, I was going to just go copy the fletcher4 toggle and see what broke.
I am using a skeleton now, first start is here: https://github.com/openzfs/zfs/compare/master...mcmilk:zfs:sha2-rework Edit: to answer your question why, because I wanted to have a check for validity of all methods outside of the kernel... thats why the switching between the impls was done this way... for the module load I will take the fletcher-4 as skeleton
@rincebrain - Sorry for rebasing this branch a little to often.
The patch limits the -Wframe-larger-than=
pragma to 32-bit architectures - should this be mentioned in the commit message?
Potentially.
As far as apologizing, I appreciate it, but please don't worry about it - I wasn't trying to suggest you were doing something wrong, plenty of people just push the whole thing over and over again, just explaining why I was waiting for it to settle down...
I'll try to remember to give this a once over in the next 24h, and a runthrough on my small menagerie of unusual systems.
Potentially.
As far as apologizing, I appreciate it, but please don't worry about it - I wasn't trying to suggest you were doing something wrong, plenty of people just push the whole thing over and over again, just explaining why I was waiting for it to settle down...
I'll try to remember to give this a once over in the next 24h, and a runthrough on my small menagerie of unusual systems.
I just rebased against master and updated one comment and some c styling issues. In detail: statements like these: x1+x2
got written like this x1 + x2
now.
PING
Sure, sorry, I kind of burnt out on working on OpenZFS for a bit so I took a break.
I've got this branch compiling on my sparc and x86en and will give it a runthrough tonight, then review it if nothing blows up sometime in the next couple days.
My concern, though, is that this seems like a nontrivial rework of something high-impact, and those benchmark improvements seem smaller than the amounts I've seen those benchmarks vary across module loads. So I'm worried about whether the improvements are worth it? I'm obviously not the final authority on anything, though.
Lest you think I forgot, it made my SPARC panic. Sadly, it somehow failed to log the trace for why, so I'm having to run it again to see if it's unrelated.
Lest you think I forgot, it made my SPARC panic. Sadly, it somehow failed to log the trace for why, so I'm having to run it again to see if it's unrelated.
I created a small testing tool: https://github.com/mcmilk/edonr-testing
The old vs new code got checked on SPARC64 and PPC64 ...
Edit - I forget some benchmarks ;) Edit2 - the benchmarks are here now: https://github.com/mcmilk/edonr-testing
Interesting, something is absolutely smashing the stack on my SPARC when I run the test suite, and panicking in random places all over Linux as a result.
Time to run it again on the vanilla tree and see if it keeps doing it.
Rebased to current master.
@behlendorf - this PR removes in the first place the unused 224, 256 and 384 bit variants of EdonR.
It also renames the quasi_exform512()
macro to Q512()
and adds these shared variables for all three macros: uint64_t z1, z2, z3, z4, z5, z6, z7, z8;
Before we had 8 extra variables per macro ... LS1_512()
LS2_512()
and QEF_512()
These changes do a little speed up on all platforms, but not the sparc64. There the 'more variables' code is a bit faster. What do you think about this code cleanup?
If it's not needed, cause nobody uses EdonR ... I could just close it anyway ;-)
I'm not Brian, but I at least use EdonR when I want nopwrite. :)
I apparently have been bad at replying directly in here, but the smashing on sparc64 had nothing to do with this PR, to be clear, it just...does that now, I guess. :/
My concern, which I don't think I posted already, sorry, is that while I don't think any of the cleanup is particularly controversial and you could probably convince something to check and agree that the changes don't change the behavior for any codepaths that are hit, if you were really motivated in that way, I just get worried about any sort of large-scale changes to anything core like checksums or compression (...I say with my fingers in many branches of both pies), and I'm wondering how much the cleanup and performance delta is worth any risk of changing it, even marginal. (Normally I'd also be worried about ever syncing with upstream, but I think we can all agree that's never changing again.)
I also tend to optimize toward avoiding risk when giving advice outside of what I do to my own systems, though, so perhaps I'm being overcautious.
@mcmilk I'm sure edonr is used and nobody would complain about a speed up! But I do share @rincebrain's concerns above about this being a non-trivial change. Creating a testing tool is great, but can you also do some some manual testing if you haven't already. I'm thinking something along the lines of creating a pool on one platform, filling it with some data, then importing the pool on other platforms and scrub it to verify those implementations. It seems like that should catch anything seriously broken.
@mcmilk I'm sure edonr is used and nobody would complain about a speed up! But I do share @rincebrain's concerns above about this being a non-trivial change. Creating a testing tool is great, but can you also do some some manual testing if you haven't already. I'm thinking something along the lines of creating a pool on one platform, filling it with some data, then importing the pool on other platforms and scrub it to verify those implementations. It seems like that should catch anything seriously broken.
I only have x86_64 arch VMs with full root access. Sth. like Arm, Aarch64, PPC and so on is only userspace testing. But I cold ask for a PPC64 VM @ https://openpower.ic.unicamp.br/minicloud/ ... maybe we could also get a buildbot VM there.
Has someone of you (@rincebrain @behlendorf) already tried to get a machine there?
My Raspberry Pi3 may also an option... I forgot. Will import with old zfs there... this should be a simple testcase... will do this in the weekend.
I've not, but I've heard it's straightforward and have suggested adding a CI bot with one.
I do have sparc64 and aarch64 laying around my home, and usually have a ppc64le VM but the host is off for maintenance atm.