zstd
zstd copied to clipboard
Use proper unaligned access attributes
Instead of using packed attribute hack, just use aligned attribute. It is supported back to at least GCC 3, and __declspec(align) is apparently supported by all MSVC versions. GCC generates identical code to regular aligned access on ARMv6 for all versions between 4.5 and trunk, except GCC 5 which is buggy and generates the same (bad) code as packed access: https://gcc.godbolt.org/z/hq37rz7sb
Also enable unaligned memory access in kernel using proper macros.
One problem with this approach is that some users may have already set MEM_FORCE_MEMORY_ACCESS=0 based on the original (wrong) implementation, and now with the correct implementation they will be using unnecessarily slow access. Possibly the macro should have new name to reflect new implementation, or it should just be automatic with no macro.
hm, i think this doesn't quite work on msvc. gcc "guarantees" (mumblemumble) that __attribute__((aligned(1))) will work as expected: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69502, but msvc doesn't work: https://gcc.godbolt.org/z/EMYn9s6b3. fortunately, memcpy generates correct and fast code, so i think it can be used for msvc. maybe someone can test msvc before 2015, since it's not available on compiler explorer.
oh, also for icc, i don't know why this was implemented in the first place since icc only supports intel x86 processors anyways.
What is the motivation for this patch? I'm not against merging it, just wondering why you want it. Does this measurably improve performance on some compiler/architecture? Does this fix build errors?
Also enable unaligned memory access in kernel using proper macros.
This is actually a no-op, since the kernel uses its own implementation of mem.h located here. I'm fine removing it, but please put it in its own commit.
Well, originally it was to fix armv6, but that was already "fixed" by #2633. It should still help performance, but I guess it is not that important, since packed structure access is slow on gcc, but only for armv6: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55218.
also I think #2633 left legacy broken?
Thanks for the PR @Hello71!
Could you please separate it into two PRs:
- Apply the fix from #2633 to legacy. I can merge this as soon as you put it up and the tests pass.
- The aligned attribute change. I'd like to wait until after our next release (which we're preparing for next week) to merge this. Since I want to give it some time to bake. In case there are any strange performance / correctness regressions.
after extensive investigation (https://gcc.godbolt.org/z/8xYbczn8e), i think this change is important on armv7 gcc too, because gcc produces ok code for packed version on armv7, but terrible code for inlined packed version on armv7 with -O3
Thanks for the PR @Hello71! Sorry we forgot to merge it, but it will be part of the next release :)