Change dispatch breakpoint to XXH3_accumulate()
This is in preparation for an SVE implementation. It may also improve performance for the normal x86 dispatcher.
error: invoking macro XXH3_ACCUMULATE_TEMPLATE_3 argument 2: empty macro arguments are undefined in ISO C90 [-Werror=pedantic]
Wow, literally ~~1984~~ 1989.
Ok that doesn't fix it. Apparently C90 doesn't let you expand empty macro parameters, which makes it pretty difficult to manage this template. Any ideas? 🤔
Also, apt seems to be 404ing.
Ok that doesn't fix it. Apparently C90 doesn't let you expand empty macro parameters, which makes it pretty difficult to manage this template. Any ideas? 🤔
Indeed, empty variadic macros are disallowed in strict ANSI C90.
There are ways around that, but they are fairly complex.
Essentially, merge the variadic part with a fixed part,
then derive the nb of parameters in the new variadic macro,
and branch out differently depending on this total (typically 0 and non-zero).
There are blogs dedicated to this topic, such as this one for example, just be aware that it's a multi-stages process, and resulting implementation is far from obvious.
Ok that doesn't fix it. Apparently C90 doesn't let you expand empty macro parameters, which makes it pretty difficult to manage this template. Any ideas? thinking
Also, apt seems to be 404ing.
#define XXH_TARGET_DEFAULT __attribute__(())
How about this?
Ok that doesn't fix it. Apparently C90 doesn't let you expand empty macro parameters, which makes it pretty difficult to manage this template. Any ideas? thinking
Also, apt seems to be 404ing.
#define XXH_TARGET_DEFAULT __attribute__(())How about this?
Unfortunately not everything is GCC, that would be too easy 😅
Ok that doesn't fix it. Apparently C90 doesn't let you expand empty macro parameters, which makes it pretty difficult to manage this template. Any ideas? thinking Also, apt seems to be 404ing.
#define XXH_TARGET_DEFAULT __attribute__(())How about this?
Unfortunately not everything is GCC, that would be too easy sweat_smile
But it really works on my platform. I've tried GCC7, GCC8 & GCC9 on my x86 platform.
export CFLAGS="-std=c90 -pedantic -Wno-long-long -Werror"
export CC=gcc
make clean xxhsum
Unfortunately not everything is GCC, that would be too easy 😅
But it really works on my platform. I've tried GCC7, GCC8 & GCC9 on my x86 platform.

I mean that the __attribute__(()) would error MSVC as well as the other random ANSI C compilers out there that would otherwise be compatible.
Also, apt seems to be 404ing.
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 8eadaec..b171a4b 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -214,6 +214,7 @@ jobs:
- name: apt-get install
run: |
+ sudo apt-get update
sudo apt-get install valgrind cppcheck
- name: Environment info
It could fix the 404 error.
https://github.com/Cyan4973/xxHash/runs/5572167241?check_suite_focus=true#step:10:320
Why did s390x break this time? More strict aliasing memes uncovered by inlining?
https://github.com/Cyan4973/xxHash/runs/5572167241?check_suite_focus=true#step:10:320
Why did s390x break this time? More strict aliasing memes uncovered by inlining?
I think key reason is in XXH3_digest_long().
@@ -5425,10 +5463,9 @@ XXH3_digest_long (XXH64_hash_t* acc,
secret, state->secretLimit,
XXH3_accumulate, XXH3_scrambleAcc);
/* last stripe */
- XXH3_accumulate(acc,
- state->buffer + state->bufferedSize - XXH_STRIPE_LEN,
- secret + state->secretLimit - XXH_SECRET_LASTACC_START,
- 1);
+ XXH3_accumulate_512(acc,
+ state->buffer + state->bufferedSize - XXH_STRIPE_LEN,
+ secret + state->secretLimit - XXH_SECRET_LASTACC_START);
} else { /* bufferedSize < XXH_STRIPE_LEN */
xxh_u8 lastStripe[XXH_STRIPE_LEN];
size_t const catchupSize = XXH_STRIPE_LEN - state->bufferedSize;
https://github.com/hzhuang1/xxHash/tree/new_accum1 In my branch, it could pass the build. It's just for reference.
Ah, good catch, I missed that one.
This still indicates that something is incorrect, though. I question the changes in 15ce80f9f2760609d8cc68cea76d3f3217ab70e1, as I'm not sure GCC likes the cast. (@ellert)
Should VSX just use memcpy everywhere? It seems like GCC is impossibly strict with aliasing rules.
(still doesn't fix the empty macro issue)
(still doesn't fix the empty macro issue)
My solution is to create another macro without empty parameter. https://github.com/hzhuang1/xxHash/commit/826ddf808e94b35fccbdfeb5ac926fe36df73e7f
(still doesn't fix the empty macro issue)
My solution is to create another macro without empty parameter. hzhuang1@826ddf8
@easyaspi314 How about this pull request? Do you like the way to drop empty parameter?
Not sure I would want to copy paste that .
This is in preparation for an SVE implementation. It may also improve performance for the normal x86 dispatcher.
Is this PR still active ?
I believe it's a good topic, though it seems to get into complexities.
Also, if it helps improve performance for x86 dispatcher, it would be great to measure it.
This is in preparation for an SVE implementation. It may also improve performance for the normal x86 dispatcher.
Is this PR still active ?
I believe it's a good topic, though it seems to get into complexities. Also, if it helps improve performance for
x86dispatcher, it would be great to measure it.
I tried to measure the performance on my x86 laptop. The result data is nearly same. Each data is an average value of 10 continuous samples.
| 512B | 1KB | 2KB | 4KB | 8KB | 16KB | 32KB | 64KB | 128KB | 256KB | 512KB | 1MB | 2MB | 4MB | 8MB | 16MB | 32MB | 64MB | 128MB | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| base | 15782.8 | 18045.3 | 18688.8 | 19017.5 | 19098.4 | 19360.5 | 19482.2 | 19568.4 | 19368.6 | 19590.8 | 19449.8 | 19584.3 | 19496.7 | 16841 | 14967.2 | 14904 | 14829 | 14821.2 | 14810.3 |
| patch | 15861.4 | 18074.9 | 18696.7 | 19149.9 | 19192.6 | 19420.6 | 19542.2 | 19596.6 | 19617.1 | 19566.8 | 19636.3 | 19667.8 | 19585.6 | 16922.1 | 14966.1 | 14848.5 | 14807.2 | 14869.5 | 14874.9 |
The patch only changes the interface. And it doesn't touch any key code, such as accessing less memory. So I think we need more patches to improve the performance on x86 like SVE code.
I think the patch is blocked on the empty macro parameters. What's our plan on it?
I think the patch is blocked on the empty macro parameters. What's our plan on it?
@easyaspi314 @Cyan4973 Could you help to share the idea on empty macro parameters? Obviously, it's not supported by C90.
I think the patch is blocked on the empty macro parameters. What's our plan on it?
Time flies fast, sorry.
It seems I initially misinterpreted the issue. There is no variadic macro here, it's an empty argument in a macro with fixed nb of arguments.
Unfortunately, the C90 requirement is pretty strict. An optional argument in a macro is not allowed. That requires a solution.
And btw, the solution already exists, as part of @hzhuang1 's PR #713 ,
where the XXH3_ACCUMULATE_TEMPLATE_2(sse2, XXH_TARGET_SSE2) with optionally empty XXH_TARGET_SSE2 is replaced by a more complex but also more C90 correct :
# ifdef XXH_TARGET_SSE2
XXH3_ACCUMULATE_TEMPLATE_2(sse2, XXH_TARGET_SSE2)
# else
XXH3_ACCUMULATE_DEF_TEMPLATE(sse2)
# define XXH_TARGET_SSE2 XXH_TARGET_DEFAULT
# endif
(with the requirement that XXH_TARGET_SSE2 must not be defined, not even to NULL, before that point).
So that looks like the available solution to me. And it already passes CI tests.
@easyaspi314 Will you refresh this pull request? If you don't have time, I can refresh it instead.
Sorry about the ghosting, I've been really busy with life and haven't been able to focus enough to code... Can you please update it?
Sorry about the ghosting, I've been really busy with life and haven't been able to focus enough to code... Can you please update it?
No problem. I'll handle it.
#756 is a new implementation of this pull request. Since #756 is merged, this one could be close now.