easyaspi314 comments

Results: 132 comments of easyaspi314

Support SVE with assembly implementation

By the way, I did some research on the A64FX, and from what it appears, NEON has been *severely* performance-deprecated (as in everything but the trivial instructions having 6-12 cycles...

Fix a typo that causes inconsistent hash between streaming and stateless way for XXH3 128-bit variant with custom secret and seed 0

- Just fix it.
- Fix it. But also change major version to indicate breaking change / incompatibility in semver way.
- We're still in 0.x though.
- Don't fix...

Support SVE with assembly implementation

I think that for now we should only do SVE-512. Looking at the optimization guide, c7g is a tradeoff because while SVE can process 2x the data, NEON always has...

Support SVE with assembly implementation

> Yes, I agree on it. SVE don't improve a lot performance on SVE-128 & SVE-256.
>
> On SVE-256 (V1 core), I tried to tune assembly code. The latest...

Support SVE with assembly implementation

Yes, and the reason it is favorable is that instead of requiring the `uzp1/uzp2` setup, it can be done with `rev64`. The complicated shuffle is what makes NEON less efficient...

Support SVE with assembly implementation

Ah, you are confused because the uzp trick is for two vectors at once. This is for only one. Come to think of it this would actually have literally zero...
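For readers unfamiliar with the shuffles named in these comments, here is a minimal scalar sketch in C of what `uzp1`, `uzp2`, and `rev64` do on 32-bit lanes of a 128-bit vector. It only models the lane semantics of the AArch64 instructions. The type `u32x4` and the `model_*` helpers are hypothetical names for illustration and are not part of xxHash or any intrinsics header, and the snippets above are truncated, so how the thread combines these shuffles with the multiply is not shown here.

```c
#include <stdint.h>

/* Hypothetical scalar stand-in for a 128-bit vector of four 32-bit lanes. */
typedef struct { uint32_t lane[4]; } u32x4;

/* Models UZP1 (32-bit lanes): even-indexed lanes of a, then those of b.
 * Takes TWO source vectors. With little-endian 64-bit elements this
 * gathers the low halves of all four elements. */
static u32x4 model_uzp1(u32x4 a, u32x4 b)
{
    u32x4 r = {{ a.lane[0], a.lane[2], b.lane[0], b.lane[2] }};
    return r;
}

/* Models UZP2 (32-bit lanes): odd-indexed lanes of a, then those of b.
 * This gathers the high halves of the same four 64-bit elements. */
static u32x4 model_uzp2(u32x4 a, u32x4 b)
{
    u32x4 r = {{ a.lane[1], a.lane[3], b.lane[1], b.lane[3] }};
    return r;
}

/* Models REV64 (32-bit lanes): swaps the two 32-bit halves inside each
 * 64-bit element. Takes ONE source vector. */
static u32x4 model_rev64(u32x4 a)
{
    u32x4 r = {{ a.lane[1], a.lane[0], a.lane[3], a.lane[2] }};
    return r;
}
```

In this model the `uzp1`/`uzp2` pair consumes two input vectors to separate low and high halves, while `rev64` rearranges a single vector in place, which matches the "two vectors at once" versus "only one" distinction made above.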

Support SVE with assembly implementation

That difference might solely be from it being handwritten assembly. However, even if it wasn't, I'd say that even if it is interleaved with scalar it clearly isn't going to...

Support SVE with assembly implementation

I'd say yes, although I would recommend the following priority:

1. C intrinsics if possible: the limitation to SVE512 or larger can probably improve performance due to fewer checks...

i686 gcc 12: regression at -O1 or -O2

I'll investigate. It is very much possible that this is due to MMX.

i686 gcc 12: regression at -O1 or -O2

Ok, this is not related to MMX. Doing some tests, it seems that this is a GCC bug specific to GCC 12 that has been fixed in GCC 12.2.1. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322...