Adam Stylinski comments

Results: 111 comments by Adam Stylinski

Optimize the fletcher4 neon implementation

It's a little bit surprising that the superscalar implementations are winning on that many armv8 implementations. Does the NEON pipeline on these things only have 1 execution port or something?...

Optimize the fletcher4 neon implementation

At least in my experience with the Adler checksum in zlib-ng, native NEON is definitely faster on the M1 and many SIPs based on the Cortex family. A...

Optimize the fletcher4 neon implementation

So it's probably a different story for little halves of arm but in my experience, it's really really hard to get discernible benefits from prefetch hints. Most of the time...

Optimize the fletcher4 neon implementation

> We can test that. I await with bated breath, heh. What would surprise me even more is the fact that you're prefetching with a sequential pattern. No doubt there...
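For readers following the fletcher4 thread, here is a minimal scalar sketch of the four running sums with an optional sequential prefetch hint. It is an illustration only, not the code from the pull request: the type name `fletcher4_sums`, the function name `fletcher4_scalar`, and the lookahead distance `PREFETCH_WORDS` are hypothetical and untuned. `__builtin_prefetch` is the GCC/Clang builtin. Whether the hint helps at all is exactly what the comments above are sceptical about.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: the four running sums of Fletcher-4. */
typedef struct { uint64_t a, b, c, d; } fletcher4_sums;

/* Hypothetical lookahead distance in 32-bit words (assumption, untuned). */
#define PREFETCH_WORDS 64

/* Scalar Fletcher-4 over n 32-bit words with 64-bit accumulators. */
static fletcher4_sums fletcher4_scalar(const uint32_t *words, size_t n)
{
    fletcher4_sums s = {0, 0, 0, 0};

    for (size_t i = 0; i < n; i++) {
#if defined(__GNUC__)
        /* Hint only; issued while the target index is still in bounds. */
        if (n - i > PREFETCH_WORDS)
            __builtin_prefetch(&words[i + PREFETCH_WORDS], 0, 0);
#endif
        s.a += words[i];  /* sum of the data */
        s.b += s.a;       /* sum of the a values */
        s.c += s.b;       /* sum of the b values */
        s.d += s.c;       /* sum of the c values */
    }
    return s;
}
```

Removing the `__builtin_prefetch` line leaves the result unchanged, so timing the loop with and without it is a simple way to check whether the hint pays off on a given core.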

Use GCC's may_alias attribute for unaligned memory access

Ah cool, this might actually improve the purely scalar performance I saw testing this on a Sun Fire V240 (depressingly slower than stock zlib). I'll bet the extra function calls...
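As a rough sketch of the technique named in the title, an unaligned load and store can go through a typedef carrying GCC's `may_alias` attribute. The names `unaligned_u32`, `load_u32`, and `store_u32` are hypothetical, and pairing `may_alias` with `aligned(1)` is an assumption made here so that the compiler does not assume natural alignment. The actual change in the pull request may look different.

```c
#include <stdint.h>

/* Hypothetical typedef: may_alias exempts accesses through this type from
 * strict-aliasing assumptions; aligned(1) (an assumption of this sketch)
 * tells GCC/Clang the object may sit at any byte address. */
typedef uint32_t unaligned_u32 __attribute__((may_alias, aligned(1)));

/* Read a 32-bit value from a possibly unaligned address. */
static inline uint32_t load_u32(const void *p)
{
    return *(const unaligned_u32 *)p;
}

/* Write a 32-bit value to a possibly unaligned address. */
static inline void store_u32(void *p, uint32_t v)
{
    *(unaligned_u32 *)p = v;
}
```

On targets with fast unaligned access this typically compiles to a single load or store. On strict-alignment targets the compiler emits byte-wise accesses instead of a function call.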

2.1.2 seems to fail tests on ppc64le with musl libc

ppc64le is a bit special in that VSX needs to perform a flip on loads because memory loads and stores of certain sizes are natively big endian. Most of...

2.1.2 seems to fail tests on ppc64le with musl libc

Yes, I think the VMX issue might be my fault; I didn't have a little-endian system to test it on. The adler checksum actually doesn't care about the order...

Benchmark: zlib-ng vs isa-l, zlib, libdeflate, brotli

I assume memcpy is just raw memory bandwidth with no compression? Both libdeflate and igzip have the advantage/disadvantage of being ground-up implementations rather than forks of zlib (on the...

2.1.2 seems to fail tests on ppc64le with musl libc

Yeah I'm almost certain this is what we're seeing: https://gcc.gcc.gnu.narkive.com/cJndcMpR/vec-ld-versus-vec-vsx-ld-on-power8 I'm fairly confident we can fix this by conditionally adjusting the mixing matrix based on endianness at compile time. It'd...

2.1.2 seems to fail tests on ppc64le with musl libc

@mtl1979 found a fix for adler_vmx, which is a separate issue from this one. He's investigating the POWER9 issue now. Looks like #1518 fixes both of these issues.