Adam Stylinski

111 comments by Adam Stylinski

It's a little surprising that the superscalar implementations are winning on so many ARMv8 implementations. Does the NEON pipeline on these things only have 1 execution port or something?...

At least in my experience with the Adler checksum in zlib-ng, native NEON is for sure faster on the M1 and on many SIPs based on the Cortex family. A...

So it's probably a different story for the little halves of Arm, but in my experience it's really, really hard to get discernible benefits from prefetch hints. Most of the time...

> We can test that. I await with bated breath, heh. What would surprise me even more is the fact that you're prefetching with a sequential pattern. No doubt there...

Ah cool, this might actually improve the purely scalar performance I saw testing this on a Sun Fire V240 (depressingly slower than stock zlib). I'll bet the extra function calls...

ppc64le is a bit special in that VSX needs to perform a flip on loads, because memory loads / stores of certain sizes are natively big endian. Most of...

Yes, I think the VMX issue might be my fault; I didn't have a little-endian system to test it with. The Adler checksum actually doesn't care about the order...

I assume memcpy is just raw memory bandwidth with no compression? Both libdeflate and igzip have the advantage/disadvantage of not being forks of zlib but ground-up implementations (on the...

Yeah, I'm almost certain this is what we're seeing: https://gcc.gcc.gnu.narkive.com/cJndcMpR/vec-ld-versus-vec-vsx-ld-on-power8. I'm fairly confident we can fix this by conditionally adjusting the mixing matrix based on endianness at compile time. It'd...
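A rough scalar sketch of the "pick the weights by endianness at compile time" idea, not the actual zlib-ng code. The names `lane_weights`, `weighted_sum_word`, and `weighted_sum_ref` are made up for illustration, the "mixing matrix" is assumed here to mean a per-lane weight table, and the block is shrunk to 4 bytes. It relies on the GCC/Clang predefined `__BYTE_ORDER__` macro and falls back to little endian when that macro is missing.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical weight table for a 4-byte block: the byte at stream
   position i contributes (4 - i) * byte to the position-weighted sum.
   The table is indexed by lane (lane 0 = lowest 8 bits of the word),
   so its order has to follow the host byte order. */
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
/* Big endian: lane 0 holds the LAST byte of the stream. */
static const uint32_t lane_weights[4] = {1, 2, 3, 4};
#else
/* Little endian (also the assumed default when the macro is absent):
   lane 0 holds the FIRST byte of the stream. */
static const uint32_t lane_weights[4] = {4, 3, 2, 1};
#endif

/* Weighted sum of one 4-byte block, read as a single native word. */
static uint32_t weighted_sum_word(const unsigned char *p) {
    uint32_t w;
    uint32_t sum = 0;
    int lane;
    memcpy(&w, p, sizeof w);  /* native-endian load, no aliasing issues */
    for (lane = 0; lane < 4; lane++) {
        uint32_t byte = (w >> (8 * lane)) & 0xFFu;
        sum += lane_weights[lane] * byte;
    }
    return sum;
}

/* Scalar reference that walks the bytes in stream order. */
static uint32_t weighted_sum_ref(const unsigned char *p) {
    uint32_t sum = 0;
    int i;
    for (i = 0; i < 4; i++)
        sum += (uint32_t)(4 - i) * p[i];
    return sum;
}
```

If the table matches the host byte order, `weighted_sum_word` and `weighted_sum_ref` agree on any 4-byte input; a vector version would presumably apply the same trick to its weight vector.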

@mtl1979 found a fix for adler_vmx; that is a separate issue from this one. He's investigating the POWER9 issue now. Looks like #1518 fixes both of these issues.