Sergey "Shnatsel" Davidoff

Results: 942 comments by Sergey "Shnatsel" Davidoff

I've fixed the binary build workflow in https://github.com/rust-secure-code/cargo-auditable/pull/226 so the next release, probably v0.7.1, should have precompiled binaries again. Publishing backdated releases with cargo-dist is not supported and is very...

The 0.7.1 release with prebuilt executables is now up.

I also tried a 1:1 port of the SIMD filtering algorithm, but this is as far as I got before things started falling apart: https://godbolt.org/z/W49qeGv3o I got all the SIMD...

I got it to vectorize the comparisons too, so only the final value selection is still scalar: https://godbolt.org/z/TPdoWPPMd

But the direct translation of Portable SIMD code is still pretty gnarly and doesn't look any faster: https://godbolt.org/z/b7G3xnsj8

I've wired up my most successful attempt; let's see if it beats the autovectorization we already have in place: https://github.com/Shnatsel/image-png/tree/simder-paeth Not sure how to benchmark it though, the filters are...

Well, turns out the solution was right in front of me all along. The filtering code currently in use already vectorizes perfectly, resulting in code identical to the explicit SIMD...
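For readers unfamiliar with the filter being discussed, here is a minimal sketch of a scalar Paeth unfilter for 4 bytes per pixel, written with fixed-size arrays so the compiler has a chance to vectorize the inner loop. The predictor follows the PNG specification; the function names `paeth_predictor` and `unfilter_paeth_4bpp` are made up for this illustration and are not taken from the `png` crate, whose actual code may differ. Whether this exact shape autovectorizes depends on the compiler version and target.

```rust
/// Paeth predictor as defined in the PNG specification.
/// `a` = left, `b` = above, `c` = upper-left.
fn paeth_predictor(a: u8, b: u8, c: u8) -> u8 {
    // Widen to i16 so the intermediate arithmetic cannot overflow.
    let (ia, ib, ic) = (a as i16, b as i16, c as i16);
    let p = ia + ib - ic;
    let pa = (p - ia).abs();
    let pb = (p - ib).abs();
    let pc = (p - ic).abs();
    // Pick the neighbor closest to the initial estimate `p`,
    // breaking ties in the order a, b, c.
    if pa <= pb && pa <= pc {
        a
    } else if pb <= pc {
        b
    } else {
        c
    }
}

/// Hypothetical helper: undo the Paeth filter in place for one row of
/// 4-bytes-per-pixel data. `prev` is the already-unfiltered previous row.
fn unfilter_paeth_4bpp(prev: &[u8], cur: &mut [u8]) {
    assert_eq!(prev.len(), cur.len());
    // Left and upper-left pixels; both are zero before the first pixel.
    let mut a = [0u8; 4];
    let mut c = [0u8; 4];
    for (cur_px, prev_px) in cur.chunks_exact_mut(4).zip(prev.chunks_exact(4)) {
        let b = [prev_px[0], prev_px[1], prev_px[2], prev_px[3]];
        let mut out = [0u8; 4];
        // Fixed trip count of 4: one lane per channel.
        for i in 0..4 {
            out[i] = cur_px[i].wrapping_add(paeth_predictor(a[i], b[i], c[i]));
        }
        cur_px.copy_from_slice(&out);
        a = out;
        c = b;
    }
}
```

Each pixel depends on the unfiltered pixel to its left, so the only parallelism available is across the channels of a single pixel, which is why the per-pixel arrays matter here.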

Okay, I have a [branch](https://github.com/Shnatsel/image-png/tree/autovec-paeth) where I've simply replaced the handwritten SIMD implementations with autovectorized ones, and the results are pretty surprising. At least on my x86_64 machine, with no...

In this prototype, yes. However, I am confident that I will be able to convert the 3 and 6 bpp codepaths to use `[u8; 4]` loads instead of `std::simd` types...
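To illustrate what "`[u8; 4]` loads" for a 3 bpp codepath could look like, here is a rough sketch that widens each 3-byte pixel into a 4-byte array, processes all four lanes, and writes back only three bytes. It uses the simpler Sub filter rather than Paeth to keep the example short, and the helpers `load3`, `store3` and `unfilter_sub_3bpp` are hypothetical names, not the crate's API. It shows only the load/store shape; it makes no claim that the compiler will vectorize it.

```rust
/// Hypothetical helper: widen one 3-byte pixel to 4 bytes, zero-padding
/// the unused last lane.
fn load3(src: &[u8]) -> [u8; 4] {
    [src[0], src[1], src[2], 0]
}

/// Hypothetical helper: write back only the 3 meaningful bytes.
fn store3(dst: &mut [u8], px: [u8; 4]) {
    dst[..3].copy_from_slice(&px[..3]);
}

/// Undo the PNG Sub filter in place for one row of 3-bytes-per-pixel data.
/// (Sub is used here as a stand-in for the more involved Paeth filter.)
fn unfilter_sub_3bpp(cur: &mut [u8]) {
    // Pixel to the left; zero before the first pixel.
    let mut a = [0u8; 4];
    for px in cur.chunks_exact_mut(3) {
        let x = load3(px);
        let mut out = [0u8; 4];
        // Operate on all 4 lanes; the padding lane stays zero.
        for i in 0..4 {
            out[i] = x[i].wrapping_add(a[i]);
        }
        store3(px, out);
        a = out;
    }
}
```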

Well, that confidence was misplaced. Autovectorization fails here, and for a really interesting reason. Here's a godbolt link to illustrate the perfect assembly we're looking for: https://godbolt.org/z/Wqoerq98T Here's the assembly...