stb icon indicating copy to clipboard operation
stb copied to clipboard

Fix AVX and AVX2 corruption due to undefined upper 128 bits after cal…

Open popizdeh opened this issue 2 years ago • 6 comments
trafficstars

When stbir__1_coeff_remnant and stbir__store_output are expanded the following happens:

stbir__simdf8_madd_mem4( tot0, tot0, t, ); will corrupt upper 128 bits of tot0 because of _mm256_castps128_ps256 stbir__simdf8_add4halves( t, t, tot0 ); will use those bits via _mm256_extractf128_ps( tot0, 1 )

I was observing image corruption when scaling by 1/9 factor but it's very difficult to reproduce, I'm guessing this is because the compiler inserts VZEROUPPER at various places so upper 128 bits get set to zero most of the time but in some situations I get random values in the upper half of the register.

I haven't seen the bug when enabling FMA but I've applied the fix as the code suffers from the same issue. This goes for stbir__simdf8_add4 too, no problem was observed but that's the only other place where _mm256_castps128_ps256 is used.

popizdeh avatar Nov 17 '23 01:11 popizdeh

You'll need to give us an actual repro and compiler+OS, that fix as-is is not OK.

rygorous avatar Nov 17 '23 20:11 rygorous

For Jeff: top 128 bits with _mm256_castps128_ps256 are undefined. The VEX-encoded 128-bit loads actually zero-extend so on the instruction level it's fine and I don't how you would hit with this VEX codegen, but I could see this causing an issue with MSVC without /arch:AVX when the computation on the 128-bit values isn't VEX-encoded. (That's why I never do AVX code without /arch:AVX personally.)

The blend sequence in this original commit is bad because it generates poor code for current GCC and several recent versions of MSVC (although it got fixed a while back). https://godbolt.org/z/hcTrjGzKW

Just use _mm256_setr_m128(x, _mm_setzero_ps()) to do the 128-bit -> 256-bit expansion with zero-fill. The compilers I've tested codegen that one decently. Ideally define a macro or similar for just that sequence instead of open-coding it into every use. (For the original committer, please leave the PR open so we have it in lieu of a bug. Next time, please file bug reports instead of unsolicited PRs whenever possible.)

rygorous avatar Nov 17 '23 21:11 rygorous

I'm on MacOS 13.6 with Xcode 14.2, I'm compiling resize-corruption.cpp.zip with clang++ -std=c++20 resize-corruption.cpp -mavx2 -o repro, input image is 960x540, 8 bit, RGBA (I don't think this is relevant at all but I was debugging the issue thinking it was memory corruption initially). The bug only seems to happen in Debug mode, passing -O2 makes it go away, as does removing -mavx2.

As for the unsolicited PR, the comment makes me feel unwelcome. I had to fix this issue on my end as I don't know when you'll get around to it. If the fix is wrong or subpar I'm very interested in learning from it. There's also that saying, one pull request is worth a thousand bug reports :P

popizdeh avatar Nov 17 '23 22:11 popizdeh

I have stated elsewhere that it's not worth posting an Issue and a PR at the same time, so a bare PR is ok IMO; just make sure the PR includes enough information to serve as a bug report, and be willing to leave it open indefinitely until it's fixed.

We do have a general trend with PRs that they often attack problems the wrong way and/or don't account for the full breadth of platforms and compilers we support, and as a result few PRs are accepted as is, so I think it's more about not wasting your time. If you don't consider it a waste of your time, then whatever.

nothings avatar Nov 17 '23 22:11 nothings

That evidently came out harsher than intended!

I've had several issues in the past with PRs:

  1. they got closed by the original author for one reason or another before we got around to looking at them, meaning that it's easy to inadvertently lose bug reports that only exist in the form of a PR with a suggested fix
  2. they got inadvertently revised or otherwise changed which in certain cases auto-closes to PR on Github's side
  3. the PRs were large changes adding new features (or similar) outside the scope of the stb libraries (e.g. adding new image formats to stb_image); we will not merge these and I'd rather people ask first than spend several weekends on a multi-1000-line change that we were never going to take in the first place.
  4. the code change had one or more issues that made it unsuitable for merging in its present form, but it's a 4-line diff so iterating on it in PR form makes no sense, it's easier to just recreate than use a PR-based workflow for it in the first place.
  5. a bug that was thought to be fixed by a PR actually wasn't; it's easy to reopen an issue but PRs again make this harder and tie it to a specific change, which gets confusing fast especially when the PR was just closed and not merged in its suggested form (see 4.)

I'd much rather have bugs reported and tracked in the issue tracker which avoids these numerous workflow issues with PRs.

rygorous avatar Nov 17 '23 22:11 rygorous

Good catch Nik - that is REALLY hard to trigger, but yeah, that's a bug. Let me double check the other si128 casts, and I'll submit to Sean...

jeffatrad avatar Nov 18 '23 00:11 jeffatrad