croaring-rs
croaring-rs copied to clipboard
wip: trying to debug windows CI failures
@lemire, It does appear to reproduce with c only, building this file with msvc cl: see the "msvc c roaring" step of this job (which is set to continue on error for that step).
It's built with https://github.com/Dr-Emann/croaring-rs/blob/debug_win_ci/.github/workflows/rust.yml#L53-L60, based on the flags/environment rust is compiling the c code with
running Hardware support: 0x1 Going to or many -1073741795
(where -1073741795 is 0xC000001D (STATUS_ILLEGAL_INSTRUCTION)).
croaring_hardware_support() = 0x1 doesn't appear to indicate AVX512 is the problem.
The issues began (with I believe no meaningful changes in CRoaring/croaring-rs, this PR is right off master, with no changes to CRoaring/croaring-rs) sometime between 3 months ago and 2 weeks ago
I have no idea why we don't see any issues in CRoaring itself
@lemire I was able to save the built executable from github actions (download link), and running it on my windows machine does error, and I'm able to open it in windbg.
Here's the assembly of (the first part of) the function crashing
roaring_fast_or!avx2_harley_seal_popcount256:
00007ff7`fc9595e0 488bc4 mov rax, rsp
00007ff7`fc9595e3 55 push rbp
00007ff7`fc9595e4 4881ec60010000 sub rsp, 160h
00007ff7`fc9595eb 0f2970e8 movaps xmmword ptr [rax-18h], xmm6
00007ff7`fc9595ef 0f2978d8 movaps xmmword ptr [rax-28h], xmm7
00007ff7`fc9595f3 440f2940c8 movaps xmmword ptr [rax-38h], xmm8
00007ff7`fc9595f8 440f2948b8 movaps xmmword ptr [rax-48h], xmm9
00007ff7`fc9595fd 440f2950a8 movaps xmmword ptr [rax-58h], xmm10
00007ff7`fc959602 440f295898 movaps xmmword ptr [rax-68h], xmm11
00007ff7`fc959607 440f296088 movaps xmmword ptr [rax-78h], xmm12
00007ff7`fc95960c 440f29a878ffffff movaps xmmword ptr [rax-88h], xmm13
00007ff7`fc959614 440f29b068ffffff movaps xmmword ptr [rax-98h], xmm14
00007ff7`fc95961c 440f29b858ffffff movaps xmmword ptr [rax-0A8h], xmm15
00007ff7`fc959624 488d6c2420 lea rbp, [rsp+20h]
00007ff7`fc959629 4883e5e0 and rbp, 0FFFFFFFFFFFFFFE0h
00007ff7`fc95962d 488bc2 mov rax, size (rdx)
00007ff7`fc959630 4c8bc9 mov r9, data (rcx)
00007ff7`fc959633 41b800000000 mov r8d, 0
00007ff7`fc959639 c5c1efff vpxor xmm7, xmm7, xmm7
00007ff7`fc95963d c44139efc0 vpxor xmm8, xmm8, xmm8
00007ff7`fc959642 62a15500efed vpxord xmm21, xmm21, xmm21 <- This instruction crashes
00007ff7`fc959648 c44131efc9 vpxor xmm9, xmm9, xmm9
00007ff7`fc95964d c5fe7f7d00 vmovdqu ymmword ptr [rbp], ymm7
00007ff7`fc959652 c44109eff6 vpxor xmm14, xmm14, xmm14
00007ff7`fc959657 c57e7f4520 vmovdqu ymmword ptr [rbp+20h], ymm8
Call stack:
[0x0] roaring_fast_or!avx2_harley_seal_popcount256+0x62 0xc576affba0 0x7ff7fc980e0c
[0x1] roaring_fast_or!container_repair_after_lazy+0x78 0xc576affd10 0x7ff7fc974b69
[0x2] roaring_fast_or!roaring_bitmap_repair_after_lazy+0xbc 0xc576affd10 0x7ff7fc974b69
[0x3] roaring_fast_or!roaring_bitmap_or_many+0x19 0xc576affd60 0x7ff7fc98b3b8
[0x4] roaring_fast_or!main+0xe9 0xc576affd60 0x7ff7fc98b3b8
Source code line according to debug info (may be wrong?):
inline static uint64_t avx2_harley_seal_popcount256(const __m256i *data,
const uint64_t size) {
__m256i total = _mm256_setzero_si256();
__m256i ones = _mm256_setzero_si256();
__m256i twos = _mm256_setzero_si256();
__m256i fours = _mm256_setzero_si256();
__m256i eights = _mm256_setzero_si256();
__m256i sixteens = _mm256_setzero_si256();
__m256i twosA, twosB, foursA, foursB, eightsA, eightsB;
const uint64_t limit = size - size % 16;
uint64_t i = 0; // <- This line is shown as where the crash occurs
for (; i < limit; i += 16) {
CSA(&twosA, &ones, ones, _mm256_lddqu_si256(data + i),
_mm256_lddqu_si256(data + i + 1));
CSA(&twosB, &ones, ones, _mm256_lddqu_si256(data + i + 2),
_mm256_lddqu_si256(data + i + 3));
Given no code changes between croaring/croaring-rs, I'm thinking it's the MSVC version bump from 17.7 to 17.8 which occurred between the last working test vs the first failing one.
This feels awfully relevant: https://developercommunity.visualstudio.com/t/Invalid-AVX512-instructions-generated-wh/10521872
After upgrading to MSVC 17.8, one some machines the AVX2 SIMD would start failing with illegal instruction faults. After some debugging, it seems MSVC embeds AVX512 instructions in the AVX2 compiled code.
Also relevant, looks like if MSVC figures out we're going to avx512 in the function, it can use it anywhere. I don't see anywhere we could use any avx512 intrinsics in the avx2 code path though, from a somewhat close look: https://developercommunity.visualstudio.com/t/Invalid-code-gen-when-using-AVX2-and-SSE/10527298
We recently made a compiler change that leverages AVX512 intrinsics that are guaranteed to run on a code path and use this information to leverage AVX512 on that code path.
We have reviewed the provided repro and it uses AVX512 intrinsics (hidden behind the macros) and these intrinsics run on a code path that always executes which triggers the optimization above. The only reason that the AVX512 intrinsics worked previously was that we were downcasting them to AVX instructions to save binary space.
EDIT:
I'm almost certain we aren't, I manually removed the AVX512 sections of the file, and ran grep -o '\b_m\w*' < include/roaring/bitset_util.h | sort -u
, and checked every found intrinsic against The intel intrinsics guide, and verified that all were NOT avx512.
looks like if MSVC figures out we're going to avx512 in the function, it can use it anywhere.
I think we can disable AVX-512 code paths entirely (under Windows).
Looks like disabling AVX512 isn't enough, we have to disable AVX entirely 🫤
Looks like disabling AVX512 isn't enough, we have to disable AVX entirely
It is certainly possible to have hardware lacking AVX instruction support. But we should handle this scenario.
You may have seen that CRoaring now effectively runs your small tests from the amalgamated files in CI, and it appears to work.
@Dr-Emann Can you review CI tests https://github.com/RoaringBitmap/CRoaring/actions/runs/7670008347/job/20905194245
You can see that it is running the amalgamation tests which involve precisely the function you suggested. It is building it using the default Visual Studio 2022 provided by GitHub.
What are we missing?
@lemire: The difference appears to be in MSVC versions (or the cl versions? Don't really understand the VS/MSVC/cl.exe/etc version differences)
cmake seems to pick MSVC\14.37.32822\bin\HostX64\x64\CL.exe
, and rust seems to pick MSVC\14.38.33130\bin\HostX64\x64\cl.exe
. If I use the 14.37
cl, the compiled code works fine, if I use the 14.38
cl instead, I get the Illegal Instruction
@Dr-Emann I am updating to the very latest Visual Studio on my laptop.
But it is not going to be super useful because my laptop does support AVX instructions. :-/
@Dr-Emann
vpxord is AVX512F + AVX512VL.
My best guess is that this code confuses Visual Studio into using AVX-512 throughout:
int bitset_container_compute_cardinality(const bitset_container_t *bitset) {
int support = croaring_hardware_support();
#if CROARING_COMPILER_SUPPORTS_AVX512
if( support & ROARING_SUPPORTS_AVX512 ) {
return (int) avx512_vpopcount(
(const __m512i *)bitset->words,
BITSET_CONTAINER_SIZE_IN_WORDS / (WORDS_IN_AVX512_REG));
} else
#endif // CROARING_COMPILER_SUPPORTS_AVX512
if( support & ROARING_SUPPORTS_AVX2 ) {
return (int) avx2_harley_seal_popcount256(
(const __m256i *)bitset->words,
BITSET_CONTAINER_SIZE_IN_WORDS / (WORDS_IN_AVX2_REG));
} else {
return _scalar_bitset_container_compute_cardinality(bitset);
}
}
What confuses me is that you are say that disabling AVX-512 did not help.
@Dr-Emann Please review... https://github.com/RoaringBitmap/CRoaring/pull/579
If it makes sense to you, we shall try this approach. If it solves the issue, then we are lucky.
I'm fairly sure that when I tried with CROARING_COMPILER_SUPPORTS_AVX512=0
, it didn't work, no.
I think there were two "bugs" I posted.
The first was acknowledged as a bug by microsoft, and it seemed to be that the compiler would insert avx512 instructions when only AVX2 was enabled (or maybe that the AVX2 intrinsics were producing avx512 instructions?), and was marked as Fixed - Pending Release
almost a month ago.
The second was closed as not a bug, where the compiler figured out that the codepath was going to use avx512 intrinsics in the same codepath, so optimized assuming it could use avx512.
I think we're hitting the first issue, not the second.
@Dr-Emann If you take my PR, the one where I move the AVX-512, and you use that in your rust binding, does it help?
Nope, it still ends up with avx512 instructions with that branch (even with -DCROARING_COMPILER_SUPPORTS_AVX512=0
) when compiling with MSVC 14.38
(and I've taken rust totally out of the picture, it's just running c code on this PR branch.
@Dr-Emann Hmmmm.... I don't know what else to do.
I really think it's just an MSVC bug that'll hopefully be fixed in 14.39. I think our only real options are:
- ignore it for now, hopefully it'll be fixed soon
- disable the avx2 path entirely if
defined( _MSC_VER ) && _MSC_VER >= 1938
Disabling AVX seems extreme. I would happily disable AVX-512 under Windows if it were needed, but I feel uneasy about disabling AVX since almost all Windows machines that we care about have AVX2.
I really don't see any way around it for the versions of MSVC that has this bug, the alternative is miscompilation which will cause a crash.
Here's a godbolt link including only CSA
, popcount256
, and avx2_harley_seal_popcount256
, and you can see the output includes a vpxord
instruction, even though there's no mention of avx512 ANYWHERE, never mind in the same function: https://c.godbolt.org/z/d764KGWK1 (line 77 of the assembly output)
And if you change back to MSVC 19.37 (still don't understand the versioning, why is it 14 some places and 19 elsewhere? EDIT: see https://devblogs.microsoft.com/oldnewthing/20221219-00/?p=107601, 14.38 is the toolchain version 19.38 is the compiler version (and is related to the _MSC_VER)), there's no vpxord instruction.
Seeing same issue, glad to see there's a thread here about this because I've been tearing my hair out over this for a couple of weeks now:
https://github.com/mimblewimble/grin-gui/issues/73#issuecomment-1905982066
@yeastplume
I think that @Dr-Emann has demonstrated that it is a compiler bug.
@yeastplume For now, I recommend you just avoid the offending compiler version. If I were Microsoft, I would just patch it and make sure that the bug goes away.
I don't think we will do anything.
@yeastplume For now, I recommend you just avoid the offending compiler version. If I were Microsoft, I would just patch it and make sure that the bug goes away.
I don't think we will do anything.
Yes, makes sense and thanks for your work tracking this down.
Slightly tangential, I guess, but does anyone have a quick pointer as to how to force github actions to use a particular compiler version?
Slightly tangential, I guess, but does anyone have a quick pointer as to how to force github actions to use a particular compiler version?
The current windows-latest runner should only have two versions installed, a very old one, and the very latest one, see the specs...
https://github.com/actions/runner-images/blob/main/images/windows/Windows2022-Readme.md
I don't think it is possible to install another version of Visual Studio on the GitHub hosted runners.
So we have to wait for Microsoft to fix this.
See https://developercommunity.visualstudio.com/t/Code-gen-bug-uses-ymm16-register-for-AVX/10564317
@Dr-Emann Waiting for a fix from Microsoft did not help. They released 17.9 and it is the version we have in the GitHub runners, but the issue remains for us.
We need to change our code to try and fix this issue.