Illegal instruction when attempting to use "allow-non-simd" on Athlon II X2 270

Open ssokolow opened this issue 3 years ago • 5 comments

I decided to see whether simd-json with allow-non-simd and known-key could give me any performance boost over serde_json on a machine too old to have AVX2 or SSE4.2. When it tried to parse the large Discord History Tracker chatlog I've been using to tune serde_json performance, my very first test run died with Illegal instruction.

I have only minimal experience with debugging compile-to-native languages, but a simple "Run it under gdb and see where it dies" with the unmodified release build I already had says that the SIGILL is being produced by core::core_arch::x86::ssse3::_mm_shuffle_epi8 and, if the bt full output is meaningful (which it looks like it is), then:

  1. It's dying inside <simd_json::sse42::stage1::SimdInput as simd_json::Stage1Parse<core::core_arch::x86::__m128i>>::find_whitespace_and_structurals
  2. That's inside sse42/stage1.rs, suggesting a misdetection of my CPU.
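One way to cross-check what the CPU actually reports is the standard library's `is_x86_feature_detected!` macro. The following is a minimal sketch, not part of simd-json; the function name `simd_feature_report` is hypothetical. Note that `_mm_shuffle_epi8` is an SSSE3 intrinsic, which is why `ssse3` is included alongside the two feature levels simd-json targets.

```rust
/// Hypothetical helper: lists the SIMD features relevant to this report
/// and whether the running CPU supports each one.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn simd_feature_report() -> Vec<(&'static str, bool)> {
    vec![
        // _mm_shuffle_epi8 (pshufb) requires SSSE3.
        ("ssse3", std::is_x86_feature_detected!("ssse3")),
        ("sse4.2", std::is_x86_feature_detected!("sse4.2")),
        ("avx2", std::is_x86_feature_detected!("avx2")),
    ]
}

/// On non-x86 targets none of these features exist.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn simd_feature_report() -> Vec<(&'static str, bool)> {
    vec![("ssse3", false), ("sse4.2", false), ("avx2", false)]
}
```

Printing the result with `for (name, ok) in simd_feature_report() { println!("{name}: {ok}"); }` gives a quick comparison against the `flags` line from `/proc/cpuinfo`. Keep in mind that the macro also reports a feature as present when it was enabled at compile time (for example through `-C target-cpu=native`).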

Environment

  • simd-json version 0.4.13 (from Cargo.lock)
  • rustc 1.58.0 (02072b482 2022-01-11)
  • RUSTFLAGS="-C target-cpu=native" (Already being used for speed boosts elsewhere)
  • model name : AMD Athlon(tm) II X2 270 Processor
  • flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt hw_pstate vmmcall npt lbrv svm_lock nrip_save
  • Probably not relevant, but jemallocator 0.3.2 is the global allocator and it's being run in a Rayon .par_iter() to deserialize one file per core off an SSD.

ssokolow avatar Feb 24 '22 02:02 ssokolow

simd-json is written to use SSE4.2 or AVX2 intrinsics on x86. The allow-non-simd feature relies on LLVM translating those intrinsics into machine code compatible with the target CPU. Apparently that does not work at all, at least with this target CPU. Even if the LLVM translation worked, allow-non-simd would likely not give you any speedup over stock serde and would probably be much slower.

hkratz avatar Feb 24 '22 11:02 hkratz

Hmm. Might it be a good idea to adjust the phrasing in the README so it communicates that allow-non-simd is gating an LLVM compatibility fallback, rather than code in simd-json itself?

ssokolow avatar Feb 27 '22 18:02 ssokolow

Yes, that was the goal. It seems not to work as hoped when run on a real computer without AVX2/SSE4.2 support, which sadly is something I can't test.

But as @hkratz pointed out, it will very likely be slower than serde. serde is very well optimized and does a great job parsing JSON in environments without SIMD support.

So perhaps the best move is just to remove the flag, given it doesn't work and it's extremely hard to test?

Licenser avatar Feb 27 '22 20:02 Licenser

Honestly, I'm not the best person to ask on that front since, without working fallback support, I wouldn't use simd-json even with the most up-to-date processor. I value portability highly in my creations.

(I don't suppose there'd be a way to redesign allow-non-simd to instead add a dependency on serde_json which will be delegated to if runtime CPU feature detection doesn't find a supported SIMD feature?)
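The delegation idea could look roughly like the sketch below. It is only an illustration under stated assumptions: the `Backend` enum and the `choose_backend` and `detect_backend` functions are hypothetical names, not part of simd-json, and the actual parsing calls are left out. In a real integration the `Fallback` arm would hand the input to serde_json (for example `serde_json::from_slice`) and the other arms to the SIMD parser.

```rust
/// Hypothetical set of parsing backends to dispatch between.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Avx2,
    Sse42,
    /// No supported SIMD level found: delegate to a scalar parser.
    Fallback,
}

/// Pure selection logic, kept separate so it can be tested on any machine.
pub fn choose_backend(has_avx2: bool, has_sse42: bool) -> Backend {
    if has_avx2 {
        Backend::Avx2
    } else if has_sse42 {
        Backend::Sse42
    } else {
        Backend::Fallback
    }
}

/// Runtime detection on x86/x86_64 using the standard library macro.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn detect_backend() -> Backend {
    choose_backend(
        std::is_x86_feature_detected!("avx2"),
        std::is_x86_feature_detected!("sse4.2"),
    )
}

/// Other architectures are treated as fallback-only in this sketch.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn detect_backend() -> Backend {
    Backend::Fallback
}
```

With this shape, a CPU that reports neither AVX2 nor SSE4.2 selects `Backend::Fallback` instead of reaching an SSE4.2 code path. The detection result can be cached once at startup so the check is not repeated per document.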

ssokolow avatar Feb 27 '22 20:02 ssokolow

That's a really good idea, I never thought about that!

Licenser avatar Feb 27 '22 20:02 Licenser

Closing this as #322 would cover this in a more generic way

Licenser avatar Oct 09 '23 12:10 Licenser