Use wrapping shifts in fallback implementations
The current fallback implementations of left/right shifts for integers use <</>>, which panics when the shift amount is greater than the lane width (e.g. shifting a u32x4 by 32).
Using wrapping_shl/wrapping_shr instead would solve this issue and match the behavior of the vectorized versions, which only keep the low bits of the shift amount (see e.g. https://doc.rust-lang.org/core/arch/wasm/fn.u32x4_shr.html).
These are a bit tricky since the SIMD shifts don't behave like wrapping shifts. For example, SIMD shift of a 1u32 << 32 will be zero, but for intel will be 1, since wrapping masks, but the SIMD wraps dont. So maybe panics are the right thing to do, maybe its better toto add a wrapping_shl that masks on SIMD and does not panic.
i think if the "normal" effect people get, by which i mean the sse2 code path, never panics, then we should try to make all code paths never panic.
they don't need to be perfectly cross platform when weird inputs are given, but at least not a panic.
Going to close and reopen because I can't remember if that makes the CI do a new attempt, but I think it does.