Combination of `i8x16::move_mask()` and `.trailing_zeros()` is a footgun

Open whitequark opened this issue 7 months ago • 3 comments

For example, consider this code, extracted from one of my libraries:

for &group in samples.array_chunks::<16>() {
    let mask = predicate(i8x16::new(group));
    offset += mask.move_mask().trailing_zeros() as usize;
    if mask.any() { break }
}
offset

The intent here is to advance `offset` by the number of lanes in the group that precede the first match. Unfortunately, if the predicate didn't match at all, the moved mask is a 32-bit zero, so `.trailing_zeros()` returns 32 instead of 16, throwing off the offset.

(I know about iter.remainder()--in this particular application it can't be used because by the time the remainder is examined, it is already advanced past the group being matched by the predicate.)

Making `i8x16::move_mask()` return a `u16` would solve this problem, but won't work for `i32x4`. Adding `leading_zeros()` and `trailing_zeros()` to the SIMD types, or perhaps to the boolean mask types from #43, would also solve it.
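In the meantime, one possible workaround is to set a sentinel bit just above the highest lane before counting. The following is only a minimal sketch in plain Rust, assuming the 16-lane mask arrives as an `i32` with only its low 16 bits used; `lanes_before_match` and `first_match_offset` are hypothetical helper names, not part of `wide`.

```rust
/// Hypothetical helper: number of lanes before the first match in a
/// 16-lane move mask. Assumes only bits 0..=15 of `move_mask` are used.
fn lanes_before_match(move_mask: i32) -> usize {
    // A bare `move_mask.trailing_zeros()` returns 32 when the mask is 0.
    // Setting bit 16 as a sentinel caps the result at 16, the lane count.
    (move_mask | (1 << 16)).trailing_zeros() as usize
}

/// Hypothetical stand-in for the loop above, operating on precomputed
/// masks (one per 16-lane group) instead of SIMD values.
fn first_match_offset(masks: &[i32]) -> usize {
    let mut offset = 0;
    for &mask in masks {
        offset += lanes_before_match(mask);
        // A nonzero mask means at least one lane matched in this group.
        if mask != 0 {
            break;
        }
    }
    offset
}
```

With this, a group with no matches contributes exactly 16 to the offset, so two empty groups followed by a match in lane 2 give 34. Clamping with `.min(16)` should behave the same way for a 16-lane mask; the sentinel bit simply avoids the extra comparison.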

whitequark avatar Jul 06 '24 04:07 whitequark