xsimd

Extend the usage of `load_masked` and `store_masked` to `batch_bool`

Open dragon-archer opened this issue 1 month ago • 2 comments

I've noticed that `load_masked` and `store_masked` currently only support `batch_bool_constant`.

I think `load_masked` and `store_masked` are well suited to handling loop tails, but in that case the mask is dynamic. Since most architectures that support masked loads and stores don't require the mask to be a constant, perhaps it would be better to also provide `batch_bool` versions of `load_masked` and `store_masked`.

Thank you very much.

dragon-archer, Nov 28 '25 01:11

Hi @dragon-archer,

We will consider it after this release. We left it out for now because, on architectures that have no native masked intrinsics, it generates a lot of scalar instructions. That can be slower than handling the tail manually outside the for loop, and users who compile the same code for different architectures might get burned.

For handling the tail we might need a new API like `load_tail`, so that we can fall back to a scalar tail loop if masked intrinsics are not available.
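One possible shape for such a tail-aware helper, sketched in plain C++ with no xsimd calls. Everything here is an assumption made for illustration: `process_tail`, the `std::true_type`/`std::false_type` tag standing in for "this target has masked intrinsics", and the lane count of 4. The idea is that the caller passes the number of remaining elements instead of a mask, so the implementation is free to pick the masked path or a plain scalar loop.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

constexpr std::size_t kLanes = 4;  // assumed lane count, for illustration only
using Batch = std::array<float, kLanes>;

// Path for targets with masked intrinsics (emulated here): one partial batch.
template <class Op>
void process_tail(float* data, std::size_t remaining, Op op, std::true_type) {
    Batch b{};
    for (std::size_t i = 0; i < remaining; ++i) b[i] = data[i];  // masked load stand-in
    for (float& x : b) x = op(x);  // op also runs on inactive (zero) lanes
    for (std::size_t i = 0; i < remaining; ++i) data[i] = b[i];  // masked store stand-in
}

// Fallback for targets without masked intrinsics: a plain scalar tail loop.
template <class Op>
void process_tail(float* data, std::size_t remaining, Op op, std::false_type) {
    for (std::size_t i = 0; i < remaining; ++i) data[i] = op(data[i]);
}

// Entry point: HasNativeMask would come from the target architecture in a
// real library; here the caller supplies it. Requires remaining < kLanes.
template <bool HasNativeMask, class Op>
void process_tail(float* data, std::size_t remaining, Op op) {
    process_tail(data, remaining, op,
                 std::integral_constant<bool, HasNativeMask>{});
}
```

Both paths give the same result for the active elements. The difference is only in how the work is issued, which is the choice a `load_tail`-style API could make per architecture.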

TL;DR: we would like to think carefully before opening the floodgates, to avoid hurting users.

Cheers, Marco

DiamonDinoia, Nov 28 '25 18:11

Fine, I think a new API might be a better idea. I hope it will be available soon. 😄

dragon-archer, Nov 29 '25 05:11