toys
avx512vbmi-remove-spaces
Instead of generating the addmask by repeatedly adding to the existing addmask in a loop, isn't the final mask simply the cumulative horizontal sum (i.e. the prefix sum) of the space mask?
It looks like that prefix sum can be computed the usual serial way, but can it also be parallelized?
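To make the question concrete, here is a minimal scalar sketch in plain C, not the repository's actual code. It assumes the space mask is stored as one byte per lane (1 for a space, 0 otherwise) across 64 lanes, and it only illustrates the two ways of computing a prefix sum; whether the result really equals the addmask is the hypothesis being asked about. The names `prefix_sum_serial`, `prefix_sum_logstep` and `space_mask` are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N 64  /* assumption: one byte per lane, 64 lanes as in a 512-bit register */

/* The "usual way": serial inclusive prefix sum,
   out[i] = mask[0] + mask[1] + ... + mask[i]. */
static void prefix_sum_serial(const uint8_t mask[N], uint8_t out[N]) {
    uint8_t running = 0;
    for (size_t i = 0; i < N; i++) {
        running += mask[i];
        out[i] = running;
    }
}

/* Log-step (Hillis-Steele) inclusive scan: 6 rounds for 64 lanes.
   Each round adds a copy of the vector shifted up by `shift` lanes,
   which in a SIMD version would be a lane shift followed by a vector add. */
static void prefix_sum_logstep(const uint8_t mask[N], uint8_t out[N]) {
    uint8_t cur[N], shifted[N];
    memcpy(cur, mask, N);
    for (size_t shift = 1; shift < N; shift *= 2) {
        memset(shifted, 0, shift);               /* zeros enter from the low end */
        memcpy(shifted + shift, cur, N - shift); /* move lanes up by `shift` */
        for (size_t i = 0; i < N; i++) {
            cur[i] += shifted[i];
        }
    }
    memcpy(out, cur, N);
}

/* Hypothetical helper: builds the 0/1 space mask for up to N input bytes. */
static void space_mask(const char *text, size_t len, uint8_t mask[N]) {
    memset(mask, 0, N);
    for (size_t i = 0; i < len && i < N; i++) {
        mask[i] = (text[i] == ' ');
    }
}
```

For the input `"a b  c"` the mask is `0,1,0,1,1,0` and both functions give `0,1,1,2,3,3` in the first six lanes. The sums never exceed 64, so `uint8_t` lanes do not overflow.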
@smallnamespace thank you very much! It does indeed look similar; I'll check this.