Implement lane-wise modulo.
The subtract-one step has no safety check, but that shouldn't be an issue if your values are in range.
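To make the caveat concrete, here is a minimal sketch of what an unchecked subtract-one can look like in a lane-wise modulo. It is not the code in this PR. It assumes eight 8-bit lanes packed in a `std::uint64_t` and divisors that are nonzero powers of two in every lane, and the names `broadcast8` and `laneModuloPow2` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: copy one 8-bit value into all eight lanes of a 64-bit word.
constexpr std::uint64_t broadcast8(std::uint8_t v) {
    return 0x0101010101010101ull * v;
}

// Sketch of lane-wise x % d over eight 8-bit lanes.
// Assumption: every lane of d holds a nonzero power of two.
constexpr std::uint64_t laneModuloPow2(std::uint64_t x, std::uint64_t d) {
    // Unchecked subtract-one: a lane of d equal to zero would borrow from
    // the lane above it and corrupt that lane's mask.
    return x & (d - broadcast8(1));
}

// Each lane of x reduced modulo 4.
static_assert(
    laneModuloPow2(0x0F0E0D0C0B0A0908ull, broadcast8(4)) == 0x0302010003020100ull
);
```

With in-range divisors every lane is at least one, so the subtraction never borrows across a lane boundary, which is why the missing check is harmless under that assumption.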
Depends on the "ergonomics" PR, which should merge first; after that, the base of this PR should be moved to master.
I have repeatedly said that accidental indentation levels go against the principle that indentation in the sources should express more detail. Besides, they have the practical effect of shifting the important part of the source code to the right, which creates an incentive to damage the code with wider lines.
The implementation looks good. I have a problem with the unit tests: not with what is tested, which seems right, but with how it is tested. However semantically correct, the tests misuse the available operations. A user trying to use these operations with elegance and top performance would be more confused with these tests than without them. That users already get confused by the codebase as it is gives a powerful reason not to add to the confusion; it is not a reason to accept more confusing tests because there are precedents for them.