[WIP] Fix #3446: implement shuffle, rotate and shift for short vectors
Description
This PR addresses issue #3446 and is based on #3485. It is a work in progress, opened to get some feedback: I do not know whether this is the best approach to implementing these functions. Besides, the performance of some functions may be a concern in some pathological cases (maybe we can optimize them later). I left some NOTE comments in the code about open questions.
Overall, the new functions are fast on "simple-pumping" targets when N < programCount (and often not that bad when N < 2*programCount), especially when the function argument (i.e. perm or offset) is a compile-time constant.
Some thoughts that are not in the code:
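For reviewers who want something to compare results against, here is a rough scalar reference model written in C++. It is only a sketch: it assumes the short-vector versions follow the same element-wise semantics as the existing varying `rotate`, `shift` and `shuffle` (wrap-around for rotate, zero fill for shift, in-range indices for shuffle). The names `rotate_ref`, `shift_ref` and `shuffle_ref` are hypothetical and do not exist in the PR or in stdlib.isph.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar reference model, NOT the stdlib.isph implementation.
// Assumption: result[i] takes the element `offset` positions away, wrapping around.
template <typename T, std::size_t N>
std::array<T, N> rotate_ref(const std::array<T, N> &v, int offset) {
    static_assert(N > 0, "vector length must be positive");
    std::array<T, N> out{};
    const int n = static_cast<int>(N);
    for (int i = 0; i < n; ++i) {
        // Wrap the source index into [0, N), including for negative offsets.
        const int src = (((i + offset) % n) + n) % n;
        out[i] = v[src];
    }
    return out;
}

// Assumption: same as rotate_ref, but elements that would wrap become zero.
template <typename T, std::size_t N>
std::array<T, N> shift_ref(const std::array<T, N> &v, int offset) {
    std::array<T, N> out{}; // value-initialized, so every element starts at zero
    const int n = static_cast<int>(N);
    for (int i = 0; i < n; ++i) {
        const int src = i + offset;
        if (src >= 0 && src < n)
            out[i] = v[src];
    }
    return out;
}

// Assumption: every perm[i] lies in [0, N); other values are treated as a caller error.
template <typename T, std::size_t N>
std::array<T, N> shuffle_ref(const std::array<T, N> &v, const std::array<int, N> &perm) {
    std::array<T, N> out{};
    const int n = static_cast<int>(N);
    for (int i = 0; i < n; ++i) {
        assert(perm[i] >= 0 && perm[i] < n);
        out[i] = v[perm[i]];
    }
    return out;
}
```

A model like this could back the functional tests: compute the expected result on the host and compare it element by element with what the ISPC function returns.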
- Is using conditions like `if(N < programCount)` the best solution? It might increase the size of the generated code before optimizations (and so the compilation time). Using templates for that does not seem possible, though, since partial specialization is AFAIK not yet implemented.
- Should such functions be implemented in stdlib.isph or as builtins? I think they cannot be builtins because of the templates. Maybe some builtins can be written for some frequent cases later, if needed.
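To make the first question more concrete, here is a C++ analogue of the trade-off (not ISPC code). It is a sketch under assumptions: `kProgramCount` is a hypothetical stand-in for `programCount`, and the `Path` enum and `choose_path` only label which implementation would be selected. In C++, `if constexpr` discards the untaken branch at instantiation time. A plain `if` on the same constants compiles both branches first and relies on the optimizer to fold the condition, which is the code-size concern raised above.

```cpp
#include <cstddef>

// Hypothetical stand-in for ISPC's programCount (the gang size of the target).
constexpr std::size_t kProgramCount = 8;

// Illustrative labels only; the PR itself defines no such enum.
enum class Path { Small, Generic };

// Selects a path from compile-time constants. With `if constexpr`,
// only the taken branch is instantiated for a given N.
template <std::size_t N>
constexpr Path choose_path() {
    if constexpr (N < kProgramCount) {
        return Path::Small;   // vector fits within one gang-wide register
    } else {
        return Path::Generic; // fallback for longer vectors
    }
}
```

Whether ISPC templates can express the same selection without partial specialization is exactly the open question here, so this only shows what the alternative would look like in a language that has it.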
I am going to write functional tests later.
By the way, I also added some previously-missing #undef.
Related Issue
- [x] Linked to relevant issue(s)
Checklist
- [ ] Code has been formatted with `clang-format` (e.g., `clang-format -i src/ispc.cpp`)
- [x] Git history has been squashed to meaningful commits (one commit per logical change)
- [ ] Compiler changes are covered by lit tests
- [ ] Language/stdlib changes include new functional tests for runtime behavior
- [ ] Documentation updated if needed
It would be very helpful to split this PR into several PRs (one per function) and to add tests.
The select implementation is trivial and can be merged right away, but the others are trickier and may require more time for testing and review.