simd
Add other shuffles back?
From #30, there may be use in shuffle instructions other than v8x16. The pros are simpler runtime code generation (there are hardware instructions that are a direct match) and space savings (fewer indices to store). The space savings can be increased further by packing the indices. The cost is more complexity in the spec (multiple instructions instead of one).
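To make the space argument concrete, here is a minimal Python sketch of how a wider-lane shuffle relates to the byte-level one. It assumes a two-input shuffle whose lane indices range over both inputs (so a v64x2 index needs 2 bits and a v8x16 index needs 5 bits). The function names `expand_lane_shuffle` and `pack_indices` are hypothetical and do not come from the proposal.

```python
def expand_lane_shuffle(lane_indices, lane_bytes):
    """Expand per-lane indices into the equivalent 16 per-byte indices."""
    assert len(lane_indices) * lane_bytes == 16
    byte_indices = []
    for lane in lane_indices:
        base = lane * lane_bytes
        # Each selected lane contributes lane_bytes consecutive bytes.
        byte_indices.extend(range(base, base + lane_bytes))
    return byte_indices


def pack_indices(indices, bits):
    """Pack indices into as few bytes as possible, `bits` bits per index."""
    packed = 0
    for position, index in enumerate(indices):
        assert 0 <= index < (1 << bits)
        packed |= index << (position * bits)
    byte_count = (len(indices) * bits + 7) // 8
    return packed.to_bytes(byte_count, "little")


# Take lane 1 of the first input and lane 0 of the second input.
v64x2_lanes = [1, 2]
as_bytes = expand_lane_shuffle(v64x2_lanes, lane_bytes=8)

unpacked_v8x16 = len(as_bytes)                       # one byte per index
packed_v8x16 = len(pack_indices(as_bytes, bits=5))   # 5 bits per index
packed_v64x2 = len(pack_indices(v64x2_lanes, bits=2))
```

Under these assumptions the same shuffle needs 16 immediate bytes as an unpacked v8x16 shuffle, 10 bytes as a packed v8x16 shuffle, and a single byte as a packed v64x2 shuffle.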
FYI, the other shuffles were removed in 8a1f98c
I feel that this isn't really necessary. Yes, it could have some space savings, but it does not add any new functionality or optimization opportunity. I could be convinced that adding these extra shuffles would be worth the effort by real-world data showing a non-negligible code size win.
I want to express my position on this as I'm the one who proposed to put them back.
First, I don't think adding new instructions increases the complexity of the spec (it will definitely be longer, but not harder). So the question is not about spec complexity, but about the number of opcodes.
Is it worth "wasting" 3 opcodes for that?
Pros:
- It definitely saves binary space.
The maximum expected gain would be 15 bytes per shuffle if you need v64x2.shuffle (this would be "only" 9 bytes if immediates are packed, see #69). For a horizontal add of floats (for example), you could then save 15 + 5 bytes in total (for a single reduction).
- It actually reduces the complexity of the virtual machine by avoiding the need for complex pattern matching of the shuffle rule. Indeed, it conveys more semantics from the compiler: more semantics => simpler efficient translation.
Cons:
- 3 more opcodes consumed (might be an issue in the future, but definitely not now)
- Larger spec (do we really care about this one?)
In my opinion, it is worth specifying those 3 extra instructions. Also, please remember that shuffling is an important part of most non-regular SIMD code (i.e., basically everything that is not pure linear algebra).
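As an illustration of the pattern matching mentioned in the pros, here is a minimal Python sketch of what an engine would have to do to recover a wider-lane shuffle from a v8x16 byte mask. It is not taken from any real engine; `match_lane_shuffle` and `widest_lane_shuffle` are hypothetical names, and the mask is assumed to be a list of 16 byte indices.

```python
def match_lane_shuffle(byte_indices, lane_bytes):
    """Return lane indices if the mask moves whole aligned lanes, else None."""
    assert len(byte_indices) == 16
    lanes = []
    for start in range(0, 16, lane_bytes):
        group = list(byte_indices[start:start + lane_bytes])
        first = group[0]
        # A whole-lane move starts on a lane boundary...
        if first % lane_bytes != 0:
            return None
        # ...and copies lane_bytes consecutive bytes in order.
        if group != list(range(first, first + lane_bytes)):
            return None
        lanes.append(first // lane_bytes)
    return lanes


def widest_lane_shuffle(byte_indices):
    """Find the widest lane size (8, 4, 2 or 1 bytes) that fits the mask."""
    for lane_bytes in (8, 4, 2):
        lanes = match_lane_shuffle(byte_indices, lane_bytes)
        if lanes is not None:
            return lane_bytes, lanes
    return 1, list(byte_indices)


# A mask that reverses the four 32-bit lanes of the first input.
reverse_words = [12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3]
lane_size, lanes = widest_lane_shuffle(reverse_words)
```

With dedicated v16x8, v32x4 and v64x2 shuffle opcodes, the compiler would state the lane width directly and this detection step would not be needed for those cases.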
See @binji's comment in #69: the WASM spec hasn't tried to do this type of optimization yet.
In general we haven't tried too hard to minimize the size of uncompressed wasm. For example, every load/store has an extra two bytes at least for the alignment and offset.
It would make sense to do space optimizations across all WASM, not just SIMD.