DistIL
Loop auto-vectorization
Auto-vectorization is an interesting transform because it can have a significant impact in the few cases where it applies. In practice, only a few loops are effectively vectorizable, because of the irregular memory accesses and branching patterns common in normal code. The lack of efficient gather instructions makes this worse when the data lives in non-sequential objects/structs rather than flat arrays.
TODOs
- [x] Initial implementation
- Simple for-i loop, memory accesses and basic ops
- [x] Support for reductions (Add/Mul/And/Or/Xor/Min/Max)
- It is also possible to support selects by keeping a vector of iteration indices and applying a horz_max() at the end of the loop (this would be useful for loops that search for and return an index).
- [x] Vector width selection
- The loop must only operate on scalars of the same type. Mixing types and conversions makes this more difficult (the general solution seems to be partial unrolling).
- Legalization
- [ ] Check that stores don't overlap with any other load inside the loop (fall back to runtime guards)
- Operations
- [x] Memory accesses (address must be a `lea invariant + loop_index`)
- Needs #18 for safe array/span accesses
- [x] Basic binops: add, sub, mul, fdiv, and, or, xor
- idiv/frem were intentionally left out because neither x64 nor arm has hw accel for them.
- [x] Other ops: neg, not
- [x] Basic math calls: Min, Max, Abs, Floor, Ceil, Sqrt
- [ ] Other math calls: RSqrt, Rcp, Fmadd
- Some of these don't have xplat intrinsics so we'd need to implement aux functions (worth proposing fma?)
- [x] Comparisons and selects
- Only if operands match the op signedness (e.g. `ult u32, u32` is ok but not `ult i32, i32`)
- [ ] Conversions
- [x] Float <-> Int32
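
The select-style reduction mentioned under reductions above (keep a vector of iteration indices, then take a horizontal max after the loop) can be sketched with a lane-by-lane model. This is only an illustration in plain Python, not DistIL code: the function names, the `WIDTH` constant and the use of lists as stand-ins for vector registers are all assumptions made for the example.

```python
WIDTH = 4  # hypothetical lane count; a real width depends on the element type and target


def last_index_scalar(values, key):
    """Reference loop: remember the index of the last element equal to key."""
    found = -1
    for i, v in enumerate(values):
        if v == key:
            found = i
    return found


def last_index_vectorized(values, key, width=WIDTH):
    """Lane-by-lane model of the same loop after vectorization."""
    n = len(values)
    main_end = n - n % width          # part of the range covered by full vectors
    index_vec = [-1] * width          # per-lane accumulator, one slot per lane

    for base in range(0, main_end, width):
        lane_idx = [base + lane for lane in range(width)]   # iteration index vector
        mask = [values[i] == key for i in lane_idx]         # vector compare
        # Vector select: take the new index where the mask is set, else keep the old one.
        index_vec = [i if m else old
                     for i, m, old in zip(lane_idx, mask, index_vec)]

    found = max(index_vec)            # horizontal max across lanes

    # Scalar tail for the leftover elements; their indices are larger than
    # anything seen in the main loop, so a match simply overrides the result.
    for i in range(main_end, n):
        if values[i] == key:
            found = i
    return found
```

The horizontal max works here because each lane only ever stores increasing indices, so the largest lane value is the last match overall. A loop that returns the first match would need a min reduction with a suitable sentinel instead.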
Extras:
- [x] Basic if-conversion
- [x] Code gen support for SelectInst
- [ ] Handle non-diamond graphs through path duplication (for empty blocks only)
- [x] Consider introducing a GetElementPtr/LEA instruction, because recognizing indexing expressions is tricky. Having this could also help consolidation of load/store instructions for arrays and fields.
- Consider first-class support for vector types in the IR. This may not be that valuable outside of bringing the ability to perform basic peepholes.
- Consider supporting basic transcendental math functions: Sin, Cos, Log, Exp (port from DirectXMath lib?)
- Consider supporting a basic cost model
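
As a rough illustration of the basic if-conversion item above, the sketch below flattens a diamond-shaped branch into a compare plus a select, so that the loop body becomes straight-line code. It is a plain Python model, not DistIL's implementation: the `select` helper, the function names and the example computation are hypothetical. It also assumes both arms are free of side effects, since if-conversion evaluates both of them for every element.

```python
def scale_or_clamp_branchy(values, limit):
    """Original loop: a diamond (if/else that joins again) in the body."""
    out = []
    for v in values:
        if v > limit:
            r = limit
        else:
            r = v * 2
        out.append(r)
    return out


def select(mask, if_true, if_false):
    """Lane-wise select: picks if_true where the mask is set, else if_false."""
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]


def scale_or_clamp_if_converted(values, limit, width=4):
    """Same loop after if-conversion, processed one vector-sized chunk at a time."""
    out = []
    for base in range(0, len(values), width):
        chunk = values[base:base + width]       # a short last chunk stands in for a tail loop
        mask = [v > limit for v in chunk]       # branch condition becomes a vector compare
        then_arm = [limit] * len(chunk)         # both arms are computed unconditionally
        else_arm = [v * 2 for v in chunk]
        out.extend(select(mask, then_arm, else_arm))
    return out
```

Both functions should produce the same output for any input; the converted form trades the branch for extra arithmetic, which is one reason a cost model becomes relevant.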