accelerate-llvm
Align shared memory in fold & scan (only shuffle)
Description
This PR ensures that allocations of shared memory are properly aligned.
Motivation and context
Previously, shared memory was allocated without any padding. As a result, reads and stores could be misaligned, for instance when scanning an array containing (Bool, Int).
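To illustrate the failure mode, here is a minimal Python sketch (not the accelerate-llvm code) that computes byte offsets for the component arrays of a tuple stored one after another in a single shared-memory buffer. The struct-of-arrays layout, the field order, and the helper name `layout` are assumptions made for illustration. It compares unpadded offsets with offsets rounded up to each field's alignment, using (Bool, Int) with a 1-byte Bool and an 8-byte Int.

```python
# Hypothetical sketch: assumes the component arrays of a tuple type are laid
# out consecutively in one buffer, in the order given, each holding n elements.

def layout(fields, n, pad=True):
    """Return the byte offset at which each component array starts.

    fields -- list of (element_size, alignment) pairs, in layout order
    n      -- number of elements in each array
    pad    -- if True, round each offset up to that field's alignment
    """
    offsets = []
    offset = 0
    for size, align in fields:
        if pad:
            # Round up to the next multiple of `align`.
            offset = (offset + align - 1) // align * align
        offsets.append(offset)
        offset += size * n
    return offsets


BOOL = (1, 1)  # 1-byte flag, 1-byte alignment
INT = (8, 8)   # 64-bit integer, 8-byte alignment

n = 100
unpadded = layout([BOOL, INT], n, pad=False)
padded = layout([BOOL, INT], n, pad=True)
print(unpadded, [o % 8 for o in unpadded])
print(padded, [o % 8 for o in padded])
```

With n = 100, the unpadded Int array starts at byte 100, which is 4 modulo 8 and therefore misaligned; padding moves it to byte 104. When n is a multiple of 8 both layouts coincide, which is why the bug only shows up for some array sizes.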
In particular, this may occur in the implementation of segmented scans. Segmented scans are typically implemented by pairing each value with a flag, as (Bool, a). However, if one implements it as (a, Bool) instead, and the size of the allocated array is not a multiple of the alignment of a, this bug is triggered: reads from the array of a values will be misaligned.
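For readers unfamiliar with the flag-pairing technique, the following Python sketch lifts an arbitrary binary operator to (flag, value) pairs, where a True flag marks the start of a segment. It is a generic illustration of segmented scans, not the operator used by Accelerate, and the names `segmented_op` and `segmented_inclusive_scan` are hypothetical. Swapping the pair order to (value, flag) computes the same results; in Accelerate the order only affects how the component arrays are laid out in shared memory, which is what exposes the alignment issue described above.

```python
import operator
from itertools import accumulate


def segmented_op(op):
    """Lift a binary operator `op` to (flag, value) pairs.

    A True flag marks the first element of a segment. When the right
    operand starts a new segment, its value is kept as-is; otherwise the
    two values are combined with `op`.
    """
    def combine(x, y):
        fx, vx = x
        fy, vy = y
        return (fx or fy, vy if fy else op(vx, vy))
    return combine


def segmented_inclusive_scan(op, flags, values):
    """Sequential reference implementation of an inclusive segmented scan."""
    pairs = accumulate(zip(flags, values), segmented_op(op))
    return [value for _, value in pairs]


flags = [True, False, False, True, False]
values = [1, 2, 3, 4, 5]
print(segmented_inclusive_scan(operator.add, flags, values))  # [1, 3, 6, 4, 9]
```

The lifted operator is associative whenever `op` is, so the same combine function can be used in a parallel, tree-shaped scan; `itertools.accumulate` serves here only as a sequential reference.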
This PR only fixes this issue for folds and scans that use shuffle instructions. Fixing it for folds and scans on older hardware is possible, but probably not worth it given the age of that hardware and the complexity of the fix. I would thus propose to drop support for compute capabilities before 3.0.
How has this been tested?
Using various applications of scans, including segmented scans defined with (a, Bool), on our RTX 4090.
Types of changes
- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
Checklist:
- [x] My code follows the code style of this project.
- [ ] My change requires a change to the documentation.
- [ ] I have updated the documentation accordingly.
- [ ] I have added tests to cover my changes.
- [ ] All new and existing tests passed.