triton
[AMD] Enhanced SharedMem Allocation for mutually exclusive but aliased buffers
Upstream https://github.com/ROCm/triton/pull/337
cc @sjw36
This PR breaks test_flash_attention.py on A100. Need investigation.
@zhanglx13, there is a PR https://github.com/openai/triton/pull/3784, could it be related to the error here on A100?
Probably not: that PR is about a regression, whereas this PR introduced incorrect results for the FA tests on A100. We can rebase once that PR lands to confirm.
@ptillet @Jokeren I believe these changes are ready for review and appreciate your feedback.
Thanks. Will get back to you soon
Most of the current feedback should now be addressed; please verify.
@Jokeren I am looking into exactly why the membar test changed.
@Jokeren regarding the membar test change:
- Now the allocation correctly reuses the same buffer offset, since all buffers are mutually exclusive
- This causes membar to require a `barrier` before the else-branch alloc
I think all other feedback has been addressed; please verify.
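The reuse described in the comment above can be sketched as a liveness-based allocator. This is a hypothetical minimal model, not Triton's actual Allocation pass; the buffer names and intervals are made up for illustration. The idea: buffers whose live ranges never overlap (mutually exclusive, e.g. the two arms of an if/else) may be assigned the same shared-memory offset.

```python
def interferes(a, b):
    """Two half-open live intervals [s, e) interfere iff they overlap."""
    (s1, e1), (s2, e2) = a, b
    return s1 < e2 and s2 < e1

def allocate(buffers):
    """buffers: dict name -> (live_interval, size_in_bytes).
    Greedy scan: place each buffer at the lowest offset not occupied
    by any interfering, already-placed buffer."""
    offsets = {}
    for name, (live, size) in buffers.items():
        taken = sorted((offsets[n], buffers[n][1]) for n in offsets
                       if interferes(live, buffers[n][0]))
        off = 0
        for p_off, p_size in taken:
            if off + size <= p_off:
                break  # buffer fits in the gap before this placement
            off = max(off, p_off + p_size)
        offsets[name] = off
    return offsets

# Mutually exclusive if/else buffers now share offset 0:
offs = allocate({"if_buf": ((0, 5), 1024), "else_buf": ((5, 10), 1024)})
```

Since the two buffers now occupy the same bytes, membar has to insert a barrier between the last use of one and the first write of the other, which is exactly the extra barrier the membar test picked up.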
Now the allocation correctly reuses the same buffer offset, since all buffers are mutually exclusive
Allocation is not a memory write or a memory read. Hmm, that surprised me a bit. Maybe there are some issues there.
Previously, the alias caused both buffers' live ranges (the if and else branch) to be extended, causing the over-allocation.
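For illustration, the old behavior described above can be modeled as the alias forcing every aliased buffer's live range out to the union of all ranges (a hypothetical model; names and intervals are invented):

```python
# Model of the pre-PR behavior: an alias view covering buffers from both
# branches extends each aliased buffer's live range to the union of all
# ranges, so buffers that were mutually exclusive now interfere and can
# no longer share an offset.

def union_range(ranges):
    """Smallest interval covering every input live interval."""
    return (min(s for s, _ in ranges), max(e for _, e in ranges))

# if-branch buffer live over [0, 5), else-branch buffer over [5, 10):
extended = union_range([(0, 5), (5, 10)])  # both become live over [0, 10)
# With overlapping ranges each buffer needs its own slot, i.e. 2x the
# shared memory: the over-allocation this PR removes.
```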
Do we still want to land this PR? Closing it for now, since I believe we want to keep it for later, but feel free to reopen it otherwise.