triton icon indicating copy to clipboard operation
triton copied to clipboard

[AMD] Enhanced SharedMem Allocation for mutually exclusive but aliased buffers

Open zhanglx13 opened this issue 10 months ago • 9 comments

Upstream https://github.com/ROCm/triton/pull/337

cc+ @sjw36

zhanglx13 avatar Apr 10 '24 06:04 zhanglx13

This PR breaks test_flash_attention.py on A100. Need investigation.

zhanglx13 avatar Apr 10 '24 15:04 zhanglx13

@zhanglx13, there is a PR https://github.com/openai/triton/pull/3784, could it be related to the error here on A100?

scxiao avatar Apr 29 '24 13:04 scxiao

@zhanglx13, there is a PR #3784, could it be related to the error here on A100?

Probably not. That PR is about a regression. But this PR introduced incorrect results for FA tests on A100. We can rebase when that PR lands to confirm.

zhanglx13 avatar Apr 29 '24 14:04 zhanglx13

@ptillet @Jokeren I believe these changes are ready for review and appreciate your feedback.

sjw36 avatar May 01 '24 22:05 sjw36

Thanks. Will get back to you soon

Jokeren avatar May 02 '24 00:05 Jokeren

Most of the current feedback should be addressed, please verify.

@Jokeren I am looking into exactly why the membar test changed.

sjw36 avatar May 02 '24 20:05 sjw36

@Jokeren regarding the membar test change:

  • Now the allocation correctly reuses the same buffer offset since all buffers on mutually exclusive
  • This causes membar to require a barrier before the else branch alloc

I think all other feedback has been addressed, please verify.

sjw36 avatar May 03 '24 22:05 sjw36

Now the allocation correctly reuses the same buffer offset since all buffers on mutually exclusive

Allocation is not a memory write or memory read. Hmm, that surprised me a bit. Maybe there are some issues there

Jokeren avatar May 03 '24 22:05 Jokeren

Now the allocation correctly reuses the same buffer offset since all buffers on mutually exclusive

Allocation is not a memory write or memory read. Hmm, that surprised me a bit. Maybe there are some issues there

Previously, the alias caused both (if/else branch) buffers ranges to be extended causing the over-allocation.

sjw36 avatar May 03 '24 23:05 sjw36

Do we still want to land this PR? Closing it for now as I believe we want to keep it for later but feel free to reopen it otherwise.

ThomasRaoux avatar May 24 '24 15:05 ThomasRaoux