ring-flash-attention
large memory usage
Thanks for sharing this excellent implementation of ring attention.
Here are my test results on 2x A100 GPUs (with NVLink). Judging from these results, the memory usage of ring attention (ring_flash_attn_qkvpacked_func) seems very large, which is not what I expected. Is there a possible problem somewhere?