David Thomas
> Update: non_blocking=True for a GPU -> CPU copy doesn't guarantee synchronization before tolist() is called, so it is not safe. I used the blocking op instead. This decreases the perf...
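
For anyone following along, here is a minimal sketch of the hazard described in the quote above, assuming PyTorch. The helper name `to_host_list` is hypothetical and is not taken from the PR. The idea: a non-blocking GPU -> CPU copy can return before the data has landed in host memory, and `tolist()` on the resulting CPU tensor does not wait for it, so an explicit stream synchronization (or a plain blocking copy) is needed before reading.

```python
import torch


def to_host_list(t: torch.Tensor) -> list:
    """Return the contents of `t` as a Python list, safely.

    Hypothetical helper for illustration only.
    """
    if not t.is_cuda:
        # Already on the host: nothing asynchronous is involved.
        return t.tolist()

    # Start an asynchronous device-to-host copy. The returned CPU tensor
    # may not hold valid data yet.
    host = t.to("cpu", non_blocking=True)

    # Wait for the work queued on the current stream of t's device,
    # which includes the copy above, before reading the host tensor.
    torch.cuda.current_stream(t.device).synchronize()
    return host.tolist()


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    seq_lens = torch.tensor([3, 1, 2], device=device)
    print(to_host_list(seq_lens))
```

The simpler alternative is the blocking path the quote settles on, e.g. `t.cpu().tolist()`, which costs a synchronization on every call but cannot read stale data.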
> (Edit: Some issues still exist when mp_size>2 or when padding='right') I tried running this with mp_size 4, and it worked for me. As far as I can tell, this...
The fix mentioned by @gnap was implemented by @ymwangg in this commit: https://github.com/ymwangg/flash-attention/commit/73541983dec952980b43aac36da296e6bc517211. @gnap, would you mind checking it to verify that's what you had in mind? @ymwangg told me...
@skrider Are you going to rebase this so it can get merged?
https://github.com/Dao-AILab/flash-attention/pull/708 This issue was found and a PR was submitted in December, but it never got any attention.