DeepSpeed Sparse Attention is Broken
SparseAttention relies on Triton for specific kernels. GPT-NeoX currently pins triton==0.4.2 as a dependency, which lags the version DeepSpeed targets (1.0.0) and is far behind the version we would like to use, 2.0.0.dev20221202, which is required for new Triton features.
Current NeoX and DeepSpeed code cannot use Sparse Attention with any of these versions.
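For reference, here is a minimal sketch of the kind of call that breaks. The class names and module path follow DeepSpeed's documented sparse attention API; the tensor shapes and config values are just illustrative, and the exact error depends on which Triton version is installed:

```python
import torch
from deepspeed.ops.sparse_attention import SparseSelfAttention, FixedSparsityConfig

# Block-sparse attention over [batch, heads, seq_len, head_dim] fp16 tensors on GPU.
sparsity_config = FixedSparsityConfig(num_heads=8, block=16)
attn = SparseSelfAttention(sparsity_config=sparsity_config)

q = torch.randn(1, 8, 256, 64, dtype=torch.half, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# With triton==0.4.2, 1.0.0, or 2.0.0.dev20221202 installed, either the import
# above or this call fails, because DeepSpeed's sparse attention kernels target
# a different Triton API than the one available.
out = attn(q, k, v)
```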
Is this a “real” issue, or can we just bump the required version to restore support?
I tried a range of versions (along with a handful of easy code changes) and nothing worked right away.
With an updated Triton version this probably wouldn't take much effort to fix, but it came up at the tail end of adding support for the new Triton Flash Attention, so @Quentin-Anthony advised splitting it out into its own issue. To be clear, the problem isn't introduced by Triton Flash Attention: DeepSpeed updated without us, and simply bumping the Triton version is no longer enough to put things right.