DeepSpeed Sparse Attention is Broken
SparseAttention relies on Triton for specific kernels. GPT-NeoX currently pins triton==0.4.2 as a dependency, which lags the version DeepSpeed targets (1.0.0) and is far behind the version we would like to use, 2.0.0.dev20221202, which is required for new Triton features.
Current NeoX and DeepSpeed code cannot use Sparse Attention with any of these versions.
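For reference, here is a minimal sketch of the kind of call that breaks. The class names and module path follow DeepSpeed's documented sparse attention API; the tensor shapes and config values are just illustrative, and the exact error depends on which Triton version is installed:

```python
import torch
from deepspeed.ops.sparse_attention import SparseSelfAttention, FixedSparsityConfig

# Block-sparse attention over [batch, heads, seq_len, head_dim] fp16 tensors on GPU.
sparsity_config = FixedSparsityConfig(num_heads=8, block=16)
attn = SparseSelfAttention(sparsity_config=sparsity_config)

q = torch.randn(1, 8, 256, 64, dtype=torch.half, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# With triton==0.4.2, 1.0.0, or 2.0.0.dev20221202 installed, either the import
# above or this call fails, because DeepSpeed's sparse attention kernels target
# a different Triton API than the one available.
out = attn(q, k, v)
```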
Is this a “real” issue, or can we just bump the required version to restore support?
I tried a range of versions (along with a handful of easy code changes) and nothing worked right away.
With an updated Triton version this probably wouldn't take much effort to fix, but it came up at the tail end of adding support for the new Triton Flash Attention, so @Quentin-Anthony advised splitting it out into its own issue. To be clear, the problem isn't introduced by Triton Flash Attention: DeepSpeed updated without us, and simply bumping the Triton version is no longer enough to put things right.