DeepSpeed
Sparse Attention, Triton v1.0.0 and CUDA drivers v12+
Greetings,
We recently had to enable the Sparse Attention op in an environment with CUDA driver v12.0, but the only supported Triton version (v1.0.0) failed at compile time with an IndexError: map::at error.
Looking at the Triton code, it seems CUDA 12 was not supported (it did not exist at the time), and since we could not downgrade the driver, we had to patch Triton manually to make it work.
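For context, the lookup that fails is a plain std::map keyed by CUDA driver version; when the key is missing, std::map::at throws std::out_of_range (whose message is "map::at"), which is what shows up on the Python side. A minimal, standalone sketch of that failure mode (not the actual Triton code):

#include <iostream>
#include <map>
#include <stdexcept>

int main() {
    // Same shape as Triton's vptx table: CUDA driver version -> PTX ISA version
    std::map<int, int> vptx = {{11030, 73}, {11040, 73}};
    try {
        // A CUDA 12.0 driver reports 12000, which is not a key in the map
        int ptx = vptx.at(12000);
        std::cout << "PTX version: " << ptx << "\n";
    } catch (const std::out_of_range& e) {
        // This is the exception that ends up reported as the map::at error
        std::cerr << "lookup failed: " << e.what() << "\n";
    }
    return 0;
}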
For reference, here is the process:
Get the source code:
# Download and unzip Triton 1.0 source code
wget -O triton-1.0.zip https://github.com/openai/triton/archive/refs/tags/v1.0.zip
unzip triton-1.0.zip
cd triton-1.0
# Add the CUDA 12 driver version to the supported versions map
vi lib/driver/module.cc
Look for the vptx version map (around line 214) and change it like this:
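// This map translates the reported CUDA driver version (e.g. 11040 = CUDA 11.4)
// into the PTX ISA version that Triton emits (e.g. 73 = PTX 7.3)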
static std::map<int, int> vptx = {
{10000, 63},
{10010, 64},
{10020, 65},
{11000, 70},
{11010, 71},
{11020, 72},
{11030, 73},
{11040, 73},
// FIXME: force CUDA 12 support
{12000, 73},
};
Now install this patched version instead of the one from PyPI:
# Install triton from source
cd python
pip3 install -e .
This compiles and runs, but having to patch Triton by hand is clearly a concern for future updates. Is there a reason why only Triton v1.0.0 is supported?
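For what it is worth, a patch that would not need touching on every new CUDA release could clamp unknown driver versions to the newest known entry instead of adding keys by hand. A rough sketch of the idea (a hypothetical helper, not code that exists in Triton):

#include <map>
#include <stdexcept>

// Hypothetical helper: pick the PTX ISA version for the newest known
// CUDA version that does not exceed the driver's reported version.
static int ptx_for_cuda(int cuda_version) {
    static const std::map<int, int> vptx = {
        {10000, 63}, {10010, 64}, {10020, 65},
        {11000, 70}, {11010, 71}, {11020, 72},
        {11030, 73}, {11040, 73},
    };
    auto it = vptx.upper_bound(cuda_version);
    if (it == vptx.begin())
        throw std::runtime_error("CUDA driver too old");
    return std::prev(it)->second;  // e.g. 12000 falls back to 73
}

int main() {
    // A CUDA 12.0 driver (12000) now resolves to the newest known PTX version
    return ptx_for_cuda(12000) == 73 ? 0 : 1;
}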
Best,
Alfredo