DeepSpeed Sparse Attention, Triton v1.0.0 and CUDA drivers v12+


Open • asolano opened this issue 1 year ago • 0 comments

Greetings,

We recently had to enable the Sparse Attention op in an environment with CUDA driver v12.0. The only supported version of Triton (v1.0.0) failed at compile time with the error IndexError: map::at.

Looking at the Triton code, it seems CUDA 12 is not supported because it did not exist when v1.0.0 was released. Since we could not downgrade the driver, we had to patch Triton manually to make it work.
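As far as I can tell, the error text is what you would expect from a std::map::at call with a key that is not in the map: at() throws std::out_of_range, and pybind11 reports that exception type to Python as IndexError. The sketch below reproduces only that mechanism in isolation. It assumes the unpatched table is the one shown in the patch step without the 12000 entry; lookup_error and demo are hypothetical helpers written for this illustration, not Triton code.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Assumed copy of the unpatched table: CUDA driver version -> value.
// This is the table from this issue without the added 12000 entry.
static const std::map<int, int> vptx_unpatched = {
    {10000, 63}, {10010, 64}, {10020, 65}, {11000, 70},
    {11010, 71}, {11020, 72}, {11030, 73}, {11040, 73},
};

// Hypothetical helper: returns the exception text when the key is missing,
// or an empty string when the lookup succeeds.
std::string lookup_error(int driver_version) {
    try {
        (void)vptx_unpatched.at(driver_version);  // throws if the key is absent
        return "";
    } catch (const std::out_of_range& e) {
        return e.what();  // exact text depends on the standard library
    }
}

// Usage sketch: a known driver version succeeds, 12000 (CUDA 12.0) does not.
void demo() {
    assert(lookup_error(11040).empty());
    assert(!lookup_error(12000).empty());
}
```

Because the lookup is an exact match on the key, a driver that reports 12010 or later would presumably fail the same way even after adding a 12000 entry.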

For reference, here is the process:

Get the source code:

# Download and unzip Triton 1.0 source code
wget -O triton-1.0.zip https://github.com/openai/triton/archive/refs/tags/v1.0.zip
unzip triton-1.0.zip
cd triton-1.0

# Add the CUDA 12 driver version to the map of supported versions
vi lib/driver/module.cc

Look for the vptx map on line 214, which is keyed by CUDA driver version, and add an entry for 12000 like this:

static std::map<int, int> vptx = {
  {10000, 63},
  {10010, 64},
  {10020, 65},
  {11000, 70},
  {11010, 71},
  {11020, 72},
  {11030, 73},
  {11040, 73},
  // FIXME: force CUDA 12.0 support by reusing the value from 11.4
  {12000, 73},
};
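A less brittle alternative to adding one entry per driver release might be a lookup that falls back to the newest known entry at or below the reported version. The following is only a sketch of that idea, not a proposed Triton patch: ptx_with_fallback and known_versions are hypothetical names, and it assumes (as the manual patch above already does) that a newer driver still accepts the value used for 11.4.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <stdexcept>

// Assumed copy of the original table: CUDA driver version -> value.
static const std::map<int, int> known_versions = {
    {10000, 63}, {10010, 64}, {10020, 65}, {11000, 70},
    {11010, 71}, {11020, 72}, {11030, 73}, {11040, 73},
};

// Hypothetical helper, not part of Triton: return the value stored under the
// largest key that does not exceed driver_version.
int ptx_with_fallback(const std::map<int, int>& table, int driver_version) {
    assert(!table.empty());
    // upper_bound returns the first entry whose key is > driver_version.
    auto it = table.upper_bound(driver_version);
    if (it == table.begin()) {
        // The driver is older than every known entry; do not guess.
        throw std::out_of_range("driver older than all known versions");
    }
    return std::prev(it)->second;
}
```

With this lookup, ptx_with_fallback(known_versions, 12000) returns the 11.4 value (73) without a dedicated entry, while a driver older than 10.0 still fails loudly.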

Now install the patched version instead of the one from pip:

# Install triton from source
cd python
pip3 install -e .

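This compiles and runs, but the patch has to be reapplied by hand and is likely to need extending for later driver versions, so it is clearly a concern for future updates. Is there a reason why only Triton v1.0.0 is supported?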

Best,

Alfredo

Dec 26 '23 05:12 asolano
