ru-dalle

Sparse attention support

Open neverix opened this issue 2 years ago • 0 comments

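Currently, the inference code builds the entire attention matrix and then masks it. Sparse attention implementations, such as block-sparse kernels written in Triton, are more efficient because they skip the masked entries instead of computing and discarding them. Does the pre-training code support sparse attention? Will the pre-training code ever be released?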
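To make the difference concrete, here is a minimal pure-Python sketch, not taken from the ru-dalle code base. It compares "compute every score, then mask" with "compute only the allowed scores". The local causal window mask and all function names (`local_causal_mask`, `dense_masked_attention`, `sparse_attention`) are assumptions for illustration; the model's real masks and any Triton kernel would look different.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax; exp(-inf) evaluates to 0.0 for masked entries.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def local_causal_mask(n, window):
    # Hypothetical mask: position i may attend to itself and the
    # previous (window - 1) positions.
    return [set(range(max(0, i - window + 1), i + 1)) for i in range(n)]


def dense_masked_attention(q, k, v, allowed):
    # Computes all n * n scores, then sets disallowed ones to -inf.
    n, scale = len(q), 1.0 / math.sqrt(len(q[0]))
    out, n_scores = [], 0
    for i in range(n):
        scores = []
        for j in range(n):
            s = dot(q[i], k[j]) * scale
            n_scores += 1
            scores.append(s if j in allowed[i] else float("-inf"))
        w = softmax(scores)
        out.append([sum(w[j] * v[j][d] for j in range(n))
                    for d in range(len(v[0]))])
    return out, n_scores


def sparse_attention(q, k, v, allowed):
    # Computes scores only for the allowed (i, j) pairs.
    scale = 1.0 / math.sqrt(len(q[0]))
    out, n_scores = [], 0
    for i in range(len(q)):
        cols = sorted(allowed[i])
        scores = [dot(q[i], k[j]) * scale for j in cols]
        n_scores += len(cols)
        w = softmax(scores)
        out.append([sum(wj * v[j][d] for wj, j in zip(w, cols))
                    for d in range(len(v[0]))])
    return out, n_scores


# Toy inputs: 8 positions, head dimension 4, window of 3.
rng = random.Random(0)
n, dim, window = 8, 4, 3
q = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
k = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
v = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
allowed = local_causal_mask(n, window)

dense_out, dense_count = dense_masked_attention(q, k, v, allowed)
sparse_out, sparse_count = sparse_attention(q, k, v, allowed)
print(dense_count, sparse_count)
```

Both functions produce the same output up to floating-point rounding, but the dense version evaluates every query-key pair while the sparse version evaluates only the pairs the mask allows. A real block-sparse kernel gets its speedup by applying the same idea to whole tiles of the matrix on the GPU.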

neverix, Dec 04 '21 20:12