DeepSpeed
Allow FixedSparseAttention with num_global_blocks = 0
Setting num_global_blocks = 0 makes it possible to simulate naive local attention.
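To illustrate what this change enables, here is a minimal sketch of a block-level layout for a fixed sparse pattern. It is not DeepSpeed's implementation: `fixed_layout` and its parameters are hypothetical names, and it assumes the global blocks are the last blocks of each local window. With zero global blocks, only the block-diagonal local windows remain.

```python
def fixed_layout(num_blocks, num_local_blocks, num_global_blocks=0):
    """Return a num_blocks x num_blocks 0/1 block layout (illustrative only)."""
    if not 0 <= num_global_blocks <= num_local_blocks:
        raise ValueError("num_global_blocks must be in [0, num_local_blocks]")

    layout = [[0] * num_blocks for _ in range(num_blocks)]

    for start in range(0, num_blocks, num_local_blocks):
        end = min(start + num_local_blocks, num_blocks)

        # Local attention: every block attends to all blocks in its own window.
        for row in range(start, end):
            for col in range(start, end):
                layout[row][col] = 1

        # Global attention (assumed convention): the last num_global_blocks
        # blocks of each window are visible to every row. With 0, this loop
        # is empty and the layout stays purely local.
        for col in range(max(start, end - num_global_blocks), end):
            for row in range(num_blocks):
                layout[row][col] = 1

    return layout


if __name__ == "__main__":
    for row in fixed_layout(num_blocks=4, num_local_blocks=2, num_global_blocks=0):
        print(row)
```

With `num_global_blocks=0` the result is block-diagonal, which is the naive local attention this PR is meant to allow.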
Can one of the admins verify this patch?
Closing this PR as it appears to be stale and out of date - if this is still useful, please re-open or create a new PR. Thanks!