
Deepspeed Ulysses

Open conceptofmind opened this issue 1 year ago • 2 comments

Ring Attention should work with DeepSpeed Ulysses, correct? Are there any known issues with combining DeepSpeed's sequence parallelism with this kind of attention mechanism? I understand that Flash Attention already works.

https://github.com/zhuzilin/ring-flash-attention

conceptofmind avatar May 02 '24 04:05 conceptofmind

Ulysses is, in principle, attention-type agnostic. Although we haven’t specifically tested Ulysses with Ring Attention, it should work as long as the Q, K, and V tensors can be split (sharded) along the sequence and head dimensions. Contributions are welcome!
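
To make the sharding requirement concrete, here is a minimal single-process sketch of the layout change Ulysses performs around attention: an all-to-all turns sequence-sharded tensors into head-sharded ones, and a second all-to-all reverses it afterwards. This is only an illustration in plain Python lists, not DeepSpeed's API. The function names are hypothetical, each "rank" is simulated as a list entry, and every element is a `(seq_index, head_index)` label standing in for a head's vector.

```python
def make_sequence_shards(seq_len, num_heads, world_size):
    """Build one [seq_len / world_size][num_heads] shard per simulated rank."""
    assert seq_len % world_size == 0, "sequence must split evenly across ranks"
    chunk = seq_len // world_size
    return [
        [[(s, h) for h in range(num_heads)]
         for s in range(rank * chunk, (rank + 1) * chunk)]
        for rank in range(world_size)
    ]


def seq_to_head_all_to_all(shards):
    """Sequence-sharded -> head-sharded: each rank ends up with the full
    sequence but only num_heads / world_size heads."""
    world_size = len(shards)
    num_heads = len(shards[0][0])
    assert num_heads % world_size == 0, "heads must split evenly across ranks"
    heads_per_rank = num_heads // world_size
    out = []
    for dst in range(world_size):
        h0, h1 = dst * heads_per_rank, (dst + 1) * heads_per_rank
        rows = []
        # Concatenating source ranks in order restores the sequence order.
        for src in range(world_size):
            for row in shards[src]:
                rows.append(row[h0:h1])
        out.append(rows)
    return out


def head_to_seq_all_to_all(shards):
    """Head-sharded -> sequence-sharded: the inverse of the function above."""
    world_size = len(shards)
    seq_len = len(shards[0])
    chunk = seq_len // world_size
    out = []
    for dst in range(world_size):
        rows = []
        for s in range(dst * chunk, (dst + 1) * chunk):
            row = []
            # Concatenating source ranks in order restores the head order.
            for src in range(world_size):
                row.extend(shards[src][s])
            rows.append(row)
        out.append(rows)
    return out


# Simulated example: 8 tokens, 4 heads, 2 ranks.
local = make_sequence_shards(seq_len=8, num_heads=4, world_size=2)
per_head = seq_to_head_all_to_all(local)    # attention would run on this layout
restored = head_to_seq_all_to_all(per_head)
```

Any attention kernel that accepts the head-sharded layout can run between the two calls. Note the divisibility constraints the sketch asserts: both the sequence length and the number of heads must split evenly across the ranks.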

samadejacobs avatar May 06 '24 18:05 samadejacobs

Hi @samadejacobs,

I appreciate the insight.

I will test the two together and let you know how it goes.

Thank you,

Enrico

conceptofmind avatar May 10 '24 02:05 conceptofmind