DeepSpeed
Request to update flash_attention in deepspeed inference
The triton flash_attention used by DeepSpeed inference is only compatible with a six-month-old pre-release version of triton (2.0.0.dev20221202).
The triton dot and trans operators were rewritten in triton's "Complete rewrite of the backend from scratch" change, and flash_attention was optimized in the "Improved flash attention forward pass performance" change.
Could you kindly update flash_attention to work with the current triton release?
- I get this error when running DeepSpeed inference with the triton release version (2.0.0):
TypeError: dot() got an unexpected keyword argument 'trans_b'
- After changing `qk += tl.dot(q, k, trans_b=True)` to `qk += tl.dot(q, tl.trans(k))`, DeepSpeed inference hangs (see the sketch below for the context of this API change).
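For context, here is a minimal sketch of the API change against the triton 2.0.0 release, where the pre-release `trans_b=True` keyword is replaced by an explicit `tl.trans`. This is not the DeepSpeed kernel; the kernel name, block sizes, and contiguous row-major layouts are assumptions for illustration only.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def qk_tile_kernel(Q, K, Out,
                   BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr, BLOCK_D: tl.constexpr):
    # One program computes a single BLOCK_M x BLOCK_N tile of q @ k^T.
    offs_m = tl.arange(0, BLOCK_M)
    offs_n = tl.arange(0, BLOCK_N)
    offs_d = tl.arange(0, BLOCK_D)
    # q: BLOCK_M x BLOCK_D, k: BLOCK_N x BLOCK_D (row-major, contiguous)
    q = tl.load(Q + offs_m[:, None] * BLOCK_D + offs_d[None, :])
    k = tl.load(K + offs_n[:, None] * BLOCK_D + offs_d[None, :])

    # pre-release (2.0.0.dev20221202) API:  qk = tl.dot(q, k, trans_b=True)
    # 2.0.0 release API: transpose explicitly with tl.trans
    qk = tl.dot(q, tl.trans(k))

    tl.store(Out + offs_m[:, None] * BLOCK_N + offs_n[None, :], qk)


# Hypothetical shapes for a quick smoke test on a single tile.
M, N, D = 64, 64, 64
q = torch.randn(M, D, device="cuda", dtype=torch.float16)
k = torch.randn(N, D, device="cuda", dtype=torch.float16)
out = torch.empty(M, N, device="cuda", dtype=torch.float32)
qk_tile_kernel[(1,)](q, k, out, BLOCK_M=M, BLOCK_N=N, BLOCK_D=D)
```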
Hi @bmedishe - there are two different triton versions in the requirements: one for stable diffusion (the one you note) and one for sparse_attn, which is 1.0.0. Could you mention what you're trying to run? We are working on updating both, but it would be good to know what tests or workloads you're running.
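For anyone hitting this, a quick, generic way to confirm which triton build is actually installed in the environment (not DeepSpeed-specific):

```python
# Generic check of the installed triton version; not part of DeepSpeed.
import importlib.metadata

try:
    print("triton:", importlib.metadata.version("triton"))
except importlib.metadata.PackageNotFoundError:
    print("triton is not installed")
```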
@loadams I am running stable diffusion with deepspeed inference.
@bmedishe - thanks, still working on getting these updated, will update this thread when it is complete.
@bmedishe - unfortunately we need this specific version for stable diffusion for now.
Related #4008.
Am I correct in assuming that sparse attention and stable diffusion are mutually exclusive when building DS right now? Is there somewhere in the documentation where I can add an explicit mention of this?
Any plans on bumping the triton dependency of sparse attention to 2.x?