
[BUG] attention_mask is overwritten by dummy tensor at DeepSpeedSelfAttentionFunction

Open · codertimo opened this issue 3 years ago · 0 comments

Describe the bug

https://github.com/microsoft/DeepSpeed/pull/1705 added a line in DeepSpeedSelfAttentionFunction that overwrites input_mask (the attention_mask) with a dummy attention mask. As a result, the attention_mask input is ignored in the forward pass of all transformer models.

I think https://github.com/microsoft/DeepSpeed/issues/1797 is caused by this issue.

Expected behavior

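The softmax kernel binding replaces input_mask = torch.empty(1) with nullptr, so the dummy mask means no masking is applied at all. The real attention_mask should reach the kernel instead.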

https://github.com/microsoft/DeepSpeed/blob/32d97976ce08f00228ceb55a2d93005bd8b2769a/csrc/transformer/inference/csrc/pt_binding.cpp#L35-L46
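To make the effect concrete, here is a minimal plain-Python sketch (no torch or DeepSpeed required). It assumes a one-element list plays the role of torch.empty(1) and None plays the role of nullptr. The names DUMMY_MASK, masked_softmax, attention_probs_buggy and attention_probs_fixed are hypothetical and only model the behavior described above; they are not DeepSpeed APIs.

```python
import math
from typing import List, Optional

# Hypothetical stand-in for torch.empty(1): a one-element "dummy" mask.
DUMMY_MASK: List[float] = [0.0]


def masked_softmax(scores: List[float], mask: Optional[List[float]]) -> List[float]:
    """Softmax over one row of attention scores with an additive mask.

    Mask entries are 0.0 for visible positions and -inf for padded ones.
    Assumption (simplified): a mask with at most one element is treated as
    the dummy and replaced by "no mask", mirroring the nullptr conversion.
    """
    if mask is not None and len(mask) <= 1:
        mask = None  # dummy mask -> no masking at all
    logits = scores if mask is None else [s + m for s, m in zip(scores, mask)]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_probs_buggy(scores: List[float], attention_mask: List[float]) -> List[float]:
    # Hypothetical model of the regression: the caller's mask is overwritten
    # by the dummy, so attention_mask is never used.
    input_mask = DUMMY_MASK
    return masked_softmax(scores, input_mask)


def attention_probs_fixed(scores: List[float], attention_mask: List[float]) -> List[float]:
    # The real mask is passed through to the softmax.
    return masked_softmax(scores, attention_mask)


scores = [1.0, 2.0, 3.0]
pad_mask = [0.0, 0.0, -math.inf]  # the last position is padding

fixed = attention_probs_fixed(scores, pad_mask)  # padded position gets probability 0.0
buggy = attention_probs_buggy(scores, pad_mask)  # same as an unmasked softmax
```

In the buggy path the padded position still receives the largest attention weight, which is exactly what "attention_mask is ignored" means for models that rely on padding masks.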

codertimo, Apr 26 '22 09:04