
Should I provide a true attention mask?

Open · colinzhaoxp opened this issue 10 months ago · 7 comments

Hello, thank you for releasing the code and weights of LLaDA as open source.

I'm a bit confused about why attn_mask is set to None. When fine-tuning LLaDA with padded input data (specifically left-padded using a padding token), is this setting still appropriate? Or should I instead provide a proper attention mask to account for the padding?

Here's the relevant code snippet:

# Get the attention scores.
# shape: (B, nh, T, hs)
att = self._scaled_dot_product_attention(
    q,
    k,
    v,
    attn_mask=None,  # no mask is supplied, so padding positions are attended to
    dropout_p=0.0 if not self.training else self.config.attention_dropout,
    is_causal=False,  # LLaDA uses bidirectional (non-causal) attention
)

Code reference: here
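
For concreteness, here is a rough sketch of the kind of mask I have in mind. This is only illustrative; names like input_ids and pad_token_id are placeholders, not identifiers from the LLaDA code:

import torch
import torch.nn.functional as F

def make_padding_mask(input_ids: torch.Tensor, pad_token_id: int) -> torch.Tensor:
    # Boolean mask where True means "this key position may be attended to".
    # Shape (B, 1, 1, T) broadcasts over heads and query positions.
    return input_ids.ne(pad_token_id)[:, None, None, :]

# Mirroring the snippet above, but with the mask supplied:
# att = F.scaled_dot_product_attention(
#     q, k, v,
#     attn_mask=make_padding_mask(input_ids, pad_token_id),
#     dropout_p=0.0,
#     is_causal=False,
# )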

Thank you in advance for your help!

colinzhaoxp · Jun 29 '25 16:06

Same issue as #89.

colinzhaoxp · Jun 29 '25 16:06

Thanks for your interest!

Since we didn't use attention masks during either pre-training or SFT, we simply set it to None for convenience. However, we admit that attention masks might be useful in certain scenarios, and I'm considering updating our code.

nieshenx · Jun 30 '25 02:06

Hi, I've fixed this bug in this PR. Could you test it? @colinzhaoxp @NieShenRuc

Kamichanw · Jul 05 '25 17:07

> Hi, I've fixed this bug in this PR. Could you test it? @colinzhaoxp @NieShenRuc

I will test it within a few hours.

colinzhaoxp · Jul 06 '25 02:07

@Kamichanw thanks for your work! I see that you added a new attention_mask argument to the attention function. I have a question about the attention_bias argument in that function: is it unused now? What role did it play?

colinzhaoxp · Jul 06 '25 03:07

I've asked @NieShenRuc: attention_bias does not affect the final output. It may have played a role similar to an attention mask at an early stage of training, but the relevant code was later removed by the authors. I think it can safely be removed now.
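
To illustrate the difference, here is a quick sketch (illustrative only, not LLaDA's actual code path). PyTorch's scaled_dot_product_attention accepts either a boolean mask or an additive float bias, and an all-zero bias is a no-op, which matches attention_bias not affecting the output:

import torch
import torch.nn.functional as F

B, H, T, D = 2, 4, 8, 16
q = k = v = torch.randn(B, H, T, D)

# Boolean mask: True = attend, False = block (what the new attention_mask does).
bool_mask = torch.ones(B, 1, 1, T, dtype=torch.bool)

# Additive float bias: added to the attention logits before softmax.
# -inf blocks a position, 0.0 changes nothing.
float_bias = torch.zeros(B, 1, 1, T)

out_masked = F.scaled_dot_product_attention(q, k, v, attn_mask=bool_mask)
out_biased = F.scaled_dot_product_attention(q, k, v, attn_mask=float_bias)
print(torch.allclose(out_masked, out_biased, atol=1e-6))  # both reduce to "no mask"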

Kamichanw · Jul 06 '25 03:07

Hi @Kamichanw, I have tested it and it works well. Sorry for the delay.

colinzhaoxp · Jul 11 '25 02:07