MUR Task
Hello, I would like to ask about the following line of code:
_, mlm_tgt_encodings, *_ = self.utt_encoder.bert(context_mlm_targets[ctx_mlm_mask], context_utts_attn_mask[ctx_mlm_mask])
context_mlm_targets[ctx_mlm_mask] represents the tokenized utterances before [MASK]ing, while context_utts_attn_mask[ctx_mlm_mask] represents the attention mask after [MASK]ing.
The two don't match. Why isn't the attention mask recalculated?
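For context, here is a minimal sketch of the indexing pattern in question; the tensor names and shapes are assumptions inferred from the snippet above, not taken from the actual repo:

import torch

batch, max_ctx_len, max_utt_len = 2, 3, 5

# Original (un-masked) token ids for every utterance in every context.
context_mlm_targets = torch.randint(5, 100, (batch, max_ctx_len, max_utt_len))

# Attention masks built from the original utterance lengths (1 = real token).
context_utts_attn_mask = torch.ones(batch, max_ctx_len, max_utt_len, dtype=torch.long)

# Boolean mask selecting which utterances in each context were chosen for MLM.
ctx_mlm_mask = torch.tensor([[True, False, False],
                             [False, True, True]])

# Indexing with a boolean mask flattens the selected utterances into a
# (num_selected, max_utt_len) tensor; both tensors are indexed the same way,
# so row i of one still lines up with row i of the other.
selected_targets = context_mlm_targets[ctx_mlm_mask]   # shape (3, 5)
selected_attn = context_utts_attn_mask[ctx_mlm_mask]   # shape (3, 5)
print(selected_targets.shape, selected_attn.shape)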
By saying [MASK], do you mean masking utterances in the context or masking words in utterances? If the former, then 'context_utts_attn_mask' represents the attention mask before [MASK]ing. Please check Line 249 in data_loader.py: context_utts_attn_mask = [[1]*len(utt) for utt in context], which does not set masked positions to 0's.
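To illustrate the point: only the list comprehension above is quoted from data_loader.py; the rest of this sketch (the mask_words helper, token ids, masking probability) is hypothetical, just to show why a mask built before word-level [MASK]ing stays valid afterwards:

import random

MASK_ID = 103

context = [[101, 2054, 2003, 102], [101, 2748, 102]]  # tokenized utterances

# Line 249: the attention mask is built from the ORIGINAL utterance lengths,
# before any word-level [MASK]ing, so every real token gets a 1.
context_utts_attn_mask = [[1] * len(utt) for utt in context]

# Word-level masking only replaces token ids; the sequence length does not
# change, so the attention mask built above still lines up token-for-token
# (and [MASK] positions should remain attended, i.e. stay 1, as in standard MLM).
def mask_words(utt, prob=0.15):
    return [MASK_ID if random.random() < prob else tok for tok in utt]

masked_context = [mask_words(utt) for utt in context]
assert all(len(m) == len(a) for m, a in zip(masked_context, context_utts_attn_mask))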