MMM-MCQA
about MAN
I checked the BertForMultipleChoice_SAN2 model and found that its forward function needs two parameters: premise_mask and hyp_mask. How can I get premise_mask and hyp_mask? Can you give me an example?
Hi, the premise_mask and hyp_mask are used to indicate the positions of the premise and the hypothesis in a sequence. Usually the premise and hypothesis are concatenated into one sequence and fed to the BERT model, so when we later compute the attention between the premise and the hypothesis, we need to know which tokens in that sequence belong to the premise and which belong to the hypothesis. With that in mind, you should be able to see how to create these two masks.
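For example (a toy illustration; whether you count [CLS]/[SEP] as part of the premise is up to your own preprocessing), a padded sequence and its masks could look like this:

tokens:        [CLS] p1 p2 p3 [SEP] h1 h2 [SEP] [PAD] [PAD]
input_mask:      1   1  1  1    1   1  1    1     0     0
premise_mask:    0   1  1  1    0   0  0    0     0     0
hyp_mask:        0   0  0  0    0   1  1    0     0     0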
Thanks for your reply. So we should prepare the premise_mask and hyp_mask in the tokenization step and then put them into the dataset. Is that right?
The premise_mask and hyp_mask can be created together with the input mask, in the same step where we tokenize the sequence and convert the tokens into IDs.
Thanks. Suppose I have the following data:
passage_text = 'Woman: Has Tom moved to the downtown? Man: No. He is still living in the country.'
question_text = 'Where does Tom live? '
answer_text = 'In the city.'

from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('hfl/chinese-macbert-base')
Should I get the premise_mask and hyp_mask like this:
dict_tks2 = tokenizer.encode_plus(question_text, max_length=128, padding='do_not_pad', truncation=True,
                                  add_special_tokens=False,
                                  return_attention_mask=True,
                                  return_token_type_ids=False,
                                  return_tensors='pt')
premise_mask = dict_tks2['attention_mask']

dict_tks3 = tokenizer.encode_plus(answer_text, max_length=128, padding='do_not_pad', truncation=True,
                                  add_special_tokens=False,
                                  return_attention_mask=True,
                                  return_token_type_ids=False,
                                  return_tensors='pt')
hyp_mask = dict_tks3['attention_mask']
- The premise should be the concatenation of the passage and the question, while the hypothesis is the answer.
- Both the premise mask and the hypothesis mask should have the same length as the input mask; the difference is that every entry of the premise mask is zero except at the positions of the premise tokens. When you generate the input mask, you first create a vector of zeros of length max_sequence_length, then set the positions that contain tokens to 1 and leave the rest at 0. In the same manner, when you create the premise mask, you start from a vector of zeros and set only the premise positions to 1 (and likewise for the hypothesis mask). See the sketch after this list.
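Based on that description, here is a minimal sketch of how the three masks could be built in preprocessing. This is not the repo's exact code: the [CLS] premise [SEP] hypothesis [SEP] layout and the choice to leave special tokens and padding at 0 are assumptions, and truncation handling is omitted for brevity.

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('hfl/chinese-macbert-base')
max_seq_length = 128

passage_text = 'Woman: Has Tom moved to the downtown? Man: No. He is still living in the country.'
question_text = 'Where does Tom live?'
answer_text = 'In the city.'

# Premise = passage + question, hypothesis = answer.
premise_tokens = tokenizer.tokenize(passage_text + ' ' + question_text)
hyp_tokens = tokenizer.tokenize(answer_text)

# Assumed sequence layout: [CLS] premise [SEP] hypothesis [SEP]
tokens = ['[CLS]'] + premise_tokens + ['[SEP]'] + hyp_tokens + ['[SEP]']
input_ids = tokenizer.convert_tokens_to_ids(tokens)

# Input mask: 1 for real tokens, 0 for padding.
input_mask = [1] * len(input_ids)

# premise_mask / hyp_mask: same length as the input mask, all zeros except
# at the positions occupied by premise / hypothesis tokens.
premise_mask = [0] * len(input_ids)
hyp_mask = [0] * len(input_ids)

premise_start = 1                                  # right after [CLS]
premise_end = premise_start + len(premise_tokens)
hyp_start = premise_end + 1                        # right after the first [SEP]
hyp_end = hyp_start + len(hyp_tokens)

for i in range(premise_start, premise_end):
    premise_mask[i] = 1
for i in range(hyp_start, hyp_end):
    hyp_mask[i] = 1

# Pad everything out to max_seq_length.
pad_len = max_seq_length - len(input_ids)
input_ids += [tokenizer.pad_token_id] * pad_len
input_mask += [0] * pad_len
premise_mask += [0] * pad_len
hyp_mask += [0] * pad_len

These lists can then be converted to tensors and stored in the dataset alongside input_ids and input_mask, so they are available when the forward function is called.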
Thanks. Got it.