MMM-MCQA
about MAN
I checked the BertForMultipleChoice_SAN2 model and found that its forward function needs two parameters: premise_mask and hyp_mask. How can I get premise_mask and hyp_mask? Can you give me an example?
Hi, the premise_mask and hyp_mask are used to indicate the positions of the premise and the hypothesis in a sequence. Usually the premise and hypothesis are concatenated into one sequence and fed to the BERT model, so when we later compute the attention between the premise and the hypothesis, we need to know which tokens in that sequence belong to the premise and which belong to the hypothesis. With that in mind, you should be able to see how to create these two masks.
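For example (a toy illustration; whether you count [CLS]/[SEP] as part of the premise is up to your own preprocessing), a padded sequence and its masks could look like this:

tokens:        [CLS] p1 p2 p3 [SEP] h1 h2 [SEP] [PAD] [PAD]
input_mask:      1   1  1  1    1   1  1    1     0     0
premise_mask:    0   1  1  1    0   0  0    0     0     0
hyp_mask:        0   0  0  0    0   1  1    0     0     0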
Thanks for your reply. So we should prepare the premise_mask and hyp_mask in the tokenization step and then put them into the dataset. Is that right?
The premise_mask and hyp_mask can be created together with the input mask, in the same step where we tokenize the sequence and convert the tokens into IDs.
Thanks. Suppose I have the following data:
passage_text = 'Woman: Has Tom moved to the downtown? Man: No. He is still living in the country.'
question_text = 'Where does Tom live? '
answer_text = 'In the city.'

from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('hfl/chinese-macbert-base')
Should I get the premise_mask and hyp_mask like this:
dict_tks2 = tokenizer.encode_plus(question_text, max_length=128, padding='do_not_pad', truncation=True,
                                  add_special_tokens=False,
                                  return_attention_mask=True,
                                  return_token_type_ids=False,
                                  return_tensors='pt')
premise_mask = dict_tks2['attention_mask']

dict_tks3 = tokenizer.encode_plus(answer_text, max_length=128, padding='do_not_pad', truncation=True,
                                  add_special_tokens=False,
                                  return_attention_mask=True,
                                  return_token_type_ids=False,
                                  return_tensors='pt')
hyp_mask = dict_tks3['attention_mask']
- The premise should be the concatenation of the passage and the question, while the hypothesis is the answer.
- Both the premise mask and the hypothesis mask should have the same length as the input mask; the difference is that every entry of the premise mask is zero except at the positions of the premise tokens. When you generate the input mask, you first create a vector of zeros of length max_sequence_length, then set the positions that contain tokens to 1 and leave the rest at 0. In the same manner, when you create the premise mask, you start from a vector of zeros and set only the premise positions to 1 (and likewise for the hypothesis mask). See the sketch after this list.
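Based on that description, here is a minimal sketch of how the three masks could be built in preprocessing. This is not the repo's exact code: the [CLS] premise [SEP] hypothesis [SEP] layout and the choice to leave special tokens and padding at 0 are assumptions, and truncation handling is omitted for brevity.

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('hfl/chinese-macbert-base')
max_seq_length = 128

passage_text = 'Woman: Has Tom moved to the downtown? Man: No. He is still living in the country.'
question_text = 'Where does Tom live?'
answer_text = 'In the city.'

# Premise = passage + question, hypothesis = answer.
premise_tokens = tokenizer.tokenize(passage_text + ' ' + question_text)
hyp_tokens = tokenizer.tokenize(answer_text)

# Assumed sequence layout: [CLS] premise [SEP] hypothesis [SEP]
tokens = ['[CLS]'] + premise_tokens + ['[SEP]'] + hyp_tokens + ['[SEP]']
input_ids = tokenizer.convert_tokens_to_ids(tokens)

# Input mask: 1 for real tokens, 0 for padding.
input_mask = [1] * len(input_ids)

# premise_mask / hyp_mask: same length as the input mask, all zeros except
# at the positions occupied by premise / hypothesis tokens.
premise_mask = [0] * len(input_ids)
hyp_mask = [0] * len(input_ids)

premise_start = 1                                  # right after [CLS]
premise_end = premise_start + len(premise_tokens)
hyp_start = premise_end + 1                        # right after the first [SEP]
hyp_end = hyp_start + len(hyp_tokens)

for i in range(premise_start, premise_end):
    premise_mask[i] = 1
for i in range(hyp_start, hyp_end):
    hyp_mask[i] = 1

# Pad everything out to max_seq_length.
pad_len = max_seq_length - len(input_ids)
input_ids += [tokenizer.pad_token_id] * pad_len
input_mask += [0] * pad_len
premise_mask += [0] * pad_len
hyp_mask += [0] * pad_len

These lists can then be converted to tensors and stored in the dataset alongside input_ids and input_mask, so they are available when the forward function is called.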
Thanks. Got it.