
About the shape of BERT output

BillXuce opened this issue 3 years ago · 1 comment

According to Section 3.3.1 of the paper, the input to BERT is the concatenation of the query and the context, so its length should be seq_len = n + m + 2, and the output is supposed to drop the representations of the query and the special tokens. However, in bert_query_ner.py (lines 44 and 45), sequence_heatmap taken from the BERT output has shape [batch_size, seq_len, hidden_size], i.e. it still covers the full input, which conflicts with the paper. So which behaviour is the intended one, and how much difference does it make in performance?

bert_outputs = self.bert(input_ids, token_type_ids=token_type_ids, attention_mask=attention_mask)

sequence_heatmap = bert_outputs[0]  # [batch, seq_len, hidden]
batch_size, seq_len, hid_size = sequence_heatmap.size()

start_logits = self.start_outputs(sequence_heatmap).squeeze(-1)  # [batch, seq_len] after squeeze
end_logits = self.end_outputs(sequence_heatmap).squeeze(-1)  # [batch, seq_len] after squeeze

BillXuce · May 14 '21 14:05

Unfortunately there are a few places in the repository where the code conflicts with the paper. My assumption is that when the authors say they "dropped" the query portion, they mean that the start/end label masks are applied when the loss is calculated, so the query and special-token positions never contribute to it. I don't know that for sure, though; it's just my reading.
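
If that's what's happening, the extra query positions in sequence_heatmap are harmless: they still get logits, but the mask zeroes out their contribution to the loss. A rough sketch of the idea (not the repo's actual loss code; masked_start_loss is just an illustrative name, and start_label_mask is assumed to be 1 for context tokens and 0 for [CLS], the query tokens, [SEP], and padding):

import torch
import torch.nn.functional as F

def masked_start_loss(start_logits, start_labels, start_label_mask):
    # start_logits:     [batch, seq_len] -- a score for every position, query included
    # start_labels:     [batch, seq_len] -- gold start positions (0/1)
    # start_label_mask: [batch, seq_len] -- 1 for context tokens, 0 elsewhere
    loss = F.binary_cross_entropy_with_logits(
        start_logits, start_labels.float(), reduction="none"
    )  # per-position loss, [batch, seq_len]
    loss = loss * start_label_mask.float()      # drop query/special-token positions
    return loss.sum() / start_label_mask.sum()  # average over context tokens only

The end loss would be handled the same way, so keeping the full seq_len in sequence_heatmap shouldn't change the result as long as the masks are correct.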

seanswyi · Jun 30 '21 07:06