
About the shape of BERT output

BillXuce opened this issue 3 years ago · 1 comment

According to Section 3.3.1 of the paper, the input to BERT is the concatenation of the query and the context, so its length should be seq_len = n + m + 2, and the output is supposed to drop the representations of the query and the special tokens. However, in bert_query_ner.py (lines 44 and 45), sequence_heatmap taken from the BERT output has shape [batch_size, seq_len, hidden_size], i.e. it still covers the full input, which conflicts with the paper. So which behaviour is the intended one, and how much difference does it make in performance?

bert_outputs = self.bert(input_ids, token_type_ids=token_type_ids, attention_mask=attention_mask)

sequence_heatmap = bert_outputs[0]  # [batch, seq_len, hidden]
batch_size, seq_len, hid_size = sequence_heatmap.size()

start_logits = self.start_outputs(sequence_heatmap).squeeze(-1)  # [batch, seq_len] after squeeze
end_logits = self.end_outputs(sequence_heatmap).squeeze(-1)  # [batch, seq_len] after squeeze

BillXuce · May 14 '21 14:05

Unfortunately there are a few places in the repository where the code conflicts with the paper. My assumption is that when the authors say they "dropped" the query portion, they mean that the start/end label masks are applied when the loss is calculated, so the query and special-token positions never contribute to it. I don't know that for sure, though; it's just my reading.
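
If that's what's happening, the extra query positions in sequence_heatmap are harmless: they still get logits, but the mask zeroes out their contribution to the loss. A rough sketch of the idea (not the repo's actual loss code; masked_start_loss is just an illustrative name, and start_label_mask is assumed to be 1 for context tokens and 0 for [CLS], the query tokens, [SEP], and padding):

import torch
import torch.nn.functional as F

def masked_start_loss(start_logits, start_labels, start_label_mask):
    # start_logits:     [batch, seq_len] -- a score for every position, query included
    # start_labels:     [batch, seq_len] -- gold start positions (0/1)
    # start_label_mask: [batch, seq_len] -- 1 for context tokens, 0 elsewhere
    loss = F.binary_cross_entropy_with_logits(
        start_logits, start_labels.float(), reduction="none"
    )  # per-position loss, [batch, seq_len]
    loss = loss * start_label_mask.float()      # drop query/special-token positions
    return loss.sum() / start_label_mask.sum()  # average over context tokens only

The end loss would be handled the same way, so keeping the full seq_len in sequence_heatmap shouldn't change the result as long as the masks are correct.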

seanswyi · Jun 30 '21 07:06