
Two approaches to improve the performance

Open Life-0-1 opened this issue 4 years ago • 6 comments

Hi, I read your code and found two problems that hold back performance. First, as far as I know, previous papers use the head words of entity mentions as candidate arguments, but you use the whole word sequence of each entity mention, which hurts argument-level performance a lot. Second, during training you train the argument-level classifier on predicted triggers; I believe it should instead be trained on the gold triggers.
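To make the head-word point concrete, here is a minimal sketch of a common fallback heuristic, assuming token-level spans with exclusive ends. Note that prior work typically takes heads from a syntactic parse; the last-token rule below is only an approximation that works reasonably for English noun phrases:

```python
def head_word_span(mention_start, mention_end):
    """Approximate the head of an entity mention with its last token.

    For English noun phrases the syntactic head is usually the final
    token ("the former US president" -> "president"), so a common
    shortcut when no parser is available is to score arguments on the
    last token of the mention span rather than the full span.
    mention_end is exclusive.
    """
    return (mention_end - 1, mention_end)

# "the former US president" occupies tokens 3..7 (end exclusive);
# the heuristic keeps only the final token, "president"
head = head_word_span(3, 7)
```

A parser-derived head (as used in the structured-prediction line of work) would be more faithful, but this one-liner already shrinks the span the argument scorer has to match.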

Life-0-1 avatar Mar 16 '20 02:03 Life-0-1

I further read your evaluation scripts and found that your evaluation metrics are too strict. In your code, an argument only counts as correctly classified if its corresponding predicted trigger is also fully correct; that is stricter than necessary and is not what previous papers adopt. Here is how they evaluate argument-level performance:

  • A trigger is correct if its event subtype and offsets match those of a reference trigger.
  • An argument is correctly identified if its event subtype and offsets match those of any of the reference argument mentions.
  • An argument is correctly identified and classified if its event subtype, offsets and argument role match those of any of the reference argument mentions.

You could read this paper for more information, "Joint Event Extraction via Structured Prediction with Global Features".
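The three criteria above can be sketched as simple predicates over (offsets, event subtype, role) tuples. The tuple layout here is illustrative, not the repo's actual data structures:

```python
def trigger_correct(pred, gold_triggers):
    """Correct if event subtype and offsets match a reference trigger.
    pred: (start, end, subtype)."""
    return pred in gold_triggers

def arg_identified(pred, gold_args):
    """Identified: event subtype and offsets match any reference
    argument mention; the role is ignored.
    pred: (start, end, subtype, role)."""
    return any(pred[:3] == g[:3] for g in gold_args)

def arg_classified(pred, gold_args):
    """Identified and classified: subtype, offsets and role all match."""
    return pred in gold_args

gold = {(5, 6, "Attack", "Target")}
ok_id = arg_identified((5, 6, "Attack", "Instrument"), gold)    # offsets+subtype match
ok_cls = arg_classified((5, 6, "Attack", "Instrument"), gold)   # role differs
```

Note that none of these predicates look at the trigger's offsets when scoring an argument, only at its event subtype, which is exactly the relaxation being proposed.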

Life-0-1 avatar Mar 16 '20 09:03 Life-0-1

So the evaluation key (i, t_start, t_end, t_type_str, a_start, a_end, a_type_idx) should be (i, t_type_str, a_start, a_end, a_type_idx) instead.
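A tiny sketch of the difference the key change makes, using set-based micro-F1 over the two key layouts (the tuple values are illustrative):

```python
def f1(pred, gold):
    """Micro-F1 over sets of evaluation keys."""
    tp = len(pred & gold)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

# Strict key: argument credit requires the full trigger span to match.
strict_gold = {(0, 2, 3, "Attack", 5, 6, 1)}
strict_pred = {(0, 2, 4, "Attack", 5, 6, 1)}  # trigger end is off by one

# Relaxed key: drop the trigger offsets, keep only its event subtype.
def relax(keys):
    return {(i, t, a1, a2, r) for (i, _s, _e, t, a1, a2, r) in keys}

strict_f1 = f1(strict_pred, strict_gold)            # 0.0: trigger span differs
relaxed_f1 = f1(relax(strict_pred), relax(strict_gold))  # 1.0: subtype matches
```

Under the strict key, a one-token trigger boundary error wipes out an otherwise perfect argument prediction; under the relaxed key it does not.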

edchengg avatar Mar 18 '20 17:03 edchengg

I also got F1 for trigger classification around 69 with BERT + a linear classification layer. But this is well below the results reported in the papers (https://www.aclweb.org/anthology/P19-1522.pdf, https://www.aclweb.org/anthology/K19-1061.pdf), which report F1 in the 73-80 range. I don't think a CRF will help much. Has anyone experienced the same problem?
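For reference, the BERT + linear setup being discussed reduces to a per-token classification head like the sketch below. The encoder itself is stood in for by its output tensor (in the real model the hidden states would come from e.g. BertModel with hidden size 768), and the label count (33 ACE event subtypes + O = 34) is an assumption:

```python
import torch
import torch.nn as nn

class TriggerClassifier(nn.Module):
    """Sketch of BERT + linear trigger classification.

    Each token gets an independent label decision -- there is no
    label-transition modelling, which is exactly what a CRF layer
    would add on top.
    """
    def __init__(self, hidden_size=768, num_trigger_labels=34):
        super().__init__()
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(hidden_size, num_trigger_labels)

    def forward(self, hidden_states):
        # hidden_states: (batch, seq_len, hidden_size) from the encoder
        return self.classifier(self.dropout(hidden_states))

logits = TriggerClassifier()(torch.randn(2, 16, 768))  # (2, 16, 34)
```

Since each token is scored independently, gains beyond this baseline have to come from the encoder, the training setup, or sequence-level structure rather than the head itself.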

edchengg avatar Mar 18 '20 18:03 edchengg

I also got F1 for trigger classification around 66 with BERT + linear classification layer. But this is way below results reported from paper (https://www.aclweb.org/anthology/P19-1522.pdf, https://www.aclweb.org/anthology/K19-1061.pdf). They got F1 ranges from 73 - 80. I don't think a CRF will help a lot. Anyone experience the same problem?

Hi, I have read the two papers before, and I got F1 = 74-75 on the trigger-classification task by using a pretrained LM + CRF. I don't use any auxiliary information such as NER or POS tags; my result is close to the paper 'Contextualized Cross-Lingual Event Trigger Extraction with Minimal Resources'. Here is my code: https://github.com/Hanlard/Transformer-based-pretrained-model-for-event-extraction, which jointly extracts triggers and arguments. The F1 could be higher if you train only for trigger classification with the pretrained model 'XLM-Roberta'. Although the authors of 'Exploring Pre-trained Language Models for Event Extraction and Generation' are from the same school (NUDT) as me, I don't think that paper's results can be reproduced.
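For intuition on what the CRF adds over the independent linear head: at decode time it picks the label sequence that maximises emission scores plus label-to-label transition scores (Viterbi), rather than a per-token argmax. A minimal pure-Python sketch, not tied to any particular CRF library:

```python
def viterbi(emissions, transitions):
    """Decode the best label sequence under emission + transition scores.

    emissions: one list of per-label scores per token.
    transitions[i][j]: score of moving from label i to label j.
    The transition term is the sequence-level structure a CRF adds
    on top of an independent softmax head.
    """
    n_labels = len(emissions[0])
    score = list(emissions[0])
    back = []
    for emit in emissions[1:]:
        ptr, new = [], []
        for j in range(n_labels):
            best_i = max(range(n_labels),
                         key=lambda i: score[i] + transitions[i][j])
            ptr.append(best_i)
            new.append(score[best_i] + transitions[best_i][j] + emit[j])
        score = new
        back.append(ptr)
    path = [max(range(n_labels), key=lambda j: score[j])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

# Per-token argmax would pick [0, 1], but a heavily penalised
# 0 -> 1 transition makes the CRF prefer the consistent [0, 0].
path = viterbi([[1.0, 0.0], [0.0, 0.5]], [[0.0, -10.0], [0.0, 0.0]])
```

With all transitions at zero, this reduces exactly to per-token argmax, which is why the CRF's benefit depends on how much label-sequence structure (e.g. BIO constraints) the task actually has.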

Hanlard avatar Mar 19 '20 00:03 Hanlard


Thanks for your reply and the code! I will take a look ASAP. It's good to know someone got 74-75 with a CRF.

edchengg avatar Mar 19 '20 01:03 edchengg


For the 1st point, do you have references for using head words? Thanks

edchengg avatar Apr 17 '20 21:04 edchengg