
Two approaches to improve the performance

Open Life-0-1 opened this issue 4 years ago • 6 comments

Hi, I read your code and found two problems that hold back performance. First, as far as I know, previous papers use the head words of entity mentions as candidate arguments, but you use the whole word sequence of each entity mention, which hurts argument-level performance a lot. Second, during training you train the argument-level classifier on predicted triggers; I believe it should instead be trained on the gold triggers.
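To make the head-word point concrete, here is a minimal sketch of a common fallback heuristic, assuming token-level spans with exclusive ends. Note that prior work typically takes heads from a syntactic parse; the last-token rule below is only an approximation that works reasonably for English noun phrases:

```python
def head_word_span(mention_start, mention_end):
    """Approximate the head of an entity mention with its last token.

    For English noun phrases the syntactic head is usually the final
    token ("the former US president" -> "president"), so a common
    shortcut when no parser is available is to score arguments on the
    last token of the mention span rather than the full span.
    mention_end is exclusive.
    """
    return (mention_end - 1, mention_end)

# "the former US president" occupies tokens 3..7 (end exclusive);
# the heuristic keeps only the final token, "president"
head = head_word_span(3, 7)
```

A parser-derived head (as used in the structured-prediction line of work) would be more faithful, but this one-liner already shrinks the span the argument scorer has to match.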

Life-0-1 avatar Mar 16 '20 02:03 Life-0-1

I further read your evaluation scripts and found that your evaluation metrics are too strict. In your code, an argument only counts as correctly classified if its corresponding predicted trigger is also fully correct; that is stricter than necessary and is not what previous papers adopt. Here is how they evaluate argument-level performance:

  • A trigger is correct if its event subtype and offsets match those of a reference trigger.
  • An argument is correctly identified if its event subtype and offsets match those of any of the reference argument mentions.
  • An argument is correctly identified and classified if its event subtype, offsets and argument role match those of any of the reference argument mentions.

You could read this paper for more information, "Joint Event Extraction via Structured Prediction with Global Features".
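The three criteria above can be sketched as simple predicates over (offsets, event subtype, role) tuples. The tuple layout here is illustrative, not the repo's actual data structures:

```python
def trigger_correct(pred, gold_triggers):
    """Correct if event subtype and offsets match a reference trigger.
    pred: (start, end, subtype)."""
    return pred in gold_triggers

def arg_identified(pred, gold_args):
    """Identified: event subtype and offsets match any reference
    argument mention; the role is ignored.
    pred: (start, end, subtype, role)."""
    return any(pred[:3] == g[:3] for g in gold_args)

def arg_classified(pred, gold_args):
    """Identified and classified: subtype, offsets and role all match."""
    return pred in gold_args

gold = {(5, 6, "Attack", "Target")}
ok_id = arg_identified((5, 6, "Attack", "Instrument"), gold)    # offsets+subtype match
ok_cls = arg_classified((5, 6, "Attack", "Instrument"), gold)   # role differs
```

Note that none of these predicates look at the trigger's offsets when scoring an argument, only at its event subtype, which is exactly the relaxation being proposed.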

Life-0-1 avatar Mar 16 '20 09:03 Life-0-1

So the evaluation key (i, t_start, t_end, t_type_str, a_start, a_end, a_type_idx) should be (i, t_type_str, a_start, a_end, a_type_idx) instead.
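A tiny sketch of the difference the key change makes, using set-based micro-F1 over the two key layouts (the tuple values are illustrative):

```python
def f1(pred, gold):
    """Micro-F1 over sets of evaluation keys."""
    tp = len(pred & gold)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

# Strict key: argument credit requires the full trigger span to match.
strict_gold = {(0, 2, 3, "Attack", 5, 6, 1)}
strict_pred = {(0, 2, 4, "Attack", 5, 6, 1)}  # trigger end is off by one

# Relaxed key: drop the trigger offsets, keep only its event subtype.
def relax(keys):
    return {(i, t, a1, a2, r) for (i, _s, _e, t, a1, a2, r) in keys}

strict_f1 = f1(strict_pred, strict_gold)            # 0.0: trigger span differs
relaxed_f1 = f1(relax(strict_pred), relax(strict_gold))  # 1.0: subtype matches
```

Under the strict key, a one-token trigger boundary error wipes out an otherwise perfect argument prediction; under the relaxed key it does not.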

edchengg avatar Mar 18 '20 17:03 edchengg

I also got F1 for trigger classification around 69 with BERT + a linear classification layer. But this is well below the results reported in the papers (https://www.aclweb.org/anthology/P19-1522.pdf, https://www.aclweb.org/anthology/K19-1061.pdf), which report F1 in the 73-80 range. I don't think a CRF will help much. Has anyone experienced the same problem?
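For reference, the BERT + linear setup being discussed reduces to a per-token classification head like the sketch below. The encoder itself is stood in for by its output tensor (in the real model the hidden states would come from e.g. BertModel with hidden size 768), and the label count (33 ACE event subtypes + O = 34) is an assumption:

```python
import torch
import torch.nn as nn

class TriggerClassifier(nn.Module):
    """Sketch of BERT + linear trigger classification.

    Each token gets an independent label decision -- there is no
    label-transition modelling, which is exactly what a CRF layer
    would add on top.
    """
    def __init__(self, hidden_size=768, num_trigger_labels=34):
        super().__init__()
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(hidden_size, num_trigger_labels)

    def forward(self, hidden_states):
        # hidden_states: (batch, seq_len, hidden_size) from the encoder
        return self.classifier(self.dropout(hidden_states))

logits = TriggerClassifier()(torch.randn(2, 16, 768))  # (2, 16, 34)
```

Since each token is scored independently, gains beyond this baseline have to come from the encoder, the training setup, or sequence-level structure rather than the head itself.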

edchengg avatar Mar 18 '20 18:03 edchengg

I also got F1 for trigger classification around 66 with BERT + linear classification layer. But this is way below results reported from paper (https://www.aclweb.org/anthology/P19-1522.pdf, https://www.aclweb.org/anthology/K19-1061.pdf). They got F1 ranges from 73 - 80. I don't think a CRF will help a lot. Anyone experience the same problem?

Hi, I have read the two papers before, and I got F1 = 74-75 on the trigger-classification task by using a pretrained LM + CRF. I don't use any auxiliary information such as NER or POS tags; my result is close to the paper 'Contextualized Cross-Lingual Event Trigger Extraction with Minimal Resources'. Here is my code: https://github.com/Hanlard/Transformer-based-pretrained-model-for-event-extraction, which jointly extracts triggers and arguments. The F1 could be higher if you train only for trigger classification with the pretrained model 'XLM-Roberta'. Although the authors of 'Exploring Pre-trained Language Models for Event Extraction and Generation' are from the same school (NUDT) as me, I don't think that paper's results can be reproduced.
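For intuition on what the CRF adds over the independent linear head: at decode time it picks the label sequence that maximises emission scores plus label-to-label transition scores (Viterbi), rather than a per-token argmax. A minimal pure-Python sketch, not tied to any particular CRF library:

```python
def viterbi(emissions, transitions):
    """Decode the best label sequence under emission + transition scores.

    emissions: one list of per-label scores per token.
    transitions[i][j]: score of moving from label i to label j.
    The transition term is the sequence-level structure a CRF adds
    on top of an independent softmax head.
    """
    n_labels = len(emissions[0])
    score = list(emissions[0])
    back = []
    for emit in emissions[1:]:
        ptr, new = [], []
        for j in range(n_labels):
            best_i = max(range(n_labels),
                         key=lambda i: score[i] + transitions[i][j])
            ptr.append(best_i)
            new.append(score[best_i] + transitions[best_i][j] + emit[j])
        score = new
        back.append(ptr)
    path = [max(range(n_labels), key=lambda j: score[j])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

# Per-token argmax would pick [0, 1], but a heavily penalised
# 0 -> 1 transition makes the CRF prefer the consistent [0, 0].
path = viterbi([[1.0, 0.0], [0.0, 0.5]], [[0.0, -10.0], [0.0, 0.0]])
```

With all transitions at zero, this reduces exactly to per-token argmax, which is why the CRF's benefit depends on how much label-sequence structure (e.g. BIO constraints) the task actually has.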

Hanlard avatar Mar 19 '20 00:03 Hanlard


Thanks for your reply and the code! I will take a look ASAP. It's good to know someone got 74-75 with a CRF.

edchengg avatar Mar 19 '20 01:03 edchengg


For the 1st point, do you have references for using head words? Thanks

edchengg avatar Apr 17 '20 21:04 edchengg