
[deprecated - too many problems w dataset] Kylel/semeval2017

Open kyleclo opened this issue 6 years ago • 4 comments

just the NER portion

kyleclo avatar Feb 18 '19 18:02 kyleclo

I have trained the model with the semeval2017 data and evaluated it afterwards with the semeval2017 evaluation script. I'm confused why I get such different results:

"best_validation_f1-measure-overall": 0.5288376220052742
"test_f1-measure-overall": 0.4320540671010848

Results reported by the semeval2017 evaluation script on the test set:

              precision    recall  f1-score   support
Material           0.44      0.43      0.44       904
Process            0.41      0.36      0.38       954
Task               0.17      0.15      0.16       193

avg / total        0.40      0.37      0.39      2051

So F1 is 0.39.

Do you perhaps have an explanation for this difference?

arthurbra avatar Apr 04 '19 18:04 arthurbra

hey @arthurbra thanks for your interest! i've been looking into this difference as well, and it looks like the task definitions are different between what we've implemented here & the original semeval2017 task.

specifically, the 3 tasks in semeval2017 are (1) entity identification, (2) entity type classification, and (3) relation extraction. what we've implemented here combines (1) and (2) into a single task (we're extracting & tagging with the type at the same time). this will affect how f1 scoring is performed.
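to make the scoring difference concrete, here's a rough sketch (not our actual eval code, just an illustration with made-up spans) of span-level f1 with and without the type check:

```python
# minimal illustration: when identification and typing are folded into one
# tagging task, a predicted span only counts as correct if both its
# boundaries AND its type match; identification-only scoring ignores the type

def span_f1(gold, pred, use_types=True):
    """gold/pred are sets of (start, end, type) tuples."""
    if not use_types:
        gold = {(s, e) for s, e, _ in gold}
        pred = {(s, e) for s, e, _ in pred}
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

gold = {(0, 2, "Material"), (5, 7, "Process")}
pred = {(0, 2, "Process"), (5, 7, "Process")}    # boundaries right, one type wrong
print(span_f1(gold, pred, use_types=False))      # 1.0  (identification only)
print(span_f1(gold, pred, use_types=True))       # 0.5  (identification + typing)
```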

instead of using the sequence tagging model we have here, my plan is to adapt an existing model made for the semeval2017 task, and substitute in the various bert variants to replace glove/w2v embeddings.

kyleclo avatar Apr 04 '19 19:04 kyleclo

Hey @kyleclo thanks for your reply and explanation. I am really impressed by your results and want to learn more.

That's right, you perform entity identification and classification in one task. In my understanding this is Subtask B in semeval 2017: "Subtask B: t_B = O, M, P, T for tokens being outside a keyphrase, or being part of a material, process or task."

In calculateMeasures of the semeval2017 evaluation script you can pass the parameter remove_anno="rel", which should ignore relations during evaluation (as far as I understand the code). I already used this parameter in the evaluation of my previous post, so I assume there must be another explanation.
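For context, this is roughly how I call it; the import and the folder paths are just how I have it set up locally, not anything from the scibert repo:

```python
# rough sketch of how I invoke the ScienceIE evaluation script (eval.py);
# the folder paths below are placeholders for my own gold / prediction
# directories of annotation files
from eval import calculateMeasures

calculateMeasures(folder_gold="semeval2017/test_gold/",
                  folder_pred="semeval2017/test_pred/",
                  remove_anno="rel")  # "rel" should drop relation annotations from scoring
```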

It would be great if you could apply the code of the semeval 2017 winner with BERT. Unfortunately I was not able to find it (the AI2 system of Ammar et al.).

arthurbra avatar Apr 06 '19 07:04 arthurbra

Hi Kyle,

I think I have found the problem: in the prediction script model.eval() is not called, so dropout is active during prediction.
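A minimal sketch of what I mean (not the repo's exact prediction script; model and batch stand in for whatever the script actually uses):

```python
# put the model into evaluation mode before predicting, so dropout is disabled
import torch

model.eval()               # switch off dropout (and any other train-only layers)
with torch.no_grad():      # no gradients needed at prediction time
    output = model(batch)
```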

Regards Arthur

arthurbra avatar Oct 19 '19 17:10 arthurbra