NLP-progress
English information extraction has incorrect F1 scores
Hello!
Short description
The information_extraction page has illogical values for F1 scores.
Assumptions
This issue assumes that the F1 score is computed as:
$$F_1 = \frac{2 \cdot P \cdot R}{P + R}$$
Beyond that, for the rest of this issue I assume that the reported precision and recall are correct, and I use these values to compute my own F1 scores. Apologies if these assumptions are incorrect.
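A minimal Python sketch of the formula above, assuming precision (P) and recall (R) are given on the same scale (percentages, as on the information_extraction page). The name `f1_score` is hypothetical and chosen for illustration. It is not scikit-learn's `f1_score`, which takes label arrays rather than P/R values.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall.

    Works on any consistent scale, e.g. percentages (71.6) or fractions (0.716).
    """
    if precision + recall == 0:
        # Both zero: define F1 as 0 instead of dividing by zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision/recall quoted below for Galárraga et al. on ReVerb45k.
    print(round(f1_score(71.6, 50.8), 1))  # 59.4
```

Because the formula is scale-invariant, passing fractions (0.716, 0.508) simply returns the same value divided by 100.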
Detailed concerns
- For the Base dataset, both papers are listed with an F1 score that is higher than both their precision and their recall. I believe this is impossible, even if a different alpha or beta is used in the F-measure computation: any F-measure is a weighted harmonic mean of precision and recall, so it always lies between the two.
- For the Ambiguous dataset, the F1 scores of both papers are higher than expected (79.3 instead of 74.7, and 91.9 instead of 77.1).
- For the ReVerb45k dataset, the CESI paper's F1 score is again higher than expected (81.9 instead of 71.9).
- For the ReVerb45k dataset, the Galárraga et al. paper shows an F1 of 0.5, while the precision and recall are 71.6 and 50.8, respectively. I would have expected an F1 of 59.4, not 0.5.
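The checks in the list above can be sketched in a few lines of Python. This assumes precision, recall and F1 are all reported on the same percentage scale; the helper names `f_beta` and `check_reported_f1` and the 0.1 tolerance (to absorb rounding to one decimal place) are hypothetical choices for illustration, not anything from the NLP-progress repository. Only the Galárraga et al. row is filled in, since it is the only one whose precision and recall are quoted in this issue.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1 gives the usual F1 (harmonic mean of P and R)."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    return 0.0 if denom == 0 else (1 + b2) * precision * recall / denom


def check_reported_f1(precision, recall, reported_f1, tol=0.1):
    """Return a list of problems with a reported F1 (empty list = consistent).

    All three numbers are assumed to be on the same scale (percentages here).
    """
    problems = []
    low, high = min(precision, recall), max(precision, recall)
    # Any F-beta is a weighted harmonic mean, so it cannot leave [low, high].
    if not (low - tol <= reported_f1 <= high + tol):
        problems.append(
            f"F1 {reported_f1} lies outside [{low}, {high}], impossible for any beta"
        )
    # Compare against the F1 recomputed from the reported precision/recall.
    expected = f_beta(precision, recall)
    if abs(expected - reported_f1) > tol:
        problems.append(f"expected F1 of about {expected:.1f}, got {reported_f1}")
    return problems


if __name__ == "__main__":
    # Galárraga et al. on ReVerb45k, as quoted in this issue.
    for msg in check_reported_f1(71.6, 50.8, 0.5):
        print(msg)
```

For that row, both checks fire: 0.5 falls outside the [50.8, 71.6] range, and the recomputed F1 is about 59.4.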
In short, these results do not seem correct. I don't know whether the problem lies in the F1-score computation or in the reported precision/recall values.
- Tom Aarsen