Reproducibility issue with ChemProt results for relation extraction using SciBERT

Open pg427 opened this issue 3 years ago • 3 comments

Hello, I'm currently in the midst of replicating the results for the relation extraction task on the ChemProt dataset using SciBERT, but so far I have been unable to achieve the F1 score reported in your paper. Using the hyperparameters described in the paper, I get an F1 score of 0.51 on the test set, as reported by the script provided in your codebase. Please advise whether further hyperparameter tuning or other tweaking is required to reproduce the paper's F1 score. Thanks in advance!

pg427 • Apr 08 '21 11:04

Hello, I encountered the same issue with the F1 score. @pg427, did you fix it?

laleye • Apr 29 '21 12:04

@laleye No. I'm still awaiting a reply from the authors about this. I tried hyperparameter tuning, but this is the highest score I've gotten so far.

pg427 • Apr 29 '21 14:04

Sorry for missing this. I believe the issue you're seeing is a metric mismatch. For the ChemProt result, the standard metric is micro-F1 (which is computationally equivalent to accuracy), not macro-F1. In our experiments, our macro-F1 was also around 0.5, while micro-F1 is the number reported in the paper. We reference this in the Table 1 caption:

Keeping with past work, we report macro F1 scores for NER (span-level), macro F1 scores for REL and CLS (sentence-level), and macro F1 for PICO (token-level), and micro F1 for ChemProt specifically.
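To make the micro/macro distinction concrete, here is a minimal plain-Python sketch, assuming single-label classification (each example has exactly one gold label and one predicted label, and every label is included in the average). The label names and predictions are made-up toy data, not results from the paper or from the SciBERT evaluation script. Micro-F1 pools true positives, false positives and false negatives across all labels before computing one F1, which under these assumptions reduces to accuracy; macro-F1 averages per-label F1 scores, so rare labels that the model rarely gets right pull it down sharply.

```python
from collections import Counter


def per_label_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives per label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted label gets a false positive
            fn[gold] += 1  # gold label gets a false negative
    return tp, fp, fn


def f1_from_counts(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(y_true, y_pred):
    # Pool counts over all labels, then compute a single F1.
    tp, fp, fn = per_label_counts(y_true, y_pred)
    return f1_from_counts(sum(tp.values()), sum(fp.values()), sum(fn.values()))


def macro_f1(y_true, y_pred):
    # Compute F1 per label, then take the unweighted mean over labels.
    tp, fp, fn = per_label_counts(y_true, y_pred)
    labels = set(y_true) | set(y_pred)
    return sum(f1_from_counts(tp[c], fp[c], fn[c]) for c in labels) / len(labels)


def accuracy(y_true, y_pred):
    return sum(g == p for g, p in zip(y_true, y_pred)) / len(y_true)


# Toy, made-up predictions with an imbalanced label distribution.
y_true = ["CPR:4"] * 6 + ["CPR:3"] * 2 + ["CPR:9"] * 2
y_pred = ["CPR:4"] * 6 + ["CPR:4", "CPR:3"] + ["CPR:4", "CPR:4"]

print(f"micro-F1 = {micro_f1(y_true, y_pred):.3f}")  # micro-F1 = 0.700
print(f"accuracy = {accuracy(y_true, y_pred):.3f}")  # accuracy = 0.700
print(f"macro-F1 = {macro_f1(y_true, y_pred):.3f}")  # macro-F1 = 0.489
```

The same two averages are available in scikit-learn via `sklearn.metrics.f1_score(y_true, y_pred, average="micro")` and `average="macro"`. Note that if an evaluation restricts the averaged labels (for example, leaving out a "no relation" label through the `labels` argument), micro-F1 is no longer guaranteed to equal accuracy.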

See this screenshot of some experimental results (the right-most column is the macro-F1 result you're seeing):

[Screenshot of experimental results; image not preserved]

Hope that helps!

kyleclo • Apr 29 '21 16:04