PhenoBERT
The eval results are inconsistent with the paper
I downloaded the models and embeddings from the Google Drive link provided, then ran annotate.py (with the default args p1=0.8, p2=0.6, p3=0.9) followed by eval_all.py to compute the metrics. Here are my results:

| Dataset | micro-precision | micro-recall | macro-precision | macro-recall |
|---------|-----------------|--------------|-----------------|--------------|
| GSC+    | 0.8052          | 0.5592       | 0.7754          | 0.6069       |
| ID-68   | 0.9301          | 0.6825       | 0.9345          | 0.6853       |

These are inconsistent with the results reported in the paper:

| Dataset | micro-precision | micro-recall | macro-precision | macro-recall |
|---------|-----------------|--------------|-----------------|--------------|
| GSC+    | 0.8011          | 0.6698      | 0.7910          | 0.7083       |
| ID-68   | 0.9427          | 0.7812      | 0.9491          | 0.7756       |
What could be the reason for this?
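
For reference, this is the metric computation I have in mind, as a minimal sketch: per-document sets of gold and predicted HPO IDs, micro scores from pooled TP/FP/FN counts, macro scores as the per-document average. The exact matching and aggregation in eval_all.py may well differ, so this is only to make the definitions I'm comparing against explicit, not a claim about the repo's implementation:

```python
def evaluate(gold_docs, pred_docs):
    """gold_docs / pred_docs: lists of sets of HPO IDs, one set per document."""
    tp = fp = fn = 0
    per_doc_p, per_doc_r = [], []
    for gold, pred in zip(gold_docs, pred_docs):
        doc_tp = len(gold & pred)   # correctly predicted HPO IDs
        doc_fp = len(pred - gold)   # spurious predictions
        doc_fn = len(gold - pred)   # missed gold annotations
        tp, fp, fn = tp + doc_tp, fp + doc_fp, fn + doc_fn
        # Convention choice: score 0 when a document has no predictions / no gold.
        per_doc_p.append(doc_tp / (doc_tp + doc_fp) if pred else 0.0)
        per_doc_r.append(doc_tp / (doc_tp + doc_fn) if gold else 0.0)
    micro_p = tp / (tp + fp) if tp + fp else 0.0
    micro_r = tp / (tp + fn) if tp + fn else 0.0
    macro_p = sum(per_doc_p) / len(per_doc_p)
    macro_r = sum(per_doc_r) / len(per_doc_r)
    return micro_p, micro_r, macro_p, macro_r

# Toy example (hypothetical HPO IDs, for illustration only):
gold = [{"HP:0001250", "HP:0000252"}, {"HP:0001263"}]
pred = [{"HP:0001250"}, {"HP:0001263", "HP:0000750"}]
print(evaluate(gold, pred))
```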