bluebert
ELMo results inconsistent
When I run the ELMo code multiple times on the same data, the results vary significantly between runs and exceed the results reported in the literature. What am I doing wrong?
The command I'm running:
python3 elmoft.py \
--task bc5cdr-chem \
--seq2vec boe \
--options_path /path/to/options.json \
--weights_path /path/to/weights.hdf5 \
--maxlen 128 \
--fchdim 500 \
--lr 0.001 \
--pdrop 0.5 \
--do_norm \
--norm_type batch \
--do_lastdrop \
--initln \
--earlystop \
--epochs 20 \
--bsize 64 \
--data_dir /path/to/data
The pre-trained model files weights.hdf5 and options.json were downloaded from: ELMo PubMed AllenNLP
The code outputs the following F1 scores for task bc5cdr-chem (the literature reports around 91.5% for ELMo):
accuracy: 0.9943132108
macro avg: 0.9489234576
weighted avg: 0.9941723561
The code outputs the following F1 scores for task bc5cdr-dz (the literature reports around 83.9% for ELMo):
accuracy: 0.988988989
macro avg: 0.909805591
weighted avg: 0.9888870565
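One thing I am unsure about is whether the scores above are token-level (the accuracy / macro avg / weighted avg layout looks like a per-token classification report), while the literature numbers are, as far as I understand, entity-level F1. I have not checked how elmoft.py computes its scores, so this is only a guess. The sketch below uses made-up toy BIO tags and hypothetical helper names (token_macro_f1, spans, entity_f1), not the script's own evaluation code, to show how the two metrics can diverge on the same predictions:

```python
def token_macro_f1(gold, pred):
    """Macro-averaged F1 over token labels, including the 'O' label."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for lab in labels:
        tp = sum(g == lab and p == lab for g, p in zip(gold, pred))
        fp = sum(g != lab and p == lab for g, p in zip(gold, pred))
        fn = sum(g == lab and p != lab for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def spans(tags):
    """Extract (start, end) spans from a BIO sequence with one entity type."""
    found, start = set(), None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        if tag != "I" and start is not None:
            found.add((start, i))
            start = None
        if tag == "B":
            start = i
    return found


def entity_f1(gold, pred):
    """F1 over exact-match entity spans (a span counts only if both ends match)."""
    g, p = spans(gold), spans(pred)
    denom = len(g) + len(p)
    return 2 * len(g & p) / denom if denom else 0.0


# Toy example (not real data): one entity boundary is predicted wrong.
gold = ["O", "B", "I", "I", "O", "O", "B", "O", "O", "O"]
pred = ["O", "B", "I", "O", "O", "O", "B", "O", "O", "O"]

print("token macro F1:", round(token_macro_f1(gold, pred), 4))
print("entity F1:     ", round(entity_f1(gold, pred), 4))
```

In this toy case a single wrong boundary token costs one of the two entities at the entity level, while the token-level macro average stays much higher because most tokens, especially "O", are still correct. If the script reports token-level scores, that might explain part of the gap.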
The datasets were downloaded from: bert_data.zip. Two additional columns were added so that the labels are in the column the code expects.
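For anyone trying to reproduce this, the column change could look roughly like the sketch below. It is illustrative only: the function name pad_columns, the "-" filler value, the tab separator, and placing the two extra columns between the token and the label are all assumptions, not a record of the exact edit.

```python
def pad_columns(lines, n_extra=2, filler="-"):
    """Insert n_extra placeholder columns between the token and the label.

    Assumes tab-separated "token<TAB>label" lines, with blank lines
    separating sentences (an assumption about the file layout).
    """
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            out.append("")  # keep sentence boundaries untouched
            continue
        fields = line.split("\t")
        token, label = fields[0], fields[-1]
        out.append("\t".join([token] + [filler] * n_extra + [label]))
    return out


# Toy input, not taken from the real dataset.
sample = ["Naloxone\tB\n", "reverses\tO\n", "\n", "clonidine\tB\n"]
for row in pad_columns(sample):
    print(row)
```

If the loader expects the label at a different column index, the mismatch would silently train on the wrong field, so this step seems worth double-checking.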
Am I doing something wrong? Or is it a bug in the implementation?