DocRED_Bert
A BERT baseline for DocRED (https://github.com/thunlp/DocRED)
Differences from the official baseline (https://github.com/thunlp/DocRED)
- Random undersampling is used to balance the numbers of positive and negative (no-relation) samples; see the sketch after this list.
- Each sample contains only one relation.
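As a rough illustration of the undersampling step, here is a minimal sketch; the function name and the 1:1 ratio are illustrative assumptions, not this repo's actual code:

```python
import random

def undersample_negatives(positives, negatives, ratio=1.0, seed=42):
    """Keep all positive samples and randomly keep roughly
    ratio * len(positives) negative (no-relation) samples."""
    rng = random.Random(seed)
    k = min(len(negatives), int(ratio * len(positives)))
    return positives + rng.sample(negatives, k)

# e.g. 500 positive pairs vs. 20,000 negative pairs -> 500 + 500 samples
balanced = undersample_negatives(list(range(500)), list(range(20000)))
```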
Run
- Download the dataset (https://github.com/thunlp/DocRED/tree/master/data), including rel2id.json.
- Download and convert the model weights.
The original TensorFlow checkpoints are available at https://github.com/google-research/bert; the PyTorch model and conversion utilities are in https://github.com/huggingface/pytorch-transformers. A conversion sketch follows this list.
- python preprocess.py
Please use the 'convert_feature' function. The 'convert_feature_multioutput' function builds the dataset the same way as the official baseline (multi-label targets), but this strategy seems to work poorly with BERT; see the illustration after this list.
- python train_cls.py
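For the weight-conversion step, a sketch along the lines of pytorch-transformers' bundled conversion script is shown below. The checkpoint directory is a placeholder, and the exact helper name and signature may differ between library versions:

```python
import torch
from pytorch_transformers import BertConfig, BertForPreTraining, load_tf_weights_in_bert

# Placeholder path for a checkpoint downloaded from google-research/bert.
CKPT_DIR = "uncased_L-12_H-768_A-12"

config = BertConfig.from_json_file(f"{CKPT_DIR}/bert_config.json")
model = BertForPreTraining(config)
# Reading the original TensorFlow checkpoint requires TensorFlow installed.
load_tf_weights_in_bert(model, config, f"{CKPT_DIR}/bert_model.ckpt")
torch.save(model.state_dict(), f"{CKPT_DIR}/pytorch_model.bin")
```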
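To make the preprocessing difference concrete, here is a hypothetical example (not this repo's actual data structures) contrasting the two target formats:

```python
# Hypothetical example: an entity pair whose gold annotation carries
# two relation types, with ids 2 and 5.
NUM_REL = 97  # 96 DocRED relation types plus one "no relation" class

# convert_feature style: one single-label sample per gold relation.
single_label = [
    {"pair": ("head", "tail"), "label": 2},
    {"pair": ("head", "tail"), "label": 5},
]

# convert_feature_multioutput / official-baseline style: one sample
# per pair with a multi-hot target over all relation types.
multi_hot = [0] * NUM_REL
for rel_id in (2, 5):
    multi_hot[rel_id] = 1
multi_label = {"pair": ("head", "tail"), "labels": multi_hot}
```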
Results after 10 epochs

| Sigmoid threshold | Setting | Precision | Recall | F1-score |
|---|---|---|---|---|
| 0.5 | All | 46.336 | 78.334 | 58.228 |
| 0.5 | Ignore | 42.544 | 76.520 | 54.684 |
| 0.9 | All | 56.772 | 70.718 | 62.981 |
| 0.9 | Ignore | 52.836 | 68.814 | 59.776 |
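The metrics above come from thresholding sigmoid scores. A minimal sketch of how such micro-averaged numbers can be computed is shown below; it is illustrative only, and the "Ignore" setting additionally discards predicted facts already present in the training set, which this sketch does not do:

```python
import numpy as np

def micro_prf(scores, gold, threshold=0.5):
    """Micro-averaged precision/recall/F1 from raw sigmoid logits.

    scores: (n_pairs, n_relations) logits; gold: 0/1 matrix, same shape.
    """
    probs = 1.0 / (1.0 + np.exp(-scores))
    pred = probs >= threshold
    tp = np.logical_and(pred, gold == 1).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max((gold == 1).sum(), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return 100 * precision, 100 * recall, 100 * f1
```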