NLL-IE
Source code for paper "Learning from Noisy Labels for Entity-Centric Information Extraction", EMNLP 2021
Code for EMNLP 2021 paper Learning from Noisy Labels for Entity-Centric Information Extraction.
Requirements
- PyTorch >= 1.8.1
- Transformers >= 3.4.0
- wandb
- ujson
- tqdm
- truecase
- seqeval
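To confirm that an environment satisfies the list above before training, a small check along these lines may help. It is a minimal sketch, not part of the repository: the PyPI distribution names (for example `torch` for PyTorch) and the helper names `parse_version` and `check_requirements` are assumptions made for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions taken from the list above; None means no minimum is stated.
# The distribution names are assumed PyPI names, not taken from the repository.
REQUIREMENTS = {
    "torch": (1, 8, 1),
    "transformers": (3, 4, 0),
    "wandb": None,
    "ujson": None,
    "tqdm": None,
    "truecase": None,
    "seqeval": None,
}


def parse_version(text):
    """Turn a string such as '1.8.1+cu111' into (1, 8, 1).

    Local build tags and pre-release suffixes are ignored.
    """
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_requirements(requirements=REQUIREMENTS):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    for name, minimum in requirements.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if minimum is not None and parse_version(installed) < minimum:
            wanted = ".".join(str(n) for n in minimum)
            problems.append(f"{name}: found {installed}, need >= {wanted}")
    return problems


if __name__ == "__main__":
    for line in check_requirements() or ["All requirements satisfied."]:
        print(line)
```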
Dataset
- The TACRED dataset can be obtained from this link.
- The TACREV dataset can be obtained following the instructions in tacrev.
- The original CoNLL dataset can be obtained from this link.
- The revised CoNLL test dataset can be obtained from this link.
The expected structure of files is:
NLL-IE
|-- re
| |-- data
| | |-- train.json
| | |-- dev.json
| | |-- test.json
| | |-- dev_rev.json
| | |-- test_rev.json
|-- ner
| |-- data
| | |-- train.txt
| | |-- dev.txt
| | |-- test.txt
| | |-- conllpp_test.txt
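The layout above can be verified before training with a short script such as the following. This is a sketch under the assumption that it is run from the repository root (the `NLL-IE` directory); the helper name `missing_files` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Files expected under each data directory, copied from the tree above.
EXPECTED_FILES = {
    "re/data": ["train.json", "dev.json", "test.json", "dev_rev.json", "test_rev.json"],
    "ner/data": ["train.txt", "dev.txt", "test.txt", "conllpp_test.txt"],
}


def missing_files(root="."):
    """Return the relative paths of expected data files that do not exist under root."""
    root = Path(root)
    return [
        str(Path(subdir) / name)
        for subdir, names in EXPECTED_FILES.items()
        for name in names
        if not (root / subdir / name).is_file()
    ]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing data files:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected data files are present.")
```

If only one task is of interest (RE or NER), the files reported missing for the other task can be ignored.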
Training and Evaluation
Train the RE/NER model with the following command:
>> python train.py
The training loss and evaluation results on the dev set are synced to the wandb dashboard.