nested-ner-tacl2020-flair
Implementation of Nested Named Entity Recognition using Flair
Some files are part of NeuroNLP2.
Requirements
We tested this library with the following versions:
- Python (3.7)
- PyTorch (1.10.0)
- NumPy (1.17.3)
- StanfordNLP (0.2.0) for accessing the Java Stanford CoreNLP Server (3.9.2)
- Flair (0.9)
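To compare your environment against these tested versions, a small check like the following may help. It is a sketch, not part of the repository. It assumes the PyPI distribution names `torch`, `numpy`, `stanfordnlp` and `flair`, uses `importlib.metadata` (Python 3.8+) with a fallback to the `importlib_metadata` backport on Python 3.7, and does not check the Java CoreNLP server.

```python
import sys

try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # Python 3.7: needs `pip install importlib_metadata`
    from importlib_metadata import PackageNotFoundError, version

# Tested versions from the list above, keyed by (assumed) PyPI distribution name.
TESTED_VERSIONS = {
    "torch": "1.10.0",
    "numpy": "1.17.3",
    "stanfordnlp": "0.2.0",
    "flair": "0.9",
}


def compare_versions(tested=TESTED_VERSIONS):
    """Map each package to (tested_version, installed_version or None)."""
    report = {}
    for package, tested_version in tested.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # the package is not installed
        report[package] = (tested_version, installed)
    return report


if __name__ == "__main__":
    print("Python {}.{} (tested with 3.7)".format(*sys.version_info[:2]))
    for package, (tested_version, installed) in compare_versions().items():
        if installed is None:
            status = "not installed"
        elif installed == tested_version:
            status = "matches"
        else:
            status = "differs"
        print(f"{package}: installed={installed} tested={tested_version} ({status})")
```

A version that "differs" is not necessarily incompatible (for example, a CUDA build of PyTorch reports a suffix such as `+cu113`); the report only flags where your setup departs from the tested one.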
Running experiments
Testing this library with sample data
- Put the embedding file PubMed-shuffle-win-2.bin into the "./embeddings/" directory
- Run gen_data.py to generate the processed data files for training; they will be placed in the "./data/" directory
python gen_data.py
- Run train.py to start training
python train.py
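The two commands above can also be chained from Python, stopping early if the embedding file is missing. This is a hypothetical convenience wrapper (`run_sample_pipeline` is not part of the repository); it assumes it is called from the repository root and runs both scripts with the current interpreter.

```python
import subprocess
import sys
from pathlib import Path

# Paths and script names from the steps above (relative to the repository root).
EMBEDDING_FILE = Path("embeddings") / "PubMed-shuffle-win-2.bin"
STEPS = ["gen_data.py", "train.py"]  # must run in this order


def run_sample_pipeline(root=".", dry_run=False):
    """Run gen_data.py, then train.py, from `root`; return the command lists."""
    root = Path(root)
    if not (root / EMBEDDING_FILE).is_file():
        raise FileNotFoundError(f"missing embedding file: {root / EMBEDDING_FILE}")
    # Use the current interpreter so both scripts see the same environment.
    commands = [[sys.executable, script] for script in STEPS]
    if not dry_run:
        for command in commands:
            # check=True stops the pipeline if a step exits with an error.
            subprocess.run(command, cwd=root, check=True)
    return commands
```

For example, `run_sample_pipeline(dry_run=True)` only verifies the embedding file and returns the commands without running them.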
Reproducing our experiment on the ACE-2004 dataset
- Put the ACE-2004 corpus into the "../ACE2004/" directory
- Put this .tgz file into the "../" directory and extract it
- Run parse_ace2004.py to extract the sentences for training; they will be placed in the "./data/ace2004/" directory
python parse_ace2004.py
- Put the embedding file GoogleNews-vectors-negative300.bin.gz into the "./embeddings/" directory
- Decompress the embedding file GoogleNews-vectors-negative300.bin.gz
gzip -d embeddings/GoogleNews-vectors-negative300.bin.gz
- Run gen_data_for_ace2004.py to prepare the processed data files for training; they will be placed in the "./data/" directory
python gen_data_for_ace2004.py
- Run train.py to start training
python train.py
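On systems without the `gzip` command, the `gzip -d` step above can be done with Python's standard `gzip` module instead. The sketch below (the helper name `gunzip` is hypothetical) mirrors `gzip -d`: it writes the file without the `.gz` suffix and then deletes the compressed original.

```python
import gzip
import os
import shutil


def gunzip(path):
    """Decompress `path` (ending in .gz) next to it and remove the .gz file."""
    if not path.endswith(".gz"):
        raise ValueError(f"expected a .gz file, got {path!r}")
    target = path[: -len(".gz")]
    # Stream in chunks so the multi-gigabyte embedding file is never fully in memory.
    with gzip.open(path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)  # `gzip -d` also deletes the compressed original
    return target


# Example (run from the repository root):
# gunzip("embeddings/GoogleNews-vectors-negative300.bin.gz")
```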
Reproducing our experiment on the ACE-2005 dataset
- Put the ACE-2005 corpus into the "../ACE2005/" directory
- Put this .tgz file into the "../" directory and extract it
- Run parse_ace2005.py to extract the sentences for training; they will be placed in the "./data/ace2005/" directory
python parse_ace2005.py
- Put the embedding file GoogleNews-vectors-negative300.bin.gz into the "./embeddings/" directory
- Decompress the embedding file GoogleNews-vectors-negative300.bin.gz
gzip -d embeddings/GoogleNews-vectors-negative300.bin.gz
- Run gen_data_for_ace2005.py to prepare the processed data files for training; they will be placed in the "./data/" directory
python gen_data_for_ace2005.py
- Run train.py to start training
python train.py
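Because each step reads files placed or produced by an earlier one, it can help to check the expected layout before running gen_data_for_ace2005.py. The sketch below is hypothetical (not part of the repository); the paths come from the ACE-2005 steps above and are resolved relative to the repository root. The contents of the extracted .tgz archive are not checked, since their names are not listed here.

```python
from pathlib import Path

# (description, path relative to the repository root) from the ACE-2005 steps above.
ACE2005_PREREQUISITES = [
    ("ACE-2005 corpus", Path("../ACE2005")),
    ("GoogleNews embeddings", Path("embeddings/GoogleNews-vectors-negative300.bin")),
    ("Output of parse_ace2005.py", Path("data/ace2005")),
]


def missing_prerequisites(root=".", prerequisites=ACE2005_PREREQUISITES):
    """Return the descriptions of prerequisites that do not exist under `root`."""
    root = Path(root)
    return [desc for desc, rel in prerequisites if not (root / rel).exists()]


# Example: print(missing_prerequisites()) from the repository root.
```

An empty list means every listed input is in place; the same pattern applies to the ACE-2004 steps with the paths swapped.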
Reproducing our experiment on the GENIA dataset
- Put the GENIA corpus into the "../GENIA/" directory
- Run parse_genia.py to extract the sentences for training; they will be placed in the "./data/genia/" directory
python parse_genia.py
- Put the embedding file PubMed-shuffle-win-2.bin into the "./embeddings/" directory
- Run gen_data_for_genia.py to prepare the processed data files for training; they will be placed in the "./data/" directory
python gen_data_for_genia.py
- Run train.py to start training
python train.py
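A quick way to confirm that an embedding file was copied intact is to read its header and first entry. This sketch assumes the file uses the original word2vec binary format, whose first line is ASCII text giving the vocabulary size and vector dimension, followed by entries of the form "word, space, `dim` 4-byte floats"; that is an assumption about PubMed-shuffle-win-2.bin, not something stated in this README. The helper `read_word2vec_header` is hypothetical.

```python
def read_word2vec_header(path):
    """Return (vocab_size, dim, first_word) from a word2vec binary file."""
    with open(path, "rb") as f:
        # First line: "<vocab_size> <dim>\n" in ASCII.
        vocab_size, dim = (int(x) for x in f.readline().split())
        # First entry: the word bytes up to a space (skipping any separating newline).
        word = bytearray()
        while True:
            ch = f.read(1)
            if ch in (b" ", b""):
                break
            if ch != b"\n":
                word += ch
        # Then `dim` 4-byte floats; a short read means the file is truncated.
        if len(f.read(4 * dim)) != 4 * dim:
            raise ValueError(f"{path}: truncated inside the first vector")
    return vocab_size, dim, word.decode("utf-8", errors="replace")
```

For example, `read_word2vec_header("embeddings/PubMed-shuffle-win-2.bin")`; a header that fails to parse suggests the file is corrupted or not in this format.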
Configuration
The model and training configurations are defined in config.py.
Citation
Please cite our paper:
@article{shibuya-hovy-2020-nested,
title = "Nested Named Entity Recognition via Second-best Sequence Learning and Decoding",
author = "Shibuya, Takashi and Hovy, Eduard",
journal = "Transactions of the Association for Computational Linguistics",
volume = "8",
year = "2020",
doi = "10.1162/tacl_a_00334",
pages = "605--620",
}