nested-ner-tacl2020-flair

Implementation of Nested Named Entity Recognition using Flair

Some files are part of NeuroNLP2.

Requirements

We tested this library with the following libraries:

Running experiments

Testing this library with sample data

  1. Put the embedding file PubMed-shuffle-win-2.bin into the "./embeddings/" directory
  2. Run gen_data.py to generate the processed data files for training; they will be placed in the "./data/" directory
    python gen_data.py
    
  3. Run train.py to start training
    python train.py
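
The steps above depend on the embedding file already being in place. A minimal pre-flight sketch, assuming the repository root is the working directory, can confirm that before gen_data.py is run. The helper name missing_inputs is hypothetical and is not part of this repository.

```python
from pathlib import Path


def missing_inputs(root, required=("embeddings/PubMed-shuffle-win-2.bin",)):
    """Return the required paths (relative to root) that do not exist yet.

    Hypothetical helper for illustration; the default path is the embedding
    file named in step 1 of the sample-data instructions.
    """
    root = Path(root)
    return [rel for rel in required if not (root / rel).is_file()]


# An empty list means the sample-data inputs are present under the given root.
print(missing_inputs("."))
```

If the list is not empty, place the listed files first, then run gen_data.py and train.py as described.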
    

Reproducing our experiment on the ACE-2004 dataset

  1. Put the ACE-2004 corpus into the "../ACE2004/" directory
  2. Put this .tgz file into the "../" directory and extract it
  3. Run parse_ace2004.py to extract sentences for training; they will be placed in the "./data/ace2004/" directory
    python parse_ace2004.py
    
  4. Put the embedding file GoogleNews-vectors-negative300.bin.gz into the "./embeddings/" directory
  5. Decompress the embedding file GoogleNews-vectors-negative300.bin.gz
    gzip -d embeddings/GoogleNews-vectors-negative300.bin.gz
    
  6. Run gen_data_for_ace2004.py to prepare the processed data files for training; they will be placed in the "./data/" directory
    python gen_data_for_ace2004.py
    
  7. Run train.py to start training
    python train.py
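
On systems without the gzip command, the decompression step above can be approximated in Python. This is a sketch using only the standard library; the function name gunzip and its keep_original parameter are assumptions for illustration. Note that "gzip -d" deletes the .gz file after decompressing, whereas this sketch keeps it unless keep_original=False is passed.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src, keep_original=True):
    """Decompress a .gz file next to itself and return the output path.

    Hypothetical helper for illustration, not part of this repository.
    """
    src = Path(src)
    dst = src.with_suffix("")  # drops only the trailing ".gz"
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # streams in chunks, not all at once
    if not keep_original:
        src.unlink()  # mimic "gzip -d", which removes the compressed file
    return dst


# Example call (not executed here):
# gunzip("embeddings/GoogleNews-vectors-negative300.bin.gz", keep_original=False)
```

Streaming with shutil.copyfileobj avoids loading the whole embedding file into memory at once.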
    

Reproducing our experiment on the ACE-2005 dataset

  1. Put the ACE-2005 corpus into the "../ACE2005/" directory
  2. Put this .tgz file into the "../" directory and extract it
  3. Run parse_ace2005.py to extract sentences for training; they will be placed in the "./data/ace2005/" directory
    python parse_ace2005.py
    
  4. Put the embedding file GoogleNews-vectors-negative300.bin.gz into the "./embeddings/" directory
  5. Decompress the embedding file GoogleNews-vectors-negative300.bin.gz
    gzip -d embeddings/GoogleNews-vectors-negative300.bin.gz
    
  6. Run gen_data_for_ace2005.py to prepare the processed data files for training; they will be placed in the "./data/" directory
    python gen_data_for_ace2005.py
    
  7. Run train.py to start training
    python train.py
    

Reproducing our experiment on the GENIA dataset

  1. Put the GENIA corpus into the "../GENIA/" directory
  2. Run parse_genia.py to extract sentences for training; they will be placed in the "./data/genia/" directory
    python parse_genia.py
    
  3. Put the embedding file PubMed-shuffle-win-2.bin into the "./embeddings/" directory
  4. Run gen_data_for_genia.py to prepare the processed data files for training; they will be placed in the "./data/" directory
    python gen_data_for_genia.py
    
  5. Run train.py to start training
    python train.py
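
Because each script in the steps above consumes the output of the previous one, it can help to run them in order and stop at the first failure. The following is a sketch under that assumption; run_steps is a hypothetical wrapper, not something this repository provides, and it assumes the corpus and the embedding file are already in place.

```python
import subprocess
import sys


def run_steps(scripts, python=sys.executable, cwd="."):
    """Run each script in order; raise CalledProcessError on the first failure.

    Hypothetical wrapper for illustration only.
    """
    for script in scripts:
        print("Running", script)
        # check=True aborts the sequence if a script exits with a non-zero status
        subprocess.run([python, script], cwd=cwd, check=True)


# Example call for the GENIA steps (not executed here):
# run_steps(["parse_genia.py", "gen_data_for_genia.py", "train.py"])
```

The same wrapper could be given the ACE-2004 or ACE-2005 script names instead.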
    

Configuration

The model and training settings are defined in config.py

Citation

Please cite our paper:

@article{shibuya-hovy-2020-nested,
  title = "Nested Named Entity Recognition via Second-best Sequence Learning and Decoding",
  author = "Shibuya, Takashi and Hovy, Eduard",
  journal = "Transactions of the Association for Computational Linguistics",
  volume = "8",
  year = "2020",
  doi = "10.1162/tacl_a_00334",
  pages = "605--620",
}