cmu-multinlp
cmu-multinlp copied to clipboard
Generalizing Natural Language Analysis through Span-relation Representations
CMU multinlp
This repository contains scripts to download and preprocess the General Language Analysis Datasets (GLAD) benchmark and codes for the task-agnostic SpanRel model, a multi-faceted NLP toolkit that can cover many different tasks.
Generalizing Natural Language Analysis through Span-relation Representations (ACL2020)
Prerequisites
# install jsonnet from https://github.com/google/jsonnet
conda create -n spanrel python=3.6
conda activate spanrel
./setup.sh
Datasets
8 datasets consisting of annotations of 10 tasks are included in this repository.
| Dataset | Task | Task code | Dir |
|---|---|---|---|
| Wet Lab Protocols | NER | wlp | data/wlp |
| RE | wlp | data/wlp | |
| CoNLL-2003 | NER | ner | data/semeval_2014/ |
| SemEval-2010 Task 8 | RE | rc | data/semeval_2010_task8/ |
| OntoNotes 5.0 | Coref. | coref | data/conll_coref_2012/ |
| SRL | srl | data/conll_srl_2012/ | |
| POS | pos_conll | data/conll_pos_2012/ | |
| Dep. | dp_conll | data/conll_dep_2012/ | |
| Consti. | consti_conll | data/conll_consti_2012/ | |
| Penn Treebank | POS | pos | data/ptb_pos/ |
| Dep. | dp | data/ptb/ | |
| Consti. | consti | data/ptb_consti/ | |
| OIE2016 | OpenIE | oie | data/openie/ |
| MPQA 3.0 | ORL | orl | data/mpqa/ |
| SemEval-2014 Task 4 | ABSA | semeval14_st2 | data/semeval_2014/ |
Follow the instructions in run.sh in each dataset directory to download and preprocess the datasets into BRAT format.
Train and Evaluate SpanRel models
Run BERT-based models, where $emb can be bert-base-uncased, bert-large-uncased, and $task is one of the "task code" shown in the table.
./run_by_config_bert.sh $task $emb $output
Run GloVe/ELMo-based models, where $emb can be glove or elmo, and $task is one of the "task code" shown in the table.
./run_by_config.sh $task $emb $output
Train and Evaluate SpanRel models on other datasets
- Put the data in
data/kairoswithtrain,dev, andtestsub-directory, each containing multiple.annand.txtfiles. - Build vocabulary:
./run_by_config_bert.sh kairos bert-base-uncased output/kairos_vocab. - Modify the
vocabvariable in thekairossection ofrun_by_config_bert.shtooutput/kairos_vocab/vocabulary. - Train and evaluate:
./run_by_config_bert.sh kairos bert-base-uncased output/kairos_log.