
The baselines used in the CoQA paper

coqa-baselines

We provide several baselines for the CoQA challenge: conversational models, extractive reading comprehension models, and a pipeline model that combines the two. See the paper for more details. We also provide instructions on how to run pretrained models on Codalab, our platform for evaluation on the test set.

All our seq2seq experiments use the OpenNMT-py library, which this repository includes as a git submodule, so clone it together with its submodules:

  git clone --recurse-submodules git@github.com:stanfordnlp/coqa-baselines.git

This code repository was mostly written by Danqi Chen, built on top of the DrQA and OpenNMT-py projects, with some help from Shayne Longpre and Siva Reddy. If you have any questions about this repository, please use GitHub Issues.

Requirements

torch>=0.4.0
torchtext==0.2.1
gensim
pycorenlp

Download

Download the dataset:

  mkdir data
  wget -P data https://nlp.stanford.edu/data/coqa/coqa-train-v1.0.json
  wget -P data https://nlp.stanford.edu/data/coqa/coqa-dev-v1.0.json
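
Both files are single JSON documents. The sketch below assumes the CoQA v1.0 layout, where `data` holds one entry per passage with an `id`, a `story`, and parallel `questions` and `answers` lists whose items carry `input_text` and `turn_id`; it walks every question-answer turn. The helpers `load_coqa` and `iter_turns` are hypothetical and not part of this repository.

```python
import json
from typing import Iterator, Tuple


def load_coqa(path: str) -> dict:
    """Read one CoQA JSON file (e.g. data/coqa-dev-v1.0.json)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_turns(coqa: dict) -> Iterator[Tuple[str, int, str, str]]:
    """Yield (story_id, turn_id, question, answer) for every turn.

    Assumes coqa["data"] is a list of dialogs, each with an "id" and
    parallel "questions"/"answers" lists that share the same turn ids.
    """
    for dialog in coqa["data"]:
        for q, a in zip(dialog["questions"], dialog["answers"]):
            # Guard against a malformed file silently misaligning Q and A.
            if q["turn_id"] != a["turn_id"]:
                raise ValueError(f"misaligned turns in story {dialog['id']}")
            yield dialog["id"], q["turn_id"], q["input_text"], a["input_text"]
```

For example, `sum(1 for _ in iter_turns(load_coqa("data/coqa-dev-v1.0.json")))` counts the dev-set turns.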

Download pre-trained word vectors:

  mkdir wordvecs
  wget -P wordvecs http://nlp.stanford.edu/data/wordvecs/glove.42B.300d.zip
  unzip -d wordvecs wordvecs/glove.42B.300d.zip
  wget -P wordvecs http://nlp.stanford.edu/data/wordvecs/glove.840B.300d.zip
  unzip -d wordvecs wordvecs/glove.840B.300d.zip

Start a CoreNLP server

  mkdir lib
  wget -P lib https://repo1.maven.org/maven2/edu/stanford/nlp/stanford-corenlp/3.9.1/stanford-corenlp-3.9.1.jar
  java -mx4g -cp lib/stanford-corenlp-3.9.1.jar edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
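
The preprocessing scripts below talk to this server, so it can help to check that it is reachable first. A minimal sketch, assuming the server's standard HTTP interface (annotation properties passed in the `properties` query parameter, raw text in the POST body) on the port 9000 chosen above. It uses only the Python standard library instead of the pycorenlp client listed in the requirements; `corenlp_tokenize` and `corenlp_is_up` are hypothetical helper names.

```python
import http.client
import json
import urllib.parse
import urllib.request
from typing import List


def corenlp_tokenize(text: str, url: str = "http://localhost:9000", timeout: float = 15.0) -> List[str]:
    """Tokenize `text` with a running CoreNLP server and return the word tokens."""
    props = {"annotators": "tokenize,ssplit", "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(props)})
    request = urllib.request.Request(
        f"{url}/?{query}",
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        doc = json.loads(response.read().decode("utf-8"))
    # The JSON output nests tokens inside sentences.
    return [tok["word"] for sent in doc["sentences"] for tok in sent["tokens"]]


def corenlp_is_up(url: str = "http://localhost:9000") -> bool:
    """Return True if the server at `url` answers a tiny tokenization request."""
    try:
        return corenlp_tokenize("Hello world.", url, timeout=5.0) != []
    except (OSError, ValueError, KeyError, http.client.HTTPException):
        # Connection refused, timeout, or a reply that is not CoreNLP JSON.
        return False
```

`corenlp_is_up()` returns False while the server is still loading or if it listens on a different port.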

Conversational models

Preprocessing

Generate the input files for the seq2seq models. This step requires a running CoreNLP server; n_history can be set to 0, 1, 2, ... or -1:

  python scripts/gen_seq2seq_data.py --data_file data/coqa-train-v1.0.json --n_history 2 --lower --output_file data/seq2seq-train-h2
  python scripts/gen_seq2seq_data.py --data_file data/coqa-dev-v1.0.json --n_history 2 --lower --output_file data/seq2seq-dev-h2
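
The n_history flag controls how many previous question-answer turns accompany the current question. The exact serialization lives in scripts/gen_seq2seq_data.py; the sketch below only illustrates the assumed semantics (0 keeps no history, k > 0 keeps the last k turns, -1 keeps the full history). The `||`, `<Q>` and `<A>` markers and both function names are hypothetical placeholders, not necessarily what the script emits.

```python
from typing import List, Tuple


def select_history(history: List[Tuple[str, str]], n_history: int) -> List[Tuple[str, str]]:
    """Pick the previous (question, answer) turns to keep.

    Assumed semantics: negative keeps all turns, 0 keeps none,
    and k > 0 keeps the k most recent turns.
    """
    if n_history < 0:
        return list(history)
    if n_history == 0:
        # history[-0:] would return everything, so 0 is handled explicitly.
        return []
    return history[-n_history:]


def build_source(story: str, history: List[Tuple[str, str]], question: str, n_history: int) -> str:
    """Serialize passage, selected history and current question into one line.

    The "||", "<Q>" and "<A>" markers are illustrative placeholders.
    """
    parts = [story, "||"]
    for q, a in select_history(history, n_history):
        parts += ["<Q>", q, "<A>", a]
    parts += ["<Q>", question]
    return " ".join(parts)
```

Longer histories give the model more context for resolving references such as "he" or "what else?", at the cost of longer source sequences.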

Preprocess the data and embeddings:

  python seq2seq/preprocess.py -train_src data/seq2seq-train-h2-src.txt -train_tgt data/seq2seq-train-h2-tgt.txt -valid_src data/seq2seq-dev-h2-src.txt -valid_tgt data/seq2seq-dev-h2-tgt.txt -save_data data/seq2seq-h2 -lower -dynamic_dict -src_seq_length 10000
  PYTHONPATH=seq2seq python seq2seq/tools/embeddings_to_torch.py -emb_file_enc wordvecs/glove.42B.300d.txt -emb_file_dec wordvecs/glove.42B.300d.txt -dict_file data/seq2seq-h2.vocab.pt -output_file data/seq2seq-h2.embed
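
The embedding step keeps GloVe vectors only for words in the preprocessed vocabulary. As an illustration of that idea (not the code of embeddings_to_torch.py), the sketch below parses the GloVe text format, one word followed by its values per line. It splits on single spaces rather than on all whitespace and treats the last `dim` fields as the vector, a defensive choice in case a token contains unusual characters. `load_glove` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Set


def load_glove(lines: Iterable[str], vocab: Set[str], dim: int = 300) -> Dict[str, List[float]]:
    """Return {word: vector} for the GloVe entries whose word is in `vocab`.

    Each line is "<word> <v1> ... <v_dim>". The last `dim` space-separated
    fields form the vector; everything before them is the word.
    """
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        fields = line.rstrip().split(" ")
        if len(fields) < dim + 1:
            continue  # skip malformed or header lines
        word = " ".join(fields[:-dim])
        if word in vocab and word not in vectors:
            vectors[word] = [float(x) for x in fields[-dim:]]
    return vectors
```

Usage: `with open("wordvecs/glove.42B.300d.txt", encoding="utf-8") as f: vecs = load_glove(f, vocab)`. The 42B Common Crawl vectors are uncased, which fits the `-lower` option used above.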

Training

Run a seq2seq (with attention) model:

   python seq2seq/train.py -data data/seq2seq-h2 -save_model seq2seq_models/seq2seq -word_vec_size 300 -pre_word_vecs_enc data/seq2seq-h2.embed.enc.pt -pre_word_vecs_dec data/seq2seq-h2.embed.dec.pt -epochs 50 -gpuid 0 -seed 123

Run a seq2seq+copy model:

   python seq2seq/train.py -data data/seq2seq-h2 -save_model seq2seq_models/seq2seq_copy -copy_attn -reuse_copy_attn -word_vec_size 300 -pre_word_vecs_enc data/seq2seq-h2.embed.enc.pt -pre_word_vecs_dec data/seq2seq-h2.embed.dec.pt -epochs 50 -gpuid 0 -seed 123
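
Training saves checkpoints whose names record validation accuracy, perplexity and epoch, as in seq2seq_copy_acc_65.49_ppl_4.71_e15.pt used in the testing step; that file name comes from one particular run, so substitute your own. A minimal sketch for choosing the checkpoint with the highest validation accuracy, assuming that naming pattern (`best_checkpoint` is a hypothetical helper):

```python
import re
from typing import Iterable, Optional

# Assumed checkpoint pattern: <prefix>_acc_<acc>_ppl_<ppl>_e<epoch>.pt
CKPT_RE = re.compile(r"_acc_(?P<acc>\d+(?:\.\d+)?)_ppl_(?P<ppl>\d+(?:\.\d+)?)_e(?P<epoch>\d+)\.pt$")


def best_checkpoint(names: Iterable[str]) -> Optional[str]:
    """Return the name with the highest accuracy; lower perplexity breaks ties."""
    best, best_key = None, None
    for name in names:
        m = CKPT_RE.search(name)
        if m is None:
            continue  # not a checkpoint file
        key = (float(m["acc"]), -float(m["ppl"]))
        if best_key is None or key > best_key:
            best, best_key = name, key
    return best
```

For example, `best_checkpoint(p.name for p in pathlib.Path("seq2seq_models").glob("seq2seq_copy_acc_*.pt"))`. Including `_acc_` in the glob keeps the plain seq2seq and seq2seq+copy checkpoints apart, since both live in the same directory.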

Testing

  python seq2seq/translate.py -model seq2seq_models/seq2seq_copy_acc_65.49_ppl_4.71_e15.pt -src data/seq2seq-dev-h2-src.txt -output seq2seq_models/pred.txt -replace_unk -verbose -gpu 0
  python scripts/gen_seq2seq_output.py --data_file data/coqa-dev-v1.0.json --pred_file seq2seq_models/pred.txt --output_file seq2seq_models/seq2seq_copy.prediction.json
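
gen_seq2seq_output.py pairs the line-by-line answers in pred.txt with the dev-set turns and writes a JSON prediction file. Assuming one prediction line per turn in data-file order, and the format read by the official CoQA evaluation script (a list of objects with `id`, the story id, plus `turn_id` and `answer`), the conversion can be sketched as follows; both functions are hypothetical stand-ins, not the script itself.

```python
import json
from typing import List, Sequence, Tuple


def to_coqa_predictions(turn_keys: Sequence[Tuple[str, int]], pred_lines: Sequence[str]) -> List[dict]:
    """Pair the i-th prediction line with the i-th (story_id, turn_id)."""
    if len(turn_keys) != len(pred_lines):
        raise ValueError(f"{len(turn_keys)} turns but {len(pred_lines)} predictions")
    return [
        {"id": story_id, "turn_id": turn_id, "answer": line.strip()}
        for (story_id, turn_id), line in zip(turn_keys, pred_lines)
    ]


def write_predictions(path: str, predictions: List[dict]) -> None:
    """Write the prediction list as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(predictions, f, indent=2)
```

The length check matters because a single dropped or extra line would shift every later answer onto the wrong question.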

Reading comprehension models

Preprocessing

Generate the input files for the reading comprehension (extractive question answering) model. This step also requires a running CoreNLP server:

  python scripts/gen_drqa_data.py --data_file data/coqa-train-v1.0.json --output_file data/coqa.train.json
  python scripts/gen_drqa_data.py --data_file data/coqa-dev-v1.0.json --output_file data/coqa.dev.json

Training

n_history can be set to 0, 1, 2, ... or -1.

  python rc/main.py --trainset data/coqa.train.json --devset data/coqa.dev.json --n_history 2 --dir rc_models --embed_file wordvecs/glove.840B.300d.txt
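
Because the model is extractive, it answers by selecting a span of the passage. DrQA-style readers score every token as a possible answer start and end and return the best-scoring span under a length cap. The decoding in rc/ may differ in details; the sketch below shows this standard technique with plain Python lists, and `best_span` and its default `max_len=15` are assumptions for illustration.

```python
from typing import Sequence, Tuple


def best_span(start_scores: Sequence[float], end_scores: Sequence[float], max_len: int = 15) -> Tuple[int, int, float]:
    """Return (start, end, score) maximizing start_scores[s] * end_scores[e].

    Only spans with s <= e < s + max_len are considered; indices are
    inclusive token positions and the scores are treated as probabilities.
    """
    best = (0, 0, float("-inf"))
    n = len(start_scores)
    for s in range(n):
        # The end may not precede the start or exceed the length cap.
        for e in range(s, min(n, s + max_len)):
            score = start_scores[s] * end_scores[e]
            if score > best[2]:
                best = (s, e, score)
    return best
```

The loop is quadratic in the cap rather than in the passage length, so it stays cheap even for long stories.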

Testing

  python rc/main.py --testset data/coqa.dev.json --n_history 2 --pretrained rc_models

The pipeline model

Preprocessing

  python scripts/gen_pipeline_data.py --data_file data/coqa-train-v1.0.json --output_file1 data/coqa.train.pipeline.json --output_file2 data/seq2seq-train-pipeline
  python scripts/gen_pipeline_data.py --data_file data/coqa-dev-v1.0.json --output_file1 data/coqa.dev.pipeline.json --output_file2 data/seq2seq-dev-pipeline
  python seq2seq/preprocess.py -train_src data/seq2seq-train-pipeline-src.txt -train_tgt data/seq2seq-train-pipeline-tgt.txt -valid_src data/seq2seq-dev-pipeline-src.txt -valid_tgt data/seq2seq-dev-pipeline-tgt.txt -save_data data/seq2seq-pipeline -lower -dynamic_dict -src_seq_length 10000
  PYTHONPATH=seq2seq python seq2seq/tools/embeddings_to_torch.py -emb_file_enc wordvecs/glove.42B.300d.txt -emb_file_dec wordvecs/glove.42B.300d.txt -dict_file data/seq2seq-pipeline.vocab.pt -output_file data/seq2seq-pipeline.embed

Training

n_history can be set to 0, 1, 2, ... or -1.

  python rc/main.py --trainset data/coqa.train.pipeline.json --devset data/coqa.dev.pipeline.json --n_history 2 --dir pipeline_models --embed_file wordvecs/glove.840B.300d.txt --predict_raw_text n
  python seq2seq/train.py -data data/seq2seq-pipeline -save_model pipeline_models/seq2seq_copy -copy_attn -reuse_copy_attn -word_vec_size 300 -pre_word_vecs_enc data/seq2seq-pipeline.embed.enc.pt -pre_word_vecs_dec data/seq2seq-pipeline.embed.dec.pt -epochs 50 -gpuid 0 -seed 123

Testing

  python rc/main.py --testset data/coqa.dev.pipeline.json --n_history 2 --pretrained pipeline_models
  python scripts/gen_pipeline_for_seq2seq.py --data_file data/coqa.dev.pipeline.json --output_file pipeline_models/pipeline-seq2seq-src.txt --pred_file pipeline_models/predictions.json
  python seq2seq/translate.py -model pipeline_models/seq2seq_copy_acc_85.00_ppl_2.18_e16.pt -src pipeline_models/pipeline-seq2seq-src.txt -output pipeline_models/pred.txt -replace_unk -verbose -gpu 0
  python scripts/gen_seq2seq_output.py --data_file data/coqa-dev-v1.0.json --pred_file pipeline_models/pred.txt --output_file pipeline_models/pipeline.prediction.json
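
In the pipeline, the reading comprehension model first predicts an evidence span (rationale) in the passage, and the seq2seq+copy model then turns that span into a free-form answer; the testing commands above run these two stages over whole files. The same data flow for a single question, with both models passed in as plain callables, looks roughly like this. `pipeline_answer`, `extract_span` and `generate_answer` are hypothetical names, and which inputs the generator receives is an assumption of this sketch.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]


def pipeline_answer(
    story: str,
    history: History,
    question: str,
    extract_span: Callable[[str, History, str], str],
    generate_answer: Callable[[str, History, str], str],
) -> str:
    """Two-stage answering: extract a rationale span, then rewrite it.

    extract_span stands in for the reading comprehension model and
    generate_answer for the seq2seq+copy model. In this sketch the
    generator receives the predicted span instead of the full passage.
    """
    rationale = extract_span(story, history, question)
    return generate_answer(rationale, history, question)
```

Feeding the generator a short span rather than the whole story keeps its input small, which is why the pipeline can copy answer words more reliably than the plain seq2seq models.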

Results

All the results are based on n_history = 2:

Model          Dev F1   Dev EM
seq2seq          20.9     17.7
seq2seq_copy     45.2     38.0
DrQA             55.6     46.2
pipeline         65.0     54.9
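
F1 and EM compare each predicted answer with the human answers after normalization. A minimal sketch of the SQuAD-style computation that the CoQA evaluation follows (lowercase, strip punctuation and the articles a/an/the, then measure token overlap), scored against a single reference. The official script additionally combines scores over several human answers per question, which is omitted here.

```python
import re
import string
from collections import Counter

PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_toks = normalize_answer(prediction).split()
    gold_toks = normalize_answer(gold).split()
    if not pred_toks or not gold_toks:
        # Both empty counts as a match; only one empty scores zero.
        return float(pred_toks == gold_toks)
    overlap = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)
```

F1 gives partial credit, so a free-form answer such as "a small cat" against the reference "the cat" scores 2/3 F1 but 0 EM; this is why F1 is consistently higher than EM in the table.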

Citation

    @article{reddy2019coqa,
      title={{CoQA}: A Conversational Question Answering Challenge},
      author={Reddy, Siva and Chen, Danqi and Manning, Christopher D},
      journal={Transactions of the Association for Computational Linguistics (TACL)},
      year={2019}
    }

License

MIT