
Repo for the question-in-context rewriting baseline presented in Elgohary et al. "Can you unpack that? Learning to rewrite questions-in-context", EMNLP 2019.

CANARD Rewriting Models

This repo maintains scripts for training models on the question-in-context rewriting task introduced in:

Ahmed Elgohary, Denis Peskov, Jordan Boyd-Graber. 2019. Can you unpack that? Learning to rewrite questions-in-context. In Empirical Methods in Natural Language Processing.

The CANARD dataset can be downloaded from the dataset page.

Pointer-generator sequence-to-sequence model

To run the model:

  1. Install spaCy.
  2. Clone and install OpenNMT-py.
  3. Download the GloVe 840B.300d embeddings.
  4. Run ./preprocess.sh to convert the dataset into sequence-to-sequence format.
  5. Run ./ONMT_Pipeline_GloVE.sh to train and evaluate the model.
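
Before starting steps 4 and 5, it can help to confirm that the prerequisites from steps 1 to 3 are in place. The following is a minimal sketch of such a check, not part of the repo. It assumes the import names `spacy` and `onmt` for spaCy and OpenNMT-py, that both scripts sit in the repo root, and that the unzipped embeddings file is named `glove.840B.300d.txt`. The helper `missing_prerequisites` and the embeddings location are hypothetical; adjust the path to wherever the scripts expect it.

```python
import importlib.util
from pathlib import Path

# Assumption: spaCy and OpenNMT-py are importable as `spacy` and `onmt`.
REQUIRED_MODULES = ("spacy", "onmt")
# Script names come from the steps above; assumed to live in the repo root.
REQUIRED_SCRIPTS = ("preprocess.sh", "ONMT_Pipeline_GloVE.sh")


def missing_prerequisites(repo_dir, embeddings_path):
    """Return descriptions of prerequisites that are not in place."""
    problems = []
    # Steps 1 and 2: both packages must be importable.
    for name in REQUIRED_MODULES:
        if importlib.util.find_spec(name) is None:
            problems.append(f"Python package not importable: {name}")
    # Step 3: the embeddings file must exist at the given path.
    if not Path(embeddings_path).is_file():
        problems.append(f"Embeddings file not found: {embeddings_path}")
    # Steps 4 and 5: the two pipeline scripts must be present.
    for script in REQUIRED_SCRIPTS:
        if not (Path(repo_dir) / script).is_file():
            problems.append(f"Script not found: {script}")
    return problems


if __name__ == "__main__":
    # Hypothetical embeddings location: the current directory.
    for problem in missing_prerequisites(".", "glove.840B.300d.txt"):
        print(problem)
```

Run from the repo root, the script prints one line per missing item and prints nothing when everything is found. It only checks that files and packages exist; it does not verify versions or the paths hard-coded inside the shell scripts.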

A trained model can be downloaded using this link. The model achieves BLEU scores of 51.54 on the dev set and 50.00 on the test set.