
Subword-based Pairwise Word Interaction Model for Paraphrase Identification

Subword-PWIM

This repository contains code and data used in the following paper:

@inproceedings{lan2018subword,
  author     = {Lan, Wuwei and Xu, Wei},
  title      = {Character-based Neural Networks for Sentence Pair Modeling},
  booktitle  = {Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)},
  year       = {2018}
} 

The original PWIM is from this paper:

@inproceedings{he-lin:2016:N16-1,
  author     = {He, Hua  and  Lin, Jimmy},
  title      = {Pairwise Word Interaction Modeling with Deep Neural Networks for Semantic Similarity Measurement},
  booktitle  = {Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)},
  year       = {2016}
} 

A few notes

  1. This repository contains only the MSRP dataset; the Twitter-URL and PIT-2015 datasets must be obtained separately.

  2. We follow this code to do data preprocessing.

  3. The model was implemented with PyTorch 0.4.0 and Torchtext 0.1.1.

  4. Sample command to run: python main.py. See main.py for the additional arguments you can pass.

  5. There is a demo you can try (first download save_dir, which contains a model trained on Twitter-URL with a unigram CNN):

    python -W ignore demo.py 'do you know where my book is' 'i cannot find my book, do you know where is it'
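
If you would rather call the demo from another Python script than from the shell, the command above can be wrapped roughly as follows. This is only a sketch: the helper names build_demo_command and run_demo are hypothetical and not part of this repository, it assumes demo.py is in the current working directory with save_dir already downloaded, and it makes no assumption about the format of what demo.py prints.

```python
import subprocess
import sys


def build_demo_command(sentence_a, sentence_b, script="demo.py"):
    """Build the argument list equivalent to the shell command above.

    Hypothetical helper, not part of the repository. Passing the sentences
    as separate list items avoids any shell quoting issues.
    """
    # "-W ignore" suppresses Python warnings, as in the README command.
    return [sys.executable, "-W", "ignore", script, sentence_a, sentence_b]


def run_demo(sentence_a, sentence_b):
    """Run demo.py on a sentence pair and return its raw text output.

    Assumes demo.py and save_dir are available in the working directory.
    Raises subprocess.CalledProcessError if the script exits with an error.
    """
    result = subprocess.run(
        build_demo_command(sentence_a, sentence_b),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        check=True,
    )
    return result.stdout
```

For example, run_demo('do you know where my book is', 'i cannot find my book, do you know where is it') would run the same sentence pair as the command above and return whatever the script prints.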