Kyle Gao
[Pointer networks](https://arxiv.org/pdf/1506.03134.pdf) and the models presented in [this paper](https://arxiv.org/abs/1511.06391) are well suited to combinatorial problems where each output is a position in the input sequence, e.g. reversing a sequence.
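The core of a pointer network is an attention step whose softmax runs over input positions instead of a fixed output vocabulary. Below is a dependency-free sketch of that scoring, u_j = v^T tanh(W1 e_j + W2 d), following the Pointer Networks paper. The function and parameter names are my own, and in a real model `W1`, `W2` and `v` would be learned. For the reversal task, the pointer targets are just the input positions in reverse order.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    # Plain matrix-vector product on nested lists.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def pointer_distribution(encoder_states, decoder_state, W1, W2, v):
    """Pointer attention: u_j = v^T tanh(W1 e_j + W2 d), then softmax over j.

    Returns one probability per *input position*, not per vocabulary word.
    """
    wd = matvec(W2, decoder_state)
    scores = []
    for e in encoder_states:
        hidden = [math.tanh(a + b) for a, b in zip(matvec(W1, e), wd)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    return softmax(scores)


def reversal_pointer_targets(n):
    """For reversing a length-n input, output step i points at input position n-1-i."""
    return list(range(n - 1, -1, -1))
```

Because the output space is the set of input positions, the same model handles inputs of any length without a fixed-size output layer.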
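I sometimes notice that not using teacher forcing at all gives better results at inference time than using teacher forcing all the time. [This paper](https://papers.nips.cc/paper/5956-scheduled-sampling-for-sequence-prediction-with-recurrent-neural-networks.pdf) describes the likely cause: a model trained only with teacher forcing never sees its own mistakes during training, so at inference time errors can accumulate along the generated sequence. It proposes scheduled sampling, which gradually shifts training from feeding ground-truth tokens to feeding the model's own predictions.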
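Scheduled sampling decides, at each decoding step during training, whether to feed the ground-truth previous token (with probability epsilon_i) or the model's own prediction, with epsilon_i decayed over training iterations i. A minimal sketch of the three decay schedules described in the paper and of the per-step coin flip follows; the default values of `k`, `c` and `eps_min` are illustrative assumptions, not values from the paper.

```python
import math
import random


def linear_decay(i, k=1.0, c=1e-3, eps_min=0.0):
    # epsilon_i = max(eps_min, k - c * i)
    return max(eps_min, k - c * i)


def exponential_decay(i, k=0.999):
    # epsilon_i = k ** i, with k < 1
    return k ** i


def inverse_sigmoid_decay(i, k=1000.0):
    # epsilon_i = k / (k + exp(i / k)), with k >= 1
    return k / (k + math.exp(i / k))


def choose_next_input(ground_truth_token, predicted_token, eps, rng):
    """With probability eps feed the ground truth (teacher forcing),
    otherwise feed the model's own prediction."""
    return ground_truth_token if rng.random() < eps else predicted_token
```

Always teacher forcing corresponds to eps = 1 and never teacher forcing to eps = 0; the schedules interpolate between the two over the course of training.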
Benchmark on a WMT machine translation dataset so that the library's performance can be evaluated and compared with other implementations.
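WMT results are normally reported as BLEU. The sketch below is a minimal corpus-level BLEU (one reference per sentence, uniform weights up to 4-grams, no smoothing, pre-tokenized input) meant only to show what the score measures. For numbers that are comparable with other implementations, a standard scorer such as sacreBLEU should be used instead of this.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    # Counter of all n-grams (as tuples) in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates, references, max_n=4):
    """Corpus BLEU: clipped n-gram precisions pooled over all sentences,
    geometric mean over n = 1..max_n, times a brevity penalty."""
    clipped = [0] * max_n
    totals = [0] * max_n
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        cand_len += len(cand)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            cand_counts = ngrams(cand, n)
            ref_counts = ngrams(ref, n)
            # Each candidate n-gram counts at most as often as it appears in the reference.
            clipped[n - 1] += sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
            totals[n - 1] += sum(cand_counts.values())
    if cand_len == 0 or min(clipped) == 0:
        return 0.0  # no smoothing: any zero precision gives BLEU = 0
    log_precision = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    brevity_penalty = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return brevity_penalty * math.exp(log_precision)
```

The brevity penalty is what stops a system from scoring well by emitting only a few high-confidence words.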
Research has shown that adversarial training can be more effective than MLE training alone for neural machine translation; consider developing an adversarial trainer. https://arxiv.org/abs/1704.06933 https://arxiv.org/abs/1703.04887
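At a high level, both papers treat the translation model as a generator trained with policy gradient, using a discriminator's estimate that an output is human-written as the reward. The sketch below shows only that loop on a deliberately tiny problem: "sentences" are single tokens, the generator is a categorical distribution, the discriminator is a per-token logistic score, and there is no source sentence, MLE term, or Monte Carlo rollout. All names and hyperparameters are hypothetical; this is not either paper's method.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_toy_gan(real_token, vocab_size=4, steps=2000, lr_g=0.1, lr_d=0.1, seed=0):
    """Toy adversarial loop: the real data is always `real_token`."""
    rng = random.Random(seed)
    gen_logits = [0.0] * vocab_size  # generator: categorical policy over tokens
    disc_w = [0.0] * vocab_size      # discriminator: D(tok) = sigmoid(disc_w[tok])
    baseline = 0.0                   # moving-average reward baseline to reduce variance
    for _ in range(steps):
        probs = softmax(gen_logits)
        fake = rng.choices(range(vocab_size), weights=probs)[0]

        # Discriminator step: gradient ascent on log D(real) + log(1 - D(fake)).
        d_real = sigmoid(disc_w[real_token])
        d_fake = sigmoid(disc_w[fake])
        disc_w[real_token] += lr_d * (1.0 - d_real)
        disc_w[fake] -= lr_d * d_fake

        # Generator step (REINFORCE): reward is D's belief that the sample is real.
        reward = sigmoid(disc_w[fake])
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        for tok in range(vocab_size):
            # d log p(fake) / d logit[tok] = 1[tok == fake] - p(tok)
            grad_log_p = (1.0 if tok == fake else 0.0) - probs[tok]
            gen_logits[tok] += lr_g * advantage * grad_log_p
    return softmax(gen_logits)
```

In a real trainer the generator would be the seq2seq model, the reward would be computed on whole sampled translations given the source, and both papers also keep a likelihood-based signal to stabilize training.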
As more features are added, configuring an experiment in code becomes increasingly complicated; it would be easier to read the experiment configuration from a file and build the experiment from it.
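A minimal sketch of file-based configuration, assuming JSON and a flat set of options. The field names and defaults in `ExperimentConfig` are hypothetical placeholders, not the library's actual parameters. Rejecting unknown keys catches typos in config files, which would otherwise silently fall back to defaults.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class ExperimentConfig:
    # Hypothetical options; a real config would mirror the library's own settings.
    hidden_size: int = 128
    n_layers: int = 1
    bidirectional: bool = False
    teacher_forcing_ratio: float = 0.5
    batch_size: int = 32
    epochs: int = 10


def load_config(text):
    """Parse a JSON string into an ExperimentConfig, keeping defaults for missing keys."""
    raw = json.loads(text)
    known = {f.name for f in fields(ExperimentConfig)}
    unknown = set(raw) - known
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return ExperimentConfig(**raw)


def load_config_file(path):
    with open(path, encoding="utf-8") as f:
        return load_config(f.read())
```

The experiment-building code then only needs to read attributes of one object, and a config file can be saved alongside each run's results.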
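This paper discusses and evaluates several regularization and optimization methods for LSTM language models, such as DropConnect on the recurrent weights and non-monotonically triggered averaged SGD (NT-ASGD), and provides ablations for each technique. It would be interesting to try some of these techniques on seq2seq. https://arxiv.org/pdf/1708.02182.pdf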
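NT-ASGD trains with plain SGD and switches to averaged SGD once validation loss stops improving over a window of checks. A stdlib sketch of the trigger and of the iterate averaging follows, assuming validation loss is checked at a fixed interval; the function names are hypothetical, and `nonmono=5` is an assumed default for the window size.

```python
def should_switch_to_asgd(val_history, val_loss, nonmono=5):
    """Non-monotonic trigger: switch when the current validation loss is worse
    than the best loss recorded more than `nonmono` checks ago.

    `val_history` holds the earlier validation losses, oldest first,
    and does not include `val_loss`.
    """
    if len(val_history) <= nonmono:
        return False
    return val_loss > min(val_history[:-nonmono])


class IterateAverager:
    """Running mean of parameter vectors from the switch point onward;
    averaged SGD uses this mean as the final weights."""

    def __init__(self):
        self.count = 0
        self.average = None

    def update(self, params):
        self.count += 1
        if self.average is None:
            self.average = list(params)
        else:
            # Incremental mean: avg_k = avg_{k-1} + (x_k - avg_{k-1}) / k
            self.average = [a + (p - a) / self.count for a, p in zip(self.average, params)]
        return self.average
```

Comparing against losses from more than `nonmono` checks ago, rather than the previous check, keeps a single noisy validation result from triggering the switch too early.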