Stig-Arne Grönroos
The command line help indicates that gzipped input files are supported. However, if a gzipped training data file or validation data file is given, training fails with a `UnicodeDecodeError`. > File...
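One plausible cause of this kind of failure is that the compressed bytes are decoded directly as UTF-8 text. The sketch below shows one way to open a file that may or may not be gzipped; the helper name `open_text` is hypothetical and is not taken from the project's code.

```python
import gzip

# First two bytes of every gzip stream.
GZIP_MAGIC = b"\x1f\x8b"


def open_text(path, encoding="utf-8"):
    """Open a plain or gzip-compressed text file for reading.

    Hypothetical helper: the file type is detected from the magic
    bytes rather than from the file extension.
    """
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == GZIP_MAGIC
    if is_gzip:
        # "rt" makes gzip decompress and decode to str in one step.
        return gzip.open(path, "rt", encoding=encoding)
    return open(path, "r", encoding=encoding)
```

Sniffing the magic bytes means a gzipped file without a `.gz` suffix is still handled, at the cost of opening the file twice.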
The existing translation server from OpenNMT-py was refurbished. A demo frontend was implemented using streamlit.
Currently it is possible to do either of these: 1. Use a language-specific sentencepiece model and subword vocabulary, together with a language-specific embedding matrix. This is the default usage. 2....
Implement coverage in the attention mechanism, following [1]. [1] Tu, Zhaopeng, et al. "Coverage-based Neural Machine Translation." arXiv preprint arXiv:1601.04811 (2016). http://arxiv.org/pdf/1601.04811
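To illustrate the general idea of coverage, here is a deliberately simplified sketch. It keeps a running sum of past attention weights per source position and subtracts a fixed multiple of it from the raw attention scores. This is an assumption made for illustration: in [1] the coverage enters the attention scorer through learned parameters, and the function name and the `penalty` parameter are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def attend_with_coverage(scores_per_step, penalty=1.0):
    """Apply a fixed coverage penalty to raw attention scores.

    scores_per_step: one list of raw scores per decoder step,
    each with one score per source position.
    Returns the attention weights per step and the final coverage.
    """
    coverage = [0.0] * len(scores_per_step[0])
    attentions = []
    for scores in scores_per_step:
        # Positions that were already attended to are scored lower.
        adjusted = [s - penalty * c for s, c in zip(scores, coverage)]
        attn = softmax(adjusted)
        # Coverage accumulates the attention mass spent so far.
        coverage = [c + a for c, a in zip(coverage, attn)]
        attentions.append(attn)
    return attentions, coverage
```

With identical raw scores at every step, the weight on the most attended source position decreases from one step to the next, which is the behaviour coverage is meant to encourage.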
Implement an encoder following [1]. [1] Lee, Jason, Kyunghyun Cho, and Thomas Hofmann. "Fully Character-Level Neural Machine Translation without Explicit Segmentation." arXiv preprint arXiv:1610.03017 (2016). https://arxiv.org/pdf/1610.03017
Implement Minimum Risk Training (MRT), following [1]. [1] Shen, Shiqi, et al. "Minimum risk training for neural machine translation." arXiv preprint arXiv:1512.02433 (2015). http://arxiv.org/pdf/1512.02433
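As a rough sketch of the objective, MRT in [1] minimizes the expected loss over a set of candidate translations, where the candidate distribution is the model probability raised to a sharpness factor and renormalized over the candidates. The function below computes that expected risk for a single sentence; the function name, the argument layout and the default `alpha` are assumptions for illustration, and sampling the candidates and backpropagation are left out.

```python
import math


def expected_risk(log_probs, losses, alpha=1.0):
    """Expected loss over a candidate set.

    log_probs: model log-probability of each candidate translation.
    losses: task loss of each candidate, e.g. 1 - sentence BLEU.
    alpha: sharpness of the renormalized candidate distribution.
    """
    # Q(y) is proportional to p(y) ** alpha, computed in log space.
    scaled = [alpha * lp for lp in log_probs]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    z = sum(weights)
    q = [w / z for w in weights]
    # Risk is the loss averaged under Q.
    return sum(qi * li for qi, li in zip(q, losses))
```

A larger `alpha` concentrates the distribution on the candidates the model already prefers, while `alpha = 0` weights all candidates equally.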
Implement a two-level decoder following [1]. First a word-level decoder produces a sequence that may contain unknown-word symbols. Next, these symbols are filled in using a character-level decoder. [1]...
I tried to extract the aligned sentence pairs from CCMatrix, previously downloaded using `opus_express`. The command I used was ``` opus_read --source en --target fi --directory CCMatrix --preprocess xml --leave_non_alignments_out...
The LP tuple has been extended to include the offset. This avoids a bug where some splits of corpora were not assigned.
A version number should be stored in the model save files (checkpoints). Whenever a code change affects the format of the save files, the version should be incremented. When loading...
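A minimal sketch of such a scheme is shown below. It uses `pickle` on a plain dictionary so that it runs without any deep learning library; the key name `format_version`, the constant, and the choice to raise an error on mismatch are assumptions for illustration, not the project's actual behaviour.

```python
import pickle

# Hypothetical constant: bump whenever the save format changes.
CHECKPOINT_FORMAT_VERSION = 2


def save_checkpoint(path, state):
    """Write the state together with the current format version."""
    payload = {"format_version": CHECKPOINT_FORMAT_VERSION, "state": state}
    with open(path, "wb") as f:
        pickle.dump(payload, f)


def load_checkpoint(path):
    """Load a checkpoint, refusing files with a different version."""
    with open(path, "rb") as f:
        payload = pickle.load(f)
    # Files written before versioning was added have no version key.
    found = payload.get("format_version") if isinstance(payload, dict) else None
    if found != CHECKPOINT_FORMAT_VERSION:
        raise ValueError(
            f"checkpoint format version {found!r} does not match "
            f"expected version {CHECKPOINT_FORMAT_VERSION}"
        )
    return payload["state"]
```

Failing early with the two version numbers in the message is one option; a loader could instead dispatch to a migration routine per old version.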