Stig-Arne Grönroos issues

Results: 12 issues of Stig-Arne Grönroos

Bug in handling of gzipped input files

The command line help indicates that gzipped input files are supported. However, if a gzipped training data file or validation data file is given, training fails with UnicodeDecodeError. > File...

bug
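
One plausible cause of the error in the gzipped-input issue above is that the compressed bytes are being decoded directly as UTF-8 text (the second gzip magic byte, `0x8b`, is not a valid UTF-8 start byte). The sketch below shows one way to open either kind of file transparently. The helper name `open_text` is hypothetical and is not taken from the project's code.

```python
import gzip


def open_text(path, encoding="utf-8"):
    """Open a plain or gzipped text file for reading (hypothetical helper).

    The file type is detected from the two-byte gzip magic number rather than
    from the file extension, so a misnamed file is still handled.
    """
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == b"\x1f\x8b"
    if is_gzip:
        # "rt" makes gzip decompress and decode to str in one step.
        return gzip.open(path, "rt", encoding=encoding)
    return open(path, "r", encoding=encoding)
```

With a helper like this, the training and validation readers could call `open_text(path)` instead of the built-in `open`, and iterate over lines in the same way for both file types.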

Translation server and demo frontend

The existing translation server from OpenNMT-py was refurbished, and a demo frontend was implemented using Streamlit.

Shared embeddings without a shared subword vocabulary

Currently it is possible to do either of these: 1. Use a language-specific sentencepiece model and subword vocabulary, together with a language-specific embedding matrix. This is the default usage. 2....

enhancement

Coverage vector

Implement coverage in the attention mechanism, following [1]. [1] Tu, Zhaopeng, et al. "Coverage-based Neural Machine Translation." arXiv preprint arXiv:1601.04811 (2016). http://arxiv.org/pdf/1601.04811

enhancement

Convolutional character-level encoder without explicit segmentation.

Implement an encoder following [1]. [1] Lee, Jason, Kyunghyun Cho, and Thomas Hofmann. "Fully Character-Level Neural Machine Translation without Explicit Segmentation." arXiv preprint arXiv:1610.03017 (2016). https://arxiv.org/pdf/1610.03017

enhancement

Minimum Risk Training

Implement Minimum Risk Training (MRT), following [1]. [1] Shen, Shiqi, et al. "Minimum risk training for neural machine translation." arXiv preprint arXiv:1512.02433 (2015). http://arxiv.org/pdf/1512.02433

enhancement
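
As a rough illustration of the Minimum Risk Training request above, the sketch below computes the expected risk over a sampled candidate set: each candidate's model probability is raised to a sharpness factor, renormalized over the candidates, and used to weight that candidate's risk (for example, 1 minus sentence-level BLEU). The function name `mrt_loss`, its arguments, and the default `alpha` are illustrative assumptions, and the code works on plain Python floats rather than on any real model's tensors.

```python
import math


def mrt_loss(log_probs, risks, alpha=5e-3):
    """Expected risk over a candidate set (illustrative sketch).

    log_probs: model log-probabilities of the candidate translations.
    risks:     risk of each candidate against the reference, same order.
    alpha:     sharpness of the renormalized distribution (assumed default).
    """
    scaled = [alpha * lp for lp in log_probs]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scaled)
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return sum(w / total * r for w, r in zip(weights, risks))
```

When all candidates are equally probable, the loss reduces to the mean risk; as `alpha` grows, it approaches the risk of the single most probable candidate.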

Hybrid word-character decoder

Implement a two-level decoder following [1]. First a word-level decoder produces a sequence that may contain `<unk>` symbols. Next, these `<unk>` symbols are filled in using a character-level decoder. [1]...

enhancement

opus_read fails to extract CCMatrix

I tried to extract the aligned sentence pairs from CCMatrix, previously downloaded using `opus_express`. The command I used was ``` opus_read --source en --target fi --directory CCMatrix --preprocess xml --leave_non_alignments_out...

When performing GPU assignment, keep track of split corpora

The LP tuple has been extended to include the offset. This avoids a bug where some splits of corpora were not assigned.

Format versioning for model artifacts

A version number should be stored in the model save files (checkpoints). Whenever a code change affects the format of the save files, the version should be incremented. When loading...

enhancement

good first issue
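
The format-versioning request above could be approached along the following lines. This is a minimal sketch under stated assumptions: the key name `format_version`, the constant `FORMAT_VERSION`, the exception class, and the choice to reject mismatched versions outright are all hypothetical, and the checkpoint is modeled as a plain dict rather than the project's actual save format.

```python
FORMAT_VERSION = 2  # hypothetical: bump whenever the save-file layout changes


class IncompatibleCheckpointError(RuntimeError):
    """Raised when a checkpoint was written with a different format version."""


def add_version(checkpoint):
    """Return a copy of the checkpoint dict stamped with the current version."""
    return {**checkpoint, "format_version": FORMAT_VERSION}


def check_version(checkpoint):
    """Validate the stored version before the checkpoint is used.

    Checkpoints written before versioning existed have no version key and are
    treated as version 0 (an assumption of this sketch).
    """
    found = checkpoint.get("format_version", 0)
    if found != FORMAT_VERSION:
        raise IncompatibleCheckpointError(
            f"checkpoint format version {found}, expected {FORMAT_VERSION}"
        )
    return checkpoint
```

A stricter or more lenient policy, such as running migration functions for older versions, would slot into `check_version` without changing the save side.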