Robin issues




[Non-autoregressive Transformer] Add GLAT, CTC, DS

This PR adds the code for the following methods to the Non-Autoregressive Transformer:
- Glancing Transformer (GLAT) from "[Glancing Transformer for Non-Autoregressive Neural Machine Translation](https://aclanthology.org/2021.acl-long.155.pdf)" (Qian et al., 2021)
-...

CLA Signed

missing `repeat` function

Where does `repeat` in [this line](https://github.com/lucidrains/n-grammer-pytorch/blob/df514e05aeb713a139ccc1e42bea9b6e6c3f825f/n_grammer_pytorch/n_grammer_pytorch.py#L176) come from? I'm guessing there is an import missing from `einops`.

Missing fairseq code-diff

The `transformer_layer.py` file also contains changes related to the `ret_ffn_inp` flag introduced [here](https://github.com/ShannonAI/fast-knn-nmt/blob/main/thirdparty/fairseq/fairseq/modules/transformer_layer.py#L290); these changes aren't listed in the README.

`batch_size` does not work across lines

The current implementation does not properly use `batch_size` across lines in a file, since every line is processed individually: https://github.com/rewicks/ersatz/blob/e5ed3ebbc64ac5993093ee42bca3a282d45e556e/ersatz/split.py#L169. This means that if we have a file that contains...
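One way to batch across lines is to flatten the per-line work items into a single stream, tagging each with its line index so results can be regrouped afterwards. A minimal sketch, not ersatz's code: `candidate_items` and `batches_across_lines` are hypothetical names, and splitting on whitespace merely stands in for however the segmenter actually builds its per-line inputs.

```python
from typing import Iterable, Iterator, List, Tuple

Item = Tuple[int, int, str]  # (line_idx, item_idx, item)


def candidate_items(line: str) -> List[str]:
    # Hypothetical stand-in: treat each whitespace-separated token as one
    # item the model has to score.
    return line.split()


def batches_across_lines(lines: Iterable[str], batch_size: int) -> Iterator[List[Item]]:
    """Yield batches of exactly `batch_size` items (except possibly the last),
    filling each batch from as many consecutive lines as needed."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch: List[Item] = []
    for line_idx, line in enumerate(lines):
        for item_idx, item in enumerate(candidate_items(line)):
            batch.append((line_idx, item_idx, item))
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch  # flush the final, possibly smaller batch
```

With `batch_size=4`, the lines `"a b"`, `"c"` and `"d e f"` produce one full batch spanning three lines plus a final batch of two items, instead of three undersized per-line batches. The `line_idx` carried in each tuple is what lets model outputs be written back to the correct line.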

Add option to write reversible sequence mapping

Users often want to apply this sentence segmenter within larger pipelines, e.g., to segment document-level data into sentence-level segments that can easily be translated by sentence-level...
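A reversible mapping can be as simple as recording, for every output segment, the index of the document it came from; after per-segment processing, segments are regrouped by that index. A minimal sketch under stated assumptions: `naive_split` is a hypothetical regex stand-in for the real segmenter, the mapping format (one document index per segment) is illustrative rather than an existing option, and `restore` rejoins segments with a single space, so it does not preserve the original inter-sentence whitespace.

```python
import re
from typing import List, Tuple


def naive_split(doc: str) -> List[str]:
    # Hypothetical stand-in for the segmenter: split after . ! ? plus whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", doc.strip()) if s]


def segment_with_mapping(docs: List[str]) -> Tuple[List[str], List[int]]:
    """Return the flat list of segments and, in parallel, each segment's doc index."""
    segments: List[str] = []
    mapping: List[int] = []
    for doc_idx, doc in enumerate(docs):
        for seg in naive_split(doc):
            segments.append(seg)
            mapping.append(doc_idx)
    return segments, mapping


def restore(segments: List[str], mapping: List[int], n_docs: int) -> List[str]:
    """Invert the mapping; `n_docs` keeps documents that produced no segments."""
    grouped: List[List[str]] = [[] for _ in range(n_docs)]
    for seg, doc_idx in zip(segments, mapping):
        grouped[doc_idx].append(seg)
    return [" ".join(parts) for parts in grouped]
```

In a translation pipeline, the `segments` list would go to the sentence-level model, and the translated outputs (same order, same length) would be passed to `restore` together with the saved `mapping`.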

Wrong language name for `zh_TW` (Taiwanese)

The language name for Taiwanese (`zh_TW`) seems to be wrongly given as `Chinese`, and regular locale creation also doesn't work as expected:

```python
>>> Locale('zh_TW').get_language_name('en_US')
Traceback (most recent call...
```