subword-nmt

Unsupervised Word Segmentation for Neural Machine Translation and Text Generation

Results 3 subword-nmt issues

I am running on a very large file (about 150M lines, 60 GB on disk) with --num-workers 10, and the line 'vocab += pickle.load(f)' in learn_bpe.py raises an error: EOFError: Ran out...

I am running the GNMT PyTorch example from https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Translation/GNMT. When I run python3 translate.py --model /workspace/autoFL/nvidia_gnmt_torch/nvidia_gnmtpyt_fp32_20190806.pth --input /workspace/autoFL/GNMT/scripts/data/wmt16_de_en/newstest2014.en --reference /workspace/autoFL/GNMT/scripts/data/wmt16_de_en/newstest2014.de --output /tmp/output --math fp32 --batch-size 128 --beam-size 1 2 5...

Hi, here is my case. 1. I pretrained a language model on an English-only corpus, using BPE tokenization with vocab_size=32000. 2. I want to continue training the model on a Japanese corpus....