training_results_v0.6
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3
Hi All,
Is this a problem with the dataset or with the code? Thanks for any hints.
Run: training_results_v0.6/NVIDIA/benchmarks/gnmt/implementations/download_dataset.sh
Error:
Input sentences: 4562102 Output sentences: 4524868
Cleaning data/train.tok...
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = "C.UTF-8",
LANG = "C.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
clean-corpus.perl: processing data/train.tok.de & .en to data/train.tok.clean, cutoff 1-80, ratio 9
..........(100000)..........(200000)..........(300000)..........(400000)..........(500000)..........(600000)..........(700000)..........(800000)..........(900000)..........(1000000)..........(1100000)..........(1200000)..........(1300000)..........(1400000)..........(1500000)..........(1600000)..........(1700000)..........(1800000)..........(1900000)..........(2000000)..........(2100000)..........(2200000)..........(2300000)..........(2400000)..........(2500000)..........(2600000)..........(2700000)..........(2800000)..........(2900000)..........(3000000)..........(3100000)..........(3200000)..........(3300000)..........(3400000)..........(3500000)..........(3600000)..........(3700000)..........(3800000)..........(3900000)..........(4000000)..........(4100000)..........(4200000)..........(4300000)..........(4400000)..........(4500000)......
Input sentences: 4562102 Output sentences: 4500966
Traceback (most recent call last):
File "pytorch/scripts/filter_dataset.py", line 79, in
Are these variables set in your environment?
export LANG=C.UTF-8
export LC_ALL=C.UTF-8
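
To check what Python itself sees, a minimal sketch along these lines may help. It assumes the script opens the tokenized files in text mode without an explicit `encoding`, which the truncated traceback does not confirm. The helper names `describe_text_defaults` and `read_lines_utf8` are hypothetical and not part of the repository.

```python
import locale
import os
import tempfile


def describe_text_defaults():
    """Return the settings that decide how open() decodes text by default."""
    return {
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "preferred_encoding": locale.getpreferredencoding(False),
    }


def read_lines_utf8(path):
    """Read a text file as UTF-8 regardless of the active locale (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


# 0xc3 is the lead byte of many two-byte UTF-8 sequences; "ü" is b"\xc3\xbc".
sample = "für\n".encode("utf-8")

# Decoding the same bytes as ASCII reproduces the kind of error in the title.
try:
    sample.decode("ascii")
except UnicodeDecodeError as err:
    print("ascii decode fails:", err)

# Reading with an explicit encoding does not depend on LANG / LC_ALL.
with tempfile.NamedTemporaryFile(suffix=".de", delete=False) as tmp:
    tmp.write(sample)
print(read_lines_utf8(tmp.name))
os.remove(tmp.name)

print(describe_text_defaults())
```

If `preferred_encoding` prints an ASCII variant such as `ANSI_X3.4-1968`, the locale is not taking effect in the shell or container that runs the script. That would also be consistent with the perl warnings about falling back to the "C" locale.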