Tacotron-2
`UnicodeEncodeError: 'ascii' codec can't encode character '\xe2'` in wavenet training
We are running the Tacotron-2 training script on the LJ Speech dataset.
This is our Dockerfile:
FROM tensorflow/tensorflow:latest-gpu-py3
RUN apt-get update
RUN apt-get install -y libasound-dev portaudio19-dev libportaudio2 libportaudiocpp0 ffmpeg libav-tools wget git vim
RUN pip install --upgrade pip
RUN wget http://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
RUN tar -jxvf LJSpeech-1.1.tar.bz2
RUN git clone https://github.com/Rayhane-mamah/Tacotron-2.git
RUN mv LJSpeech-1.1/ Tacotron-2/
WORKDIR Tacotron-2
RUN pip install -r requirements.txt
- OS: Ubuntu 18.04.1
- Python: 3.5.2
- Tensorflow: 1.9.0
- GPU: 1070
We successfully trained the Tacotron model (tacotron_model.ckpt-120000), but synthesis failed with the error below:
Constructing model: Tacotron
Initialized Tacotron model. Dimensions (? = dynamic shape):
Train mode: False
Eval mode: False
GTA mode: True
Synthesis mode: True
embedding: (?, ?, 512)
enc conv out: (?, ?, 512)
encoder out: (?, ?, 512)
decoder out: (?, ?, 80)
residual out: (?, ?, 512)
projected residual out: (?, ?, 80)
mel out: (?, ?, 80)
<stop_token> out: (?, ?)
Loading checkpoint: logs-Tacotron-2/taco_pretrained/tacotron_model.ckpt-120000
Loaded metadata for 13100 examples (23.23 hours)
starting synthesis
19%|########4 | 5/26 [00:45<03:13, 9.22s/it]Traceback (most recent call last):
File "train.py", line 133, in <module>
main()
File "train.py", line 127, in main
train(args, log_dir, hparams)
File "train.py", line 67, in train
input_path = tacotron_synthesize(args, hparams, checkpoint)
File "/notebooks/Tacotron-2/tacotron/synthesize.py", line 123, in tacotron_synthesize
return run_synthesis(args, checkpoint_path, output_dir, hparams)
File "/notebooks/Tacotron-2/tacotron/synthesize.py", line 107, in run_synthesis
file.write('|'.join([str(x) for x in elems]) + '\n')
UnicodeEncodeError: 'ascii' codec can't encode character '\xe2' in position 224: ordinal not in range(128)
We also tried exporting the LANG environment variable:
export LANG=en_US.UTF-8
But we still get the same error.
Could you give us some advice?
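A quick way to see what is going on is to ask Python which encoding it picks for text files when `open()` is called without an `encoding=` argument. In Python 3.5 that default comes from `locale.getpreferredencoding(False)`. The sketch below only prints that value; the helper name `default_text_encoding` is made up for illustration. If it reports an ASCII variant such as `ANSI_X3.4-1968` even after the export, one possible explanation (an assumption, not verified on this image) is that the `en_US.UTF-8` locale is not generated inside the container, or that the variable was not set in the process that runs `train.py`.

```python
import locale
import sys


def default_text_encoding():
    """Return the encoding open() uses when no encoding= argument is given."""
    return locale.getpreferredencoding(False)


if __name__ == "__main__":
    # An ASCII result here means any non-ASCII character passed to
    # file.write() raises UnicodeEncodeError unless open() is given
    # an explicit encoding.
    print("open() default encoding:", default_text_encoding())
    print("stdout encoding:", sys.stdout.encoding)
```

Running this inside the same container and shell that launch training shows whether the LANG setting actually reached the Python process.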
You can skip the failing lines with a try/except block. It is only a temporary solution.
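A minimal sketch of that workaround, assuming the rows are lists like the `elems` in the traceback. The function name `write_rows_skipping_failures`, the sample rows, and the hard-coded `encoding="ascii"` (used here to mimic the container's default) are all hypothetical. Note that skipped rows are dropped from the output file, which is why this is only a stopgap.

```python
import os
import tempfile


def write_rows_skipping_failures(path, rows, encoding="ascii"):
    """Write '|'-joined rows, skipping any row the encoding cannot represent.

    Returns the indices of the rows that were skipped.
    """
    skipped = []
    with open(path, "w", encoding=encoding) as f:
        for index, elems in enumerate(rows):
            line = "|".join(str(x) for x in elems) + "\n"
            try:
                # Check encodability first so nothing partial is written.
                line.encode(encoding)
            except UnicodeEncodeError:
                skipped.append(index)
                continue
            f.write(line)
    return skipped


# Made-up rows; the second one contains '\xe2', the character in the error.
rows = [
    ["a.npy", "plain text"],
    ["b.npy", "ch\xe2teau"],
    ["c.npy", "more text"],
]
path = os.path.join(tempfile.mkdtemp(), "map.txt")
skipped = write_rows_skipping_failures(path, rows)
print("skipped rows:", skipped)
```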
Well, I guess file.write expects ASCII-only text here, so you can find where the file was opened and pass encoding='utf8' to that open() call.
We also tried adding a UTF-8 declaration at the top of synthesize.py:
# -*- coding: utf-8 -*-
That did not work either.
Thanks @gloriouskilka, we passed the UTF-8 encoding to the open() call and it seems to work. :)
- with open(os.path.join(eval_dir, 'map.txt'), 'w') as file:
+ with open(os.path.join(eval_dir, 'map.txt'), 'w', encoding="utf-8") as file:
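For anyone hitting the same error, here is a small self-contained sketch of why the one-line change helps. The `# -*- coding: utf-8 -*-` line only tells Python how to decode the source file itself, so it has no effect on files opened at runtime; the `encoding=` argument to `open()` is what controls how written text is encoded. The helper `write_map_line`, the temporary path, and the sample row are made up for illustration and are not taken from the repository.

```python
import os
import tempfile


def write_map_line(path, elems, encoding="utf-8"):
    """Write one '|'-joined row, with an explicit file encoding."""
    with open(path, "w", encoding=encoding) as f:
        f.write("|".join(str(x) for x in elems) + "\n")


# Hypothetical row containing '\xe2', the character from the error message.
elems = ["mel-00001.npy", "ch\xe2teau"]
path = os.path.join(tempfile.mkdtemp(), "map.txt")

# Forcing ASCII reproduces the failure seen in the traceback.
try:
    write_map_line(path, elems, encoding="ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# With UTF-8 the same row is written and read back unchanged.
write_map_line(path, elems)
with open(path, encoding="utf-8") as f:
    round_trip = f.read()

print("ASCII write failed:", ascii_failed)
```

Passing the encoding explicitly makes the script behave the same regardless of the locale configured in the container.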
We also found other open() calls without an explicit encoding, so we made a PR.
This fixed it for me, thanks @a8568730.