
`UnicodeEncodeError: 'ascii' codec can't encode character '\xe2'` in wavenet training

Open a8568730 opened this issue 6 years ago • 5 comments

We are running the Tacotron-2 training script with the LJ Speech dataset.

This is our Dockerfile:

FROM tensorflow/tensorflow:latest-gpu-py3

RUN apt-get update
RUN apt-get install -y libasound-dev portaudio19-dev libportaudio2 libportaudiocpp0 ffmpeg libav-tools wget git vim
RUN pip install --upgrade pip

RUN wget http://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
RUN tar -jxvf LJSpeech-1.1.tar.bz2

RUN git clone https://github.com/Rayhane-mamah/Tacotron-2.git
RUN mv LJSpeech-1.1/ Tacotron-2/

WORKDIR Tacotron-2
RUN pip install -r requirements.txt

Environment:

  • OS: Ubuntu 18.04.1
  • Python: 3.5.2
  • TensorFlow: 1.9.0
  • GPU: 1070

We successfully trained the Tacotron model (checkpoint tacotron_model.ckpt-120000), but got the error below during synthesis:

Constructing model: Tacotron
Initialized Tacotron model. Dimensions (? = dynamic shape):
  Train mode:               False
  Eval mode:                False
  GTA mode:                 True
  Synthesis mode:           True
  embedding:                (?, ?, 512)
  enc conv out:             (?, ?, 512)
  encoder out:              (?, ?, 512)
  decoder out:              (?, ?, 80)
  residual out:             (?, ?, 512)
  projected residual out:   (?, ?, 80)
  mel out:                  (?, ?, 80)
  <stop_token> out:         (?, ?)
Loading checkpoint: logs-Tacotron-2/taco_pretrained/tacotron_model.ckpt-120000
Loaded metadata for 13100 examples (23.23 hours)
starting synthesis
 19%|########4                                   | 5/26 [00:45<03:13,  9.22s/it]
Traceback (most recent call last):
  File "train.py", line 133, in <module>
    main()
  File "train.py", line 127, in main
    train(args, log_dir, hparams)
  File "train.py", line 67, in train
    input_path = tacotron_synthesize(args, hparams, checkpoint)
  File "/notebooks/Tacotron-2/tacotron/synthesize.py", line 123, in tacotron_synthesize
    return run_synthesis(args, checkpoint_path, output_dir, hparams)
  File "/notebooks/Tacotron-2/tacotron/synthesize.py", line 107, in run_synthesis
    file.write('|'.join([str(x) for x in elems]) + '\n')
UnicodeEncodeError: 'ascii' codec can't encode character '\xe2' in position 224: ordinal not in range(128)

We also tried exporting the LANG environment variable:

export LANG=en_US.UTF-8

But we still get the same error.

Could you give us some advice?

a8568730, Aug 24 '18 15:08

You can skip the failing write with a try/except block. That is only a temporary workaround.
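
A minimal sketch of that workaround, assuming the goal is to skip rows that cannot be encoded and keep writing the rest. The helper name `write_rows_skipping_errors` and the sample rows in the usage note are hypothetical and not part of the Tacotron-2 code base.

```python
import os
import tempfile


def write_rows_skipping_errors(path, rows, encoding="ascii"):
    """Write '|'-joined rows to path, skipping rows that cannot be encoded.

    Returns the indices of the rows that were skipped.
    """
    skipped = []
    with open(path, "w", encoding=encoding) as f:
        for i, elems in enumerate(rows):
            line = "|".join(str(x) for x in elems) + "\n"
            try:
                # Encode first so a failing row never reaches the file.
                line.encode(encoding)
            except UnicodeEncodeError:
                skipped.append(i)
                continue
            f.write(line)
    return skipped
```

For example, `write_rows_skipping_errors(path, [["a", 1], ["caf\xe9", 2], ["b", 3]])` writes the first and third rows and reports the second as skipped. Skipped rows are simply missing from the output file, so this hides the problem rather than fixing it.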

Yeongtae, Aug 25 '18 07:08

Well, I guess the file was opened with ASCII as its default encoding, so file.write only accepts ASCII characters. Find where the file is opened and pass encoding='utf8' to that open() call.
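
The suggestion can be sketched as follows. This is an illustration under assumptions, not the project's code: the row contents and the temporary path are made up, and only the `'\xe2'` character is taken from the traceback. ASCII is requested explicitly in the first `open()` so the failure reproduces regardless of the machine's locale.

```python
import os
import tempfile

# '\xe2' is the character named in the traceback; the other fields are made up.
elems = ["audio-0001.npy", "mel-0001.npy", "ch\xe2teau"]
line = "|".join(str(x) for x in elems) + "\n"

path = os.path.join(tempfile.mkdtemp(), "map.txt")

# With an ASCII-encoded file, the write raises UnicodeEncodeError.
try:
    with open(path, "w", encoding="ascii") as f:
        f.write(line)
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# With an explicit UTF-8 encoding, the write no longer depends on the locale.
with open(path, "w", encoding="utf-8") as f:
    f.write(line)

# Read back with the same encoding to confirm the round trip.
with open(path, encoding="utf-8") as f:
    round_trip = f.read()

print("ASCII write failed:", ascii_failed)
print("UTF-8 round trip OK:", round_trip == line)
```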

gloriouskilka, Aug 25 '18 10:08

We also tried adding a UTF-8 declaration at the top of synthesize.py:

# -*- coding: utf-8 -*-

That did not work either.
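
For context, the coding declaration only tells the interpreter how to decode the source file itself. It does not change the encoding used for files opened at runtime. When `encoding` is omitted, `open()` falls back to the locale's preferred encoding, which is presumably what resolved to ASCII inside the container. A small diagnostic sketch (the helper name `default_text_encoding` is hypothetical) that can be run in the same environment as the training script:

```python
import locale
import sys


def default_text_encoding():
    """Return the locale's preferred encoding, which open() falls back to
    when no encoding argument is given."""
    return locale.getpreferredencoding(False)


print("default encoding for open():", default_text_encoding())
print("stdout encoding:", sys.stdout.encoding)
```

If this reports an ASCII variant such as `ANSI_X3.4-1968`, the exported `LANG` is probably not reaching the Python process, or the named locale is not available in the image.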

a8568730, Aug 25 '18 13:08

Thanks @gloriouskilka, we passed the UTF-8 encoding when opening the file and it seems to work. :)

- with open(os.path.join(eval_dir, 'map.txt'), 'w') as file:
+ with open(os.path.join(eval_dir, 'map.txt'), 'w', encoding="utf-8") as file:

We also found other open() calls without an explicit encoding, so we opened a PR.

a8568730, Aug 29 '18 01:08

This fixed it for me, thanks @a8568730.

josemf, Feb 28 '20 14:02