dl-for-emo-tts Update to tensorflow 2 & numpy & others

To run the Colab notebook successfully:

Run only the first cell.
Go to /content/tacotron_pytorch/hparams.py and change its contents to this:

import tensorflow as tf
import types

# Default hyperparameters:
hparams_dict = {
    # Comma-separated list of cleaners to run on text prior to training and eval. For non-English
    # text, you may want to use "basic_cleaners" or "transliteration_cleaners" See TRAINING_DATA.md.
    'cleaners': 'english_cleaners',
    'use_cmudict': False,  # Use CMUDict during training to learn pronunciation of ARPAbet phonemes

    # Audio:
    'num_mels': 80,
    'num_freq': 1025,
    'sample_rate': 20000,
    'frame_length_ms': 50,
    'frame_shift_ms': 12.5,
    'preemphasis': 0.97,
    'min_level_db': -100,
    'ref_level_db': 20,

    # Model:
    # TODO: add more configurable hparams
    'outputs_per_step': 5,
    'padding_idx': None,
    'use_memory_mask': False,

    # Data loader
    'pin_memory': True,
    'num_workers': 2,

    # Training:
    'batch_size': 32,
    'adam_beta1': 0.9,
    'adam_beta2': 0.999,
    'initial_learning_rate': 0.002,
    'decay_learning_rate': True,
    'nepochs': 1000,
    'weight_decay': 0.0,
    'clip_thresh': 1.0,

    # Save
    'checkpoint_interval': 5000,

    # Eval:
    'max_iters': 200,
    'griffin_lim_iters': 60,
    'power': 1.5,              # Power to raise magnitudes to prior to Griffin-Lim
}
# Convert the dictionary to a namespace
hparams = types.SimpleNamespace(**hparams_dict)


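def hparams_debug_string():
    # SimpleNamespace is neither iterable nor subscriptable, so read its attributes via vars()
    hp = ['  %s: %s' % (name, value) for name, value in sorted(vars(hparams).items())]
    return 'Hyperparameters:\n' + '\n'.join(hp)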
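
A minimal sketch of how a types.SimpleNamespace built from a dictionary behaves, in case it helps when adapting code that reads hparams. The names demo_dict, demo and demo_debug_string are made up for illustration and are not part of the repository; the two entries are just a subset of the values above.

```python
import types

# Hypothetical two-entry subset of the hyperparameters, for illustration only.
demo_dict = {'num_mels': 80, 'sample_rate': 20000}
demo = types.SimpleNamespace(**demo_dict)

# Values are read and overridden as plain attributes.
print(demo.num_mels)
demo.sample_rate = 22050


def demo_debug_string(ns):
    # A SimpleNamespace cannot be iterated or indexed with [],
    # so vars() is used to get its attributes as a dict.
    lines = ['  %s: %s' % (name, value) for name, value in sorted(vars(ns).items())]
    return 'Hyperparameters:\n' + '\n'.join(lines)


print(demo_debug_string(demo))
```

Indexing such as demo['num_mels'] raises a TypeError, so any code that still treats hparams like a dictionary would need vars(hparams) or getattr(hparams, name) instead.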

In /content/tacotron_pytorch/lib/tacotron/util/audio.py, change np.complex on line 70 to complex.
In /content/pytorch-dc-tts/datasets/emovdb.py, change np.long on line 45 to np.int64.
In /content/pytorch-dc-tts/audio.py, change line 61 to:

return librosa.istft(spectrogram, hop_length=hp.hop_length, win_length=hp.win_length, window="hann")

and change line 47 of the same file to:

est = librosa.stft(X_t, n_fft=hp.n_fft, hop_length=hp.hop_length, win_length=hp.win_length)

Remove %tensorflow_version 1.x from the second cell of the Colab notebook.
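
For context on the np.complex and np.long changes in the steps above: both names were deprecated aliases in NumPy 1.20 and were removed in 1.24, which is why the old code fails on a current Colab runtime. The sketch below only shows the replacement dtypes on toy arrays; spectrum and token_ids are hypothetical names, not variables from the repositories.

```python
import numpy as np

# np.complex was an alias for Python's built-in complex,
# so the built-in can be passed as the dtype directly.
spectrum = np.zeros((2, 3), dtype=complex)
print(spectrum.dtype)  # complex128

# np.long is replaced with an explicit 64-bit integer dtype,
# matching the change made in emovdb.py.
token_ids = np.asarray([3, 1, 4], dtype=np.int64)
print(token_ids.dtype)  # int64
```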

It now works, although the Amused emotion does not sound correct yet. I'll update this when I fix it.

Apr 18 '24 10:04 YaraAlkaka

Just wanted to say a huge thanks for sharing this!

Jun 04 '24 08:06 ruobingli1103

Excellent! One more thing: don't forget to pip install docopt.

Sep 24 '24 02:09 nguyenlamvu123
