cannot train new model with TTS tacotron2

Open Lanzik opened this issue 2 years ago • 3 comments

I want to train a new model on my own dataset for another language.

One entry in my dataset JSON file is:

{"audio_filepath": "./wav/4.wav", "duration": 5.6750625, "is_phoneme": 1, "original_text": "gAme sevvOm d%r b/rnamerizI | $enAxte moqe@iyy%te fe@lIye hozE @%st | ", "normalized_text": "gAme sevvOm d%r b/rnamerizI | $enAxte moqe@iyy%te fe@lIye hozE @%st | "}

and my config file is:

name: Tacotron2

train_dataset: ???
validation_datasets: ???
sup_data_path: null
sup_data_types: null

phoneme_dict_path: "scripts/tts_dataset_files/cmudict-0.7b_nv22.01"
heteronyms_path: "scripts/tts_dataset_files/heteronyms-030921"
whitelist_path: "nemo_text_processing/text_normalization/en/data/whitelist/lj_speech.tsv"

model:
  pitch_fmin: 65.40639132514966
  pitch_fmax: 2093.004522404789

  sample_rate: 22050
  n_mel_channels: 80
  n_window_size: 1024
  n_window_stride: 256
  n_fft: 1024
  lowfreq: 0
  highfreq: 8000
  window: hann
  pad_value: -11.52

  text_normalizer:
    _target_: nemo_text_processing.text_normalization.normalize.Normalizer
    lang: en
    input_case: cased
    whitelist: ${whitelist_path}

  text_normalizer_call_kwargs:
    verbose: false
    punct_pre_process: true
    punct_post_process: true

  text_tokenizer:
    _target_: nemo.collections.tts.torch.tts_tokenizers.EnglishPhonemesTokenizer
    punct: true
    stresses: true
    chars: true
    apostrophe: true
    pad_with_space: true
    g2p:
      _target_: nemo.collections.tts.torch.g2ps.EnglishG2p
      phoneme_dict: ${phoneme_dict_path}
      heteronyms: ${heteronyms_path}

  train_ds:
    dataset:
      _target_: "nemo.collections.tts.torch.data.TTSDataset"
      manifest_filepath: ${train_dataset}
      sample_rate: ${model.sample_rate}
      sup_data_path: ${sup_data_path}
      sup_data_types: ${sup_data_types}
      n_fft: ${model.n_fft}
      win_length: ${model.n_window_size}
      hop_length: ${model.n_window_stride}
      window: ${model.window}
      n_mels: ${model.n_mel_channels}
      lowfreq: ${model.lowfreq}
      highfreq: ${model.highfreq}
      max_duration: null
      min_duration: 0.1
      ignore_file: null
      trim: False
      pitch_fmin: ${model.pitch_fmin}
      pitch_fmax: ${model.pitch_fmax}
    dataloader_params:
      drop_last: false
      shuffle: true
      batch_size: 48
      num_workers: 4
      pin_memory: true

  validation_ds:
    dataset:
      _target_: "nemo.collections.tts.torch.data.TTSDataset"
      manifest_filepath: ${train_dataset}
      sample_rate: ${model.sample_rate}
      sup_data_path: ${sup_data_path}
      sup_data_types: ${sup_data_types}
      n_fft: ${model.n_fft}
      win_length: ${model.n_window_size}
      hop_length: ${model.n_window_stride}
      window: ${model.window}
      n_mels: ${model.n_mel_channels}
      lowfreq: ${model.lowfreq}
      highfreq: ${model.highfreq}
      max_duration: null
      min_duration: 0.1
      ignore_file: null
      trim: False
      pitch_fmin: ${model.pitch_fmin}
      pitch_fmax: ${model.pitch_fmax}
    dataloader_params:
      drop_last: false
      shuffle: false
      batch_size: 24
      num_workers: 8
      pin_memory: true

  preprocessor:
    _target_: nemo.collections.asr.parts.preprocessing.features.FilterbankFeatures
    nfilt: ${model.n_mel_channels}
    highfreq: ${model.highfreq}
    log: true
    log_zero_guard_type: clamp
    log_zero_guard_value: 1e-05
    lowfreq: ${model.lowfreq}
    n_fft: ${model.n_fft}
    n_window_size: ${model.n_window_size}
    n_window_stride: ${model.n_window_stride}
    pad_to: 16
    pad_value: ${model.pad_value}
    sample_rate: ${model.sample_rate}
    window: ${model.window}
    normalize: null
    preemph: null
    dither: 0.0
    frame_splicing: 1
    stft_conv: false
    nb_augmentation_prob: 0
    mag_power: 1.0
    exact_pad: true
    use_grads: false

  encoder:
    _target_: nemo.collections.tts.modules.tacotron2.Encoder
    encoder_kernel_size: 5
    encoder_n_convolutions: 3
    encoder_embedding_dim: 512

  decoder:
    _target_: nemo.collections.tts.modules.tacotron2.Decoder
    decoder_rnn_dim: 1024
    encoder_embedding_dim: ${model.encoder.encoder_embedding_dim}
    gate_threshold: 0.5
    max_decoder_steps: 1000
    n_frames_per_step: 1  # currently only 1 is supported
    n_mel_channels: ${model.n_mel_channels}
    p_attention_dropout: 0.1
    p_decoder_dropout: 0.1
    prenet_dim: 256
    prenet_p_dropout: 0.5
    # Attention parameters
    attention_dim: 128
    attention_rnn_dim: 1024
    # AttentionLocation Layer parameters
    attention_location_kernel_size: 31
    attention_location_n_filters: 32
    early_stopping: true

  postnet:
    _target_: nemo.collections.tts.modules.tacotron2.Postnet
    n_mel_channels: ${model.n_mel_channels}
    p_dropout: 0.5
    postnet_embedding_dim: 512
    postnet_kernel_size: 5
    postnet_n_convolutions: 5

  optim:
    name: adam
    lr: 1e-3
    weight_decay: 1e-6

    # scheduler setup
    sched:
      name: CosineAnnealing
      min_lr: 1e-5

trainer:
  devices: 1  # number of gpus
  max_epochs: ???
  num_nodes: 1
  accelerator: gpu
  strategy: ddp
  accumulate_grad_batches: 1
  enable_checkpointing: False  # Provided by exp_manager
  logger: False  # Provided by exp_manager
  gradient_clip_val: 1.0
  log_every_n_steps: 60
  check_val_every_n_epoch: 2
  benchmark: false

exp_manager:
  exp_dir: null
  name: ${name}
  create_tensorboard_logger: true
  create_checkpoint_callback: true
  checkpoint_callback_params:
    monitor: val_loss
    mode: min

but I get this error:

Error executing job with overrides: ['model.sample_rate=16000', 'train_dataset=new_text.json', 'validation_datasets=new_text.json', 'trainer.max_epochs=100', 'trainer.accelerator=gpu', 'trainer.check_val_every_n_epoch=1', '+trainer.gpus=1']
Error in call to target 'nemo.collections.tts.torch.data.TTSDataset':
KeyError('text')
full_key: train_ds.dataset

How do I resolve this?

I replaced "original_text" with "text", but I still got the same error.

Lanzik, Sep 06 '22 07:09

Could you provide more of the error context? Does it give a line number where it fails?

The TTSDataset class expects each entry to have a text field that contains the original text. Did you replace original_text with text in all entries?
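One way to check this is to scan the manifest for entries that lack a `text` key. The following is a minimal sketch, assuming the manifest holds one JSON object per line, as the sample entry in the original post suggests. `find_entries_missing_key` is a hypothetical helper written for this illustration and is not part of NeMo.

```python
import json


def find_entries_missing_key(lines, key="text"):
    """Return the 1-based line numbers of manifest entries that lack `key`.

    `lines` is any iterable of strings, e.g. an open manifest file.
    Assumption: each non-blank line is a standalone JSON object.
    """
    missing = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        entry = json.loads(line)
        if key not in entry:
            missing.append(lineno)
    return missing


# Example with two in-memory entries; the second still uses "original_text".
sample = [
    '{"audio_filepath": "./wav/1.wav", "duration": 1.0, "text": "a"}',
    '{"audio_filepath": "./wav/2.wav", "duration": 1.0, "original_text": "b"}',
]
print(find_entries_missing_key(sample))
```

To run it on a real manifest, pass an open file object instead of the list, for example `find_entries_missing_key(open("new_text.json", encoding="utf-8"))`.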

I also notice that you seem to be using the EnglishPhonemesTokenizer and English language normalizer, which will probably get rid of a lot of the special symbols in your text. You may want to double-check that the behavior there is as intended.
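To see which symbols are at stake, it can help to list every character that occurs in the manifest text before it reaches the tokenizer. The sketch below only counts characters; it does not call NeMo, and the `allowed` set in the example is a made-up placeholder, not the actual vocabulary of EnglishPhonemesTokenizer. Both helper names are hypothetical.

```python
import json
from collections import Counter


def char_inventory(lines, key="text"):
    """Count every character found under `key` across all manifest entries.

    Assumption: one JSON object per non-blank line.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        counts.update(entry.get(key, ""))
    return counts


def unexpected_chars(counts, allowed):
    """Return the sorted characters in `counts` that are not in `allowed`."""
    return sorted(ch for ch in counts if ch not in allowed)


# Example: `allowed` is a placeholder set chosen only for illustration.
sample = ['{"text": "gAme | $en"}', '{"text": "ho;zE"}']
inventory = char_inventory(sample)
allowed = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ |")
print(unexpected_chars(inventory, allowed))
```

Any character reported here is a candidate to check against how the configured normalizer and tokenizer actually treat it.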

@aroraakshit and @XuesongYang are more familiar with non-English training, have either of you seen this error before?

redoctopus, Sep 06 '22 17:09

Umm, is this NeMo 1.10 or NeMo 1.11? And could you please post the full error dump?

aroraakshit, Sep 07 '22 17:09

Haven't seen this error before. @Lanzik, could you please share the full stack trace from the error log?

XuesongYang, Sep 09 '22 01:09

It has been 25+ days since this issue was opened. Are you still facing this, @Lanzik?

cc: @redoctopus

aroraakshit, Oct 02 '22 12:10

No, it's solved. My text data contained a ";" character. The semicolon was the problem: I replaced it with another character and the error went away.
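For anyone hitting the same error, the fix described here can be sketched as a small manifest rewrite. This is an illustration under assumptions: the thread does not say why the semicolon triggered the KeyError or which replacement character was used, so the `","` default below is only a placeholder, and `replace_char_in_manifest` is a hypothetical helper, not a NeMo function. It assumes one JSON object per line.

```python
import json


def replace_char_in_manifest(
    lines, old=";", new=",", keys=("text", "original_text", "normalized_text")
):
    """Replace `old` with `new` in the given text fields of each entry.

    Returns (rewritten_lines, number_of_entries_changed).
    Assumption: `new` is a placeholder; pick a character your tokenizer accepts.
    """
    rewritten = []
    changed = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        touched = False
        for key in keys:
            value = entry.get(key)
            if isinstance(value, str) and old in value:
                entry[key] = value.replace(old, new)
                touched = True
        changed += touched
        # ensure_ascii=False keeps non-ASCII text readable in the output
        rewritten.append(json.dumps(entry, ensure_ascii=False))
    return rewritten, changed


# Example: only the first entry contains a semicolon.
sample = [
    '{"audio_filepath": "./wav/1.wav", "text": "gAme; sevvOm"}',
    '{"audio_filepath": "./wav/2.wav", "text": "hozE"}',
]
fixed, n_changed = replace_char_in_manifest(sample)
print(n_changed)
```

Writing `fixed` to a new file rather than overwriting the original manifest keeps the change easy to revert.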

Lanzik, Oct 02 '22 17:10