TTS
[Bug] ValueError when running inference with trained OverFlow model
Describe the bug
Hi,
I was able to train an OverFlow model from scratch on my own dataset (22050 Hz samples). But when I try to check its output with
tts --text "Bonjour les amis" --model_path /home/caraduf/Models/Overflow/Test_Overflow_22kHz-January-20-2023_06+19PM-0000000/checkpoint_1500.pth --config_path /home/caraduf/Models/Overflow/Test_Overflow_22kHz-January-20-2023_06+19PM-0000000/config.json --vocoder_name vocoder_models/en/ljspeech/hifigan_v2 --out_path test_own_overflow.wav
I get a ValueError:
> vocoder_models/en/ljspeech/hifigan_v2 is already downloaded.
Traceback (most recent call last):
File "/home/CoquiTTS/coquienv/bin/tts", line 8, in <module>
sys.exit(main())
File "/home/CoquiTTS/TTS/TTS/bin/synthesize.py", line 316, in main
synthesizer = Synthesizer(
File "/home/CoquiTTS/TTS/TTS/utils/synthesizer.py", line 75, in __init__
self._load_tts(tts_checkpoint, tts_config_path, use_cuda)
File "/home/CoquiTTS/TTS/TTS/utils/synthesizer.py", line 108, in _load_tts
self.tts_config = load_config(tts_config_path)
File "/home/CoquiTTS/TTS/TTS/config/__init__.py", line 96, in load_config
config.from_dict(config_dict)
File "/home/CoquiTTS/coquienv/lib/python3.10/site-packages/coqpit/coqpit.py", line 694, in from_dict
self = self.deserialize(data) # pylint: disable=self-cls-assignment
File "/home/CoquiTTS/coquienv/lib/python3.10/site-packages/coqpit/coqpit.py", line 412, in deserialize
value = _deserialize(value, field.type)
File "/home/CoquiTTS/coquienv/lib/python3.10/site-packages/coqpit/coqpit.py", line 284, in _deserialize
return _deserialize_list(x, field_type)
File "/home/CoquiTTS/coquienv/lib/python3.10/site-packages/coqpit/coqpit.py", line 221, in _deserialize_list
return [_deserialize(xi, field_arg) for xi in x]
File "/home/CoquiTTS/coquienv/lib/python3.10/site-packages/coqpit/coqpit.py", line 221, in <listcomp>
return [_deserialize(xi, field_arg) for xi in x]
File "/home/CoquiTTS/coquienv/lib/python3.10/site-packages/coqpit/coqpit.py", line 288, in _deserialize
return field_type.deserialize_immutable(x)
File "/home/CoquiTTS/coquienv/lib/python3.10/site-packages/coqpit/coqpit.py", line 426, in deserialize_immutable
raise ValueError()
ValueError
I previously tested it with a checkpoint at 150k steps trained with 16 kHz samples and got the same ValueError during inference.
Here is the config.json:
{
"output_path": "/home/caraduf/Models/Overflow",
"logger_uri": null,
"run_name": "Test_Overflow_22kHz",
"project_name": null,
"run_description": "\ud83d\udc38Coqui trainer run.",
"print_step": 1,
"plot_step": 1,
"model_param_stats": false,
"wandb_entity": null,
"dashboard_logger": "tensorboard",
"log_model_step": null,
"save_step": 500,
"save_n_checkpoints": 5,
"save_checkpoints": true,
"save_all_best": false,
"save_best_after": 10000,
"target_loss": null,
"print_eval": true,
"test_delay_epochs": -1,
"run_eval": true,
"run_eval_steps": 100,
"distributed_backend": "nccl",
"distributed_url": "tcp://localhost:54321",
"mixed_precision": true,
"epochs": 20001,
"batch_size": 32,
"eval_batch_size": 16,
"grad_clip": 40000.0,
"scheduler_after_epoch": true,
"lr": 0.001,
"optimizer": "Adam",
"optimizer_params": {
"weight_decay": 1e-06
},
"lr_scheduler": null,
"lr_scheduler_params": {},
"use_grad_scaler": false,
"cudnn_enable": true,
"cudnn_deterministic": false,
"cudnn_benchmark": false,
"training_seed": 54321,
"model": "Overflow",
"num_loader_workers": 4,
"num_eval_loader_workers": 2,
"use_noise_augment": false,
"audio": {
"fft_size": 1024,
"win_length": 1024,
"hop_length": 256,
"frame_shift_ms": null,
"frame_length_ms": null,
"stft_pad_mode": "reflect",
"sample_rate": 22050,
"resample": false,
"preemphasis": 0.0,
"ref_level_db": 20,
"do_sound_norm": false,
"log_func": "np.log",
"do_trim_silence": true,
"trim_db": 60.0,
"do_rms_norm": false,
"db_level": null,
"power": 1.5,
"griffin_lim_iters": 60,
"num_mels": 80,
"mel_fmin": 0.0,
"mel_fmax": 8000,
"spec_gain": 1.0,
"do_amp_to_db_linear": true,
"do_amp_to_db_mel": true,
"pitch_fmax": 640.0,
"pitch_fmin": 1.0,
"signal_norm": false,
"min_level_db": -100,
"symmetric_norm": true,
"max_norm": 4.0,
"clip_norm": true,
"stats_path": null
},
"use_phonemes": true,
"phonemizer": "espeak",
"phoneme_language": "fr-fr",
"compute_input_seq_cache": false,
"text_cleaner": "multilingual_cleaners",
"enable_eos_bos_chars": false,
"test_sentences_file": "",
"phoneme_cache_path": "/home/caraduf/Models/Overflow/Test_Overflow_22kHz-January-20-2023_06+19PM-0000000/phoneme_cache",
"characters": {
"characters_class": "TTS.tts.utils.text.characters.IPAPhonemes",
"vocab_dict": null,
"pad": "<PAD>",
"eos": "<EOS>",
"bos": "<BOS>",
"blank": "<BLNK>",
"characters": "iy\u0268\u0289\u026fu\u026a\u028f\u028ae\u00f8\u0258\u0259\u0275\u0264o\u025b\u0153\u025c\u025e\u028c\u0254\u00e6\u0250a\u0276\u0251\u0252\u1d7b\u0298\u0253\u01c0\u0257\u01c3\u0284\u01c2\u0260\u01c1\u029bpbtd\u0288\u0256c\u025fk\u0261q\u0262\u0294\u0274\u014b\u0272\u0273n\u0271m\u0299r\u0280\u2c71\u027e\u027d\u0278\u03b2fv\u03b8\u00f0sz\u0283\u0292\u0282\u0290\u00e7\u029dx\u0263\u03c7\u0281\u0127\u0295h\u0266\u026c\u026e\u028b\u0279\u027bj\u0270l\u026d\u028e\u029f\u02c8\u02cc\u02d0\u02d1\u028dw\u0265\u029c\u02a2\u02a1\u0255\u0291\u027a\u0267\u02b2\u025a\u02de\u026b",
"punctuations": "!'(),-.:;? ",
"phonemes": null,
"is_unique": false,
"is_sorted": true
},
"add_blank": false,
"batch_group_size": 0,
"loss_masking": null,
"min_audio_len": 512,
"max_audio_len": 200000,
"min_text_len": 10,
"max_text_len": 500,
"compute_f0": false,
"compute_linear_spec": false,
"precompute_num_workers": 4,
"start_by_longest": true,
"shuffle": false,
"drop_last": false,
"datasets": [
[
{
"formatter": "ljspeech",
"dataset_name": "Own_1",
"path": "/home/caraduf/Datasets/22kHz/Own_1_22.05kHz_dataset",
"meta_file_train": "metadata.csv",
"ignored_speakers": null,
"language": "fr-fr",
"meta_file_val": "",
"meta_file_attn_mask": ""
},
{
"formatter": "ljspeech",
"dataset_name": "Own_2",
"path": "/home/caraduf/Datasets/22kHz/Own_2_22.05kHz_dataset",
"meta_file_train": "metadata.csv",
"ignored_speakers": null,
"language": "fr-fr",
"meta_file_val": "",
"meta_file_attn_mask": ""
},
{
"formatter": "ljspeech",
"dataset_name": "Own_3",
"path": "/home/caraduf/Datasets/22kHz/Own_2_22.05kHz_dataset",
"meta_file_train": "metadata.csv",
"ignored_speakers": null,
"language": "fr-fr",
"meta_file_val": "",
"meta_file_attn_mask": ""
}
]
],
"test_sentences": [
"Il m'a fallu du temps pour obtenir cette voix, alors je ne vais pas me taire!",
"Salut c'est l'\u00e9t\u00e9, on va s'\u00e9clater",
"Mais son age rendait cette derni\u00e8re qualit\u00e9 plus saillante!"
],
"eval_split_max_size": null,
"eval_split_size": 0.01,
"use_speaker_weighted_sampler": false,
"speaker_weighted_sampler_alpha": 1.0,
"use_language_weighted_sampler": false,
"language_weighted_sampler_alpha": 1.0,
"use_length_weighted_sampler": false,
"length_weighted_sampler_alpha": 1.0,
"force_generate_statistics": false,
"mel_statistics_parameter_path": "/home/caraduf/Models/Overflow/Test_Overflow_22kHz-January-20-2023_06+19PM-0000000/stat_parameters.pt",
"num_chars": 131,
"state_per_phone": 2,
"encoder_in_out_features": 512,
"encoder_n_convolutions": 3,
"out_channels": 80,
"ar_order": 1,
"sampling_temp": 0.334,
"deterministic_transition": true,
"duration_threshold": 0.55,
"use_grad_checkpointing": true,
"max_sampling_time": 1000,
"prenet_type": "original",
"prenet_dim": 256,
"prenet_n_layers": 2,
"prenet_dropout": 0.5,
"prenet_dropout_at_inference": false,
"memory_rnn_dim": 1024,
"outputnet_size": [
1024
],
"flat_start_params": {
"mean": 0.0,
"std": 1.0,
"transition_p": 0.14
},
"std_floor": 0.01,
"hidden_channels_dec": 150,
"kernel_size_dec": 5,
"dilation_rate": 1,
"num_flow_blocks_dec": 12,
"num_block_layers": 4,
"dropout_p_dec": 0.05,
"num_splits": 4,
"num_squeeze": 2,
"sigmoid_scale": false,
"c_in_channels": 0,
"r": 1,
"use_d_vector_file": false,
"use_speaker_embedding": false,
"github_branch": "inside_docker"
}
If I try the default command
tts --text "Hello world!" --model_name tts_models/en/ljspeech/overflow --vocoder_name vocoder_models/en/ljspeech/hifigan_v2 --out_path output.wav
I get the wav output as expected.
To Reproduce
Train OverFlow model with the provided recipe.
Wait for a checkpoint to be written.
Run inference on that checkpoint with
tts --text "Bonjour les amis" --model_path /home/caraduf/Models/Overflow/Test_Overflow_22kHz-January-20-2023_06+19PM-0000000/checkpoint_1500.pth --config_path /home/caraduf/Models/Overflow/Test_Overflow_22kHz-January-20-2023_06+19PM-0000000/config.json --vocoder_name vocoder_models/en/ljspeech/hifigan_v2 --out_path test_own_overflow.wav
A ValueError appears and no wav is written to disk.
Expected behavior
A wav file should be written to disk.
Logs
No response
Environment
{
"CUDA": {
"GPU": [
"NVIDIA GeForce RTX 3090"
],
"available": true,
"version": "11.7"
},
"Packages": {
"PyTorch_debug": false,
"PyTorch_version": "1.13.0+cu117",
"TTS": "0.10.0",
"numpy": "1.22.4"
},
"System": {
"OS": "Linux",
"architecture": [
"64bit",
"ELF"
],
"processor": "x86_64",
"python": "3.10.6",
"version": "#64-Ubuntu SMP Thu Jan 5 11:43:13 UTC 2023"
}
}
Additional context
No response
Printing the data object in coqpit's deserialize_immutable method shows that each entry of datasets is a list instead of a dict. Looking carefully at the generated config.json, it contains "datasets": [ [ {...}, {...} ] ] instead of "datasets": [ {...}, {...} ]. Manually removing the extra [] solves the problem.
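The manual edit described above can also be scripted. The following is a minimal sketch that uses only the Python standard library, not a TTS or coqpit API; the function names `flatten_datasets` and `fix_config_file` are hypothetical helpers introduced here for illustration. It assumes the only problem in the file is one extra level of nesting under the "datasets" key.

```python
import json
from pathlib import Path


def flatten_datasets(config: dict) -> dict:
    """Return a copy of `config` whose "datasets" value is a flat list of dicts.

    Hypothetical helper: it only removes one extra level of list nesting,
    e.g. [[{...}, {...}]] becomes [{...}, {...}]. Other keys are untouched.
    """
    fixed = dict(config)
    if "datasets" not in config:
        return fixed
    flat = []
    for entry in config["datasets"]:
        if isinstance(entry, list):
            flat.extend(entry)  # unwrap the extra brackets
        else:
            flat.append(entry)  # already a plain dataset dict
    fixed["datasets"] = flat
    return fixed


def fix_config_file(src: str, dst: str) -> None:
    """Read a config.json from `src` and write the flattened version to `dst`."""
    config = json.loads(Path(src).read_text(encoding="utf-8"))
    Path(dst).write_text(
        json.dumps(flatten_datasets(config), indent=4), encoding="utf-8"
    )
```

Writing the result to a separate file (for example a hypothetical config_fixed.json next to the original) and passing that file to --config_path keeps the generated config.json intact for comparison.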
When training VITS, these extra [] do not appear, so they are only generated while training OverFlow. The culprit is therefore probably the code that generates the JSON from the train_overflow.py recipe.
@shivammehta25 could you check this one?
Sure! You can assign it to me, I will take a look at it as soon as I can.
Hi! When training with a single dataset, I couldn't replicate the error. Could you please share the training script/recipe that you used for this? I suspect the datasets entry in config.json has more brackets than it should, and the datasets parameter is populated in the training recipe.
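One way such extra brackets could arise is sketched below with plain dicts and the standard library only. This is an assumption, not a confirmed diagnosis: it supposes the recipe wraps a single dataset config in a list (a line of the form `datasets=[dataset_config]`) and that the variable was changed to hold a list of several dataset configs. `build_datasets_field` is a hypothetical stand-in for that recipe line, not a function from the TTS repository.

```python
import json


def build_datasets_field(dataset_config):
    # Hypothetical stand-in for a recipe line like `datasets=[dataset_config]`.
    return [dataset_config]


# A single dataset config: wrapping it yields a flat list of dicts.
single = {"formatter": "ljspeech", "dataset_name": "Own_1"}

# Several dataset configs already collected in a list: wrapping them again
# produces a list containing a list, which matches the nesting in the issue.
several = [
    {"formatter": "ljspeech", "dataset_name": "Own_1"},
    {"formatter": "ljspeech", "dataset_name": "Own_2"},
]

print(json.dumps(build_datasets_field(single)))
print(json.dumps(build_datasets_field(several)))
```

Under this assumption, passing the list directly (datasets=several rather than datasets=[several]) would avoid the nesting, which would also explain why a single-dataset run does not reproduce the error.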
Hi! I used the recipe provided in the repo.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.