[Bug] Error in training Capacitron
Describe the bug
I was training a Capacitron model with my own dataset (bn-BD, 12 hours). Training started successfully, but it stopped after 27 epochs (around 5500 steps) with the following error message:
...
ValueError: Expected parameter loc (Tensor of shape (48, 128)) of distribution MultivariateNormal(loc: torch.Size([48, 128]), covariance_matrix: torch.Size([48, 128, 128])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]], grad_fn=<ExpandBackward0>)
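For context, this ValueError is raised by PyTorch's parameter validation when the distribution is constructed, before any sampling happens: a NaN anywhere in the posterior mean (loc) triggers exactly this message. A minimal standalone repro, independent of the training code:

import torch
from torch.distributions import MultivariateNormal

# Distribution parameters are validated on construction by default (PyTorch >= 1.8),
# so a NaN anywhere in the posterior mean raises the ValueError shown above.
mu = torch.full((48, 128), float("nan"))  # posterior means, all NaN
sigma = torch.ones(48, 128)               # per-dimension scales for a diagonal covariance
MultivariateNormal(mu, torch.diag_embed(sigma))  # raises ValueError: Expected parameter loc ...

In other words, the NaNs are produced upstream (the VAE's mu/sigma projections diverge) and PyTorch merely reports them; torch.autograd.set_detect_anomaly(True) is one way to trace which operation first emits them, at the cost of slower training.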
I believe it's a PyTorch issue. Can someone guide me in solving this problem?
To Reproduce
I was running this experiment in Colab. Here's the notebook: link
Here's the config.json file.
Expected behavior
No response
Logs
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/trainer/trainer.py", line 1534, in fit
self._fit()
File "/usr/local/lib/python3.7/dist-packages/trainer/trainer.py", line 1518, in _fit
self.train_epoch()
File "/usr/local/lib/python3.7/dist-packages/trainer/trainer.py", line 1283, in train_epoch
_, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
File "/usr/local/lib/python3.7/dist-packages/trainer/trainer.py", line 1124, in train_step
num_optimizers=1,
File "/usr/local/lib/python3.7/dist-packages/trainer/trainer.py", line 998, in _optimize
outputs, loss_dict = self._model_train_step(batch, model, criterion)
File "/usr/local/lib/python3.7/dist-packages/trainer/trainer.py", line 954, in _model_train_step
return model.train_step(*input_args)
File "/usr/local/lib/python3.7/dist-packages/TTS/tts/models/tacotron2.py", line 327, in train_step
outputs = self.forward(text_input, text_lengths, mel_input, mel_lengths, aux_input)
File "/usr/local/lib/python3.7/dist-packages/TTS/tts/models/tacotron2.py", line 198, in forward
speaker_embedding=embedded_speakers if self.capacitron_vae.capacitron_use_speaker_embedding else None,
File "/usr/local/lib/python3.7/dist-packages/TTS/tts/models/base_tacotron.py", line 257, in compute_capacitron_VAE_embedding
speaker_embedding, # pylint: disable=not-callable
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/TTS/tts/layers/tacotron/capacitron_layers.py", line 66, in forward
self.approximate_posterior_distribution = MVN(mu, torch.diag_embed(sigma))
File "/usr/local/lib/python3.7/dist-packages/torch/distributions/multivariate_normal.py", line 146, in __init__
super(MultivariateNormal, self).__init__(batch_shape, event_shape, validate_args=validate_args)
File "/usr/local/lib/python3.7/dist-packages/torch/distributions/distribution.py", line 56, in __init__
f"Expected parameter {param} "
ValueError: Expected parameter loc (Tensor of shape (48, 128)) of distribution MultivariateNormal(loc: torch.Size([48, 128]), covariance_matrix: torch.Size([48, 128, 128])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]], grad_fn=<ExpandBackward0>)
Environment
{
"CUDA": {
"GPU": [
"Tesla T4"
],
"available": true,
"version": "11.3"
},
"Packages": {
"PyTorch_debug": false,
"PyTorch_version": "1.12.0+cu113",
"TTS": "0.7.1",
"numpy": "1.21.6"
},
"System": {
"OS": "Linux",
"architecture": [
"64bit",
""
],
"processor": "x86_64",
"python": "3.7.13",
"version": "#1 SMP Sun Apr 24 10:03:06 PDT 2022"
}
}
Additional context
No response
@arif334 I've had this issue before, and what I would suggest is increasing min_audio_len. I had NaN issues in the posterior as well when samples that were too short were fed to the model. Try increasing it to at least 1 s (22050 samples in your case). Let me know if it helps!
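For reference, a minimal sketch of where this knob lives, assuming a Tacotron2/Capacitron recipe built on TTS==0.7.x; min_audio_len and max_audio_len are counted in audio samples, so 1 s at the 22050 Hz sample rate used here is 22050 samples:

from TTS.tts.configs.tacotron2_config import Tacotron2Config

config = Tacotron2Config(
    min_audio_len=22050,      # drop clips shorter than 1 s, per the suggestion above
    max_audio_len=6 * 22050,  # optional 6 s cap, suggested later in this thread
)

The same fields can equally be set in the config.json used for the run.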
Posting the answer I made on the coqui-ai/TTS channel last Friday:
I don't think that's the issue; I also got it in regular training. I guess it's just part of Capacitron's instabilities. Try just continuing the run with slightly different dataset parameters.
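For what it's worth, continuing a run with the Coqui trainer is typically done by pointing it back at the previous output folder. A minimal sketch, assuming a standard recipe script built on the trainer package (the path is a placeholder):

from trainer import TrainerArgs

# Hypothetical: resume the earlier run from its output directory; the trainer
# picks up the last checkpoint it finds there, and dataset parameters can be
# tweaked in the config before resuming.
args = TrainerArgs(continue_path="path/to/previous_run")

Recipes that forward command-line arguments to TrainerArgs usually accept the same thing as a --continue_path flag.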
> @arif334 I've had this issue before, and what I would suggest is increasing min_audio_len. I had NaN issues in the posterior as well when samples that were too short were fed to the model. Try increasing it to at least 1 s (22050 samples in your case). Let me know if it helps!
Okay, thanks. I'll report back with an update.
Update: the error returned after 125 epochs. My samples are between 1 and 10 seconds, and the model doesn't seem to be learning well; all the loss curves are trending upward! @WeberJulian @a-froghyar
Are you using phonemes? use_phonemes is False in your config.
I'd also try reducing max_audio_len to 6 seconds.
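For reference, assuming max_audio_len uses the same sample-based unit as min_audio_len, a 6-second cap at the 22050 Hz sample rate works out to max_audio_len = 6 × 22050 = 132300 samples.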
> Are you using phonemes? use_phonemes is False in your config.
Unfortunately, no. My language is not supported in gruut, and espeak performs poorly.
> I'd also try reducing max_audio_len to 6 seconds.
Should I reduce my max_audio_len as well? That would also reduce the total duration of the dataset (probably to under 10 hours).
Then the task might be too hard for Capacitron without a phonemizer.
> Then the task might be too hard for Capacitron without a phonemizer.
That was my assumption as well, so I'm going to postpone my Capacitron training for now. I'm working on developing my own phonemizer; hopefully I'll be back when it's ready.
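As a starting point, here is a minimal sketch of the lexicon-plus-fallback grapheme-to-phoneme step such a custom phonemizer usually begins with. Everything here is hypothetical: the lexicon entries and phone symbols are placeholders, not a real bn-BD G2P, and a production phonemizer would plug into the TTS text pipeline rather than stand alone.

from typing import Dict, List

# Hypothetical word -> phoneme-string lexicon; a real one would be built from
# a pronunciation dictionary for the target language.
LEXICON: Dict[str, str] = {
    "hello": "h ə l oʊ",
    "world": "w ɜː l d",
}

def phonemize(text: str, separator: str = " ") -> str:
    """Look each word up in the lexicon; fall back to character-level output
    for out-of-vocabulary words (a common stopgap before a full G2P model)."""
    out: List[str] = []
    for word in text.lower().split():
        if word in LEXICON:
            out.append(LEXICON[word])
        else:
            # OOV fallback: emit raw characters so training can still proceed.
            out.append(separator.join(word))
    return separator.join(out)

print(phonemize("hello world"))  # -> h ə l oʊ w ɜː l d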
Facing the same issue:
Traceback (most recent call last):
File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 1533, in fit
self._fit()
File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 1517, in _fit
self.train_epoch()
File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 1282, in train_epoch
_, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 1114, in train_step
outputs, loss_dict_new, step_time = self._optimize(
File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 998, in _optimize
outputs, loss_dict = self._model_train_step(batch, model, criterion)
File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 954, in _model_train_step
return model.train_step(*input_args)
File "/home/manmay/TTS/TTS/tts/models/tacotron2.py", line 352, in train_step
outputs = self.forward(text_input, text_lengths, mel_input, mel_lengths, aux_input)
File "/home/manmay/TTS/TTS/tts/models/tacotron2.py", line 216, in forward
encoder_outputs, *capacitron_vae_outputs = self.compute_capacitron_VAE_embedding(
File "/home/manmay/TTS/TTS/tts/models/base_tacotron.py", line 254, in compute_capacitron_VAE_embedding
(VAE_outputs, posterior_distribution, prior_distribution, capacitron_beta,) = self.capacitron_vae_layer(
File "/opt/conda/envs/coqui/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/manmay/TTS/TTS/tts/layers/tacotron/capacitron_layers.py", line 67, in forward
self.approximate_posterior_distribution = MVN(mu, torch.diag_embed(sigma))
File "/opt/conda/envs/coqui/lib/python3.8/site-packages/torch/distributions/multivariate_normal.py", line 146, in __init__
super(MultivariateNormal, self).__init__(batch_shape, event_shape, validate_args=validate_args)
File "/opt/conda/envs/coqui/lib/python3.8/site-packages/torch/distributions/distribution.py", line 55, in __init__
raise ValueError(
ValueError: Expected parameter loc (Tensor of shape (128, 128)) of distribution MultivariateNormal(loc: torch.Size([128, 128]), covariance_matrix: torch.Size([128, 128, 128])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]], grad_fn=<ExpandBackward0>)