
[Bug] Error in training Capacitron

Open arif334 opened this issue 2 years ago • 4 comments

Describe the bug

I was training a Capacitron model on my own dataset (bn-BD, ~12 hours). Training started successfully, but it stopped after 27 epochs (around 5500 steps) with the following error message:

...
ValueError: Expected parameter loc (Tensor of shape (48, 128)) of distribution MultivariateNormal(loc: torch.Size([48, 128]), covariance_matrix: torch.Size([48, 128, 128])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        ...,
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan]], grad_fn=<ExpandBackward0>)

I believe it's a PyTorch issue. Can someone guide me in solving this problem?
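
For context, this ValueError is PyTorch's argument validation rejecting the NaNs rather than a shape problem. A minimal sketch (not from the report) that reproduces the same message:

import torch
from torch.distributions import MultivariateNormal as MVN

# Once the posterior mean contains NaNs, the distribution constructor
# fails validation before any sampling happens.
mu = torch.full((48, 128), float("nan"))  # NaN posterior mean, shapes as in the log
sigma = torch.ones(48, 128)               # per-dimension standard deviations
MVN(mu, torch.diag_embed(sigma))          # raises the ValueError quoted above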

To Reproduce

I was running this experiment in Colab. Here's the notebook: link

Here's the config.json file.

Expected behavior

No response

Logs

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/trainer/trainer.py", line 1534, in fit
    self._fit()
  File "/usr/local/lib/python3.7/dist-packages/trainer/trainer.py", line 1518, in _fit
    self.train_epoch()
  File "/usr/local/lib/python3.7/dist-packages/trainer/trainer.py", line 1283, in train_epoch
    _, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
  File "/usr/local/lib/python3.7/dist-packages/trainer/trainer.py", line 1124, in train_step
    num_optimizers=1,
  File "/usr/local/lib/python3.7/dist-packages/trainer/trainer.py", line 998, in _optimize
    outputs, loss_dict = self._model_train_step(batch, model, criterion)
  File "/usr/local/lib/python3.7/dist-packages/trainer/trainer.py", line 954, in _model_train_step
    return model.train_step(*input_args)
  File "/usr/local/lib/python3.7/dist-packages/TTS/tts/models/tacotron2.py", line 327, in train_step
    outputs = self.forward(text_input, text_lengths, mel_input, mel_lengths, aux_input)
  File "/usr/local/lib/python3.7/dist-packages/TTS/tts/models/tacotron2.py", line 198, in forward
    speaker_embedding=embedded_speakers if self.capacitron_vae.capacitron_use_speaker_embedding else None,
  File "/usr/local/lib/python3.7/dist-packages/TTS/tts/models/base_tacotron.py", line 257, in compute_capacitron_VAE_embedding
    speaker_embedding,  # pylint: disable=not-callable
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/TTS/tts/layers/tacotron/capacitron_layers.py", line 66, in forward
    self.approximate_posterior_distribution = MVN(mu, torch.diag_embed(sigma))
  File "/usr/local/lib/python3.7/dist-packages/torch/distributions/multivariate_normal.py", line 146, in __init__
    super(MultivariateNormal, self).__init__(batch_shape, event_shape, validate_args=validate_args)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributions/distribution.py", line 56, in __init__
    f"Expected parameter {param} "
ValueError: Expected parameter loc (Tensor of shape (48, 128)) of distribution MultivariateNormal(loc: torch.Size([48, 128]), covariance_matrix: torch.Size([48, 128, 128])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        ...,
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan]], grad_fn=<ExpandBackward0>)
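
One general PyTorch debugging aid for localizing NaNs like these (an assumption on my part, not something suggested in this thread) is anomaly detection, which makes the backward pass raise at the operation whose gradient first turned NaN; NaN gradients usually precede the NaN weights and NaN posterior seen here:

import torch

# Noticeably slows training, so enable only while hunting the NaN.
torch.autograd.set_detect_anomaly(True)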

Environment

{
    "CUDA": {
        "GPU": [
            "Tesla T4"
        ],
        "available": true,
        "version": "11.3"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "1.12.0+cu113",
        "TTS": "0.7.1",
        "numpy": "1.21.6"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            ""
        ],
        "processor": "x86_64",
        "python": "3.7.13",
        "version": "#1 SMP Sun Apr 24 10:03:06 PDT 2022"
    }
}

Additional context

No response

arif334 · Aug 05 '22 14:08

@WeberJulian

erogol · Aug 07 '22 11:08

@arif334 I've had this issue before, and what I would suggest is increasing min_audio_len. I've had NaN issues in the posterior before as well, when samples that were too short were fed to the model. Try increasing it to at least 1 s (22050 samples in your case). Let me know if it helped!
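
For reference, a hedged sketch of what that change could look like through the Python config API (assuming min_audio_len counts raw audio samples, as the 22050-for-1-s figure implies):

from TTS.tts.configs.tacotron2_config import Tacotron2Config

config = Tacotron2Config()
config.min_audio_len = 22050  # drop clips shorter than 1 s at 22050 Hz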

a-froghyar · Aug 09 '22 08:08

Posting the answer I gave on the coqui-ai/TTS channel last Friday.

I don't think that's the issue; I also got it in regular training. I guess it's just part of Capacitron's instabilities. Try just continuing the run with slightly different dataset parameters.
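
As a hedged sketch of what continuing the run could look like with the trainer's Python API (the path is a hypothetical placeholder):

from trainer import Trainer, TrainerArgs

# continue_path points at the previous run directory; the trainer picks
# up the latest checkpoint from it.
args = TrainerArgs(continue_path="/path/to/previous/run_dir")
# Then construct Trainer(args, config, output_path, model=model, ...) and call fit().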

WeberJulian · Aug 09 '22 08:08

@arif334 I've had this issue before, and what I would suggest is increasing min_audio_len. I've had NaN issues in the posterior before as well, when samples that were too short were fed to the model. Try increasing it to at least 1 s (22050 samples in your case). Let me know if it helped!

Okay, thanks. I'll report the update.

arif334 · Aug 09 '22 10:08

Update: the error returned after 125 epochs. My samples are between 1 and 10 seconds long, and the model didn't seem to be learning well; all the loss curves are trending upward! @WeberJulian @a-froghyar

arif334 · Aug 14 '22 06:08

Are you using phonemes? use_phonemes is False in your config.

a-froghyar · Aug 15 '22 08:08

I'd also try reducing max_audio_len to 6 seconds.
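
Assuming max_audio_len is also counted in audio samples (as the earlier 22050-for-1-s figure implies), that cap would work out to:

max_audio_len = 6 * 22050  # = 132300 samples at 22050 Hz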

a-froghyar · Aug 15 '22 08:08

Are you using phonemes? use_phonemes is False in your config.

Unfortunately, no. My language is not supported by gruut, and espeak performs poorly.

I'd also try reducing max_audio_len to 6 seconds.

Should I reduce my max_audio_len as well? That would also reduce the total duration of the dataset (probably to <10 hr).

arif334 · Aug 15 '22 13:08

Then the task might be too hard for Capacitron without a phonemizer.

WeberJulian · Aug 18 '22 09:08

Then the task might be too hard for Capacitron without a phonemizer.

That was my assumption as well. So I'm going to postpone my Capacitron training for now. I'm working on developing my own phonemizer. Hopefully, I'll come back when it's ready.

arif334 · Aug 20 '22 03:08

Facing the same issue.

manmay-nakhashi · Aug 27 '22 13:08

config.txt

manmay-nakhashi · Aug 27 '22 13:08

Traceback (most recent call last):
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 1533, in fit
    self._fit()
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 1517, in _fit
    self.train_epoch()
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 1282, in train_epoch
    _, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 1114, in train_step
    outputs, loss_dict_new, step_time = self._optimize(
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 998, in _optimize
    outputs, loss_dict = self._model_train_step(batch, model, criterion)
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 954, in _model_train_step
    return model.train_step(*input_args)
  File "/home/manmay/TTS/TTS/tts/models/tacotron2.py", line 352, in train_step
    outputs = self.forward(text_input, text_lengths, mel_input, mel_lengths, aux_input)
  File "/home/manmay/TTS/TTS/tts/models/tacotron2.py", line 216, in forward
    encoder_outputs, *capacitron_vae_outputs = self.compute_capacitron_VAE_embedding(
  File "/home/manmay/TTS/TTS/tts/models/base_tacotron.py", line 254, in compute_capacitron_VAE_embedding
    (VAE_outputs, posterior_distribution, prior_distribution, capacitron_beta,) = self.capacitron_vae_layer(
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/manmay/TTS/TTS/tts/layers/tacotron/capacitron_layers.py", line 67, in forward
    self.approximate_posterior_distribution = MVN(mu, torch.diag_embed(sigma))
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/torch/distributions/multivariate_normal.py", line 146, in __init__
    super(MultivariateNormal, self).__init__(batch_shape, event_shape, validate_args=validate_args)
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/torch/distributions/distribution.py", line 55, in __init__
    raise ValueError(
ValueError: Expected parameter loc (Tensor of shape (128, 128)) of distribution MultivariateNormal(loc: torch.Size([128, 128]), covariance_matrix: torch.Size([128, 128, 128])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        ...,
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan]], grad_fn=<ExpandBackward0>)

manmay-nakhashi · Aug 27 '22 13:08