speedyspeech RuntimeError: stack expects a non-empty TensorList

Open Charlottecuc opened this issue 3 years ago • 10 comments

Hi. Thank you very much for your implementation. I tried to extract the duration by using the default configs (the only difference is that a different dataset is used). However, after 9 iterations, the following error occurred:

  File "code/duration_extractor.py", line 539, in <module>
    logdir=logdir
  File "code/duration_extractor.py", line 390, in fit
    valid_losses = self._validate(valid_loader)
  File "code/duration_extractor.py", line 465, in _validate
    sound, length = self.collate.stft.spec2wav(spec.transpose(1, 2), slen[-1:])
  File "/data/glusterfs_speech_tts_core/11117873/models/speedyspeech_yige/code/stft.py", line 119, in spec2wav
    magnitudes = self.mel2linear(magnitudes)
  File "/data/glusterfs_speech_tts_core/11117873/models/speedyspeech_yige/code/stft.py", line 137, in mel2linear
    return nnls(self.mel_basis, mel)
  File "/data/glusterfs_speech_tts_core/11117873/models/speedyspeech_yige/code/stft.py", line 46, in nnls
    torch.nn.utils.clip_grad_norm_(X, 1)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/utils/clip_grad.py", line 30, in clip_grad_norm_
    total_norm = torch.norm(torch.stack([torch.norm(p.grad.detach(), norm_type) for p in parameters]), norm_type)
RuntimeError: stack expects a non-empty TensorList

Could you help me solve this problem? Thank you ~

Nov 19 '20 06:11 Charlottecuc

Hi, thanks for your interest in this repo. Could you check whether you are able to extract the durations for the default LJSpeech dataset? Could you also print what the inputs to the nnls function look like (just add a print in your local copy of the repo)? Also, which checkpoint did you use for the duration extractor? Did you train your own, or did you use the default one provided with this project?
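A minimal sketch of the kind of debug print that could be dropped into a local copy of `code/stft.py`. The helper name `describe` is hypothetical; `mel_basis`, `mel` and `X` are the names that appear in the traceback and in the tensor report later in this thread, and where exactly they are defined inside `nnls` is an assumption.

```python
import torch


def describe(name, t):
    """Return a one-line summary of a tensor for debugging."""
    return (
        f"{name}: shape={tuple(t.shape)} dtype={t.dtype} device={t.device} "
        f"requires_grad={t.requires_grad} has_grad={t.grad is not None}"
    )


# Stand-in tensor for illustration only; inside nnls one would call
# print(describe("mel_basis", mel_basis)) and so on for the real inputs.
example = torch.zeros(1, 513, 4, requires_grad=True)
print(describe("X", example))
```

The `has_grad` field is the interesting one here: the failing call is `clip_grad_norm_`, which works on the `.grad` attribute of the tensors it receives.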

Nov 19 '20 09:11 janvainer2

I got the same error after running this command: python code/duration_extractor.py

Traceback (most recent call last):
  File "code/duration_extractor.py", line 534, in <module>
    logdir=logdir
  File "code/duration_extractor.py", line 390, in fit
    valid_losses = self._validate(valid_loader)
  File "code/duration_extractor.py", line 461, in _validate
    sound, length = self.collate.stft.spec2wav(spec.transpose(1, 2), slen[-1:])
  File "/home/ubuntu/speedyspeech/code/stft.py", line 119, in spec2wav
    magnitudes = self.mel2linear(magnitudes)
  File "/home/ubuntu/speedyspeech/code/stft.py", line 137, in mel2linear
    return nnls(self.mel_basis, mel)
  File "/home/ubuntu/speedyspeech/code/stft.py", line 46, in nnls
    torch.nn.utils.clip_grad_norm_(X, 1)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/nn/utils/clip_grad.py", line 30, in clip_grad_norm_
    total_norm = torch.norm(torch.stack([torch.norm(p.grad.detach(), norm_type) for p in parameters]), norm_type)
RuntimeError: stack expects a non-empty TensorList

Nov 19 '20 13:11 adnan-mehremic

Are you training on GPU or CPU? I will need more information to reproduce the error.

Nov 19 '20 21:11 janvainer2

OK, I tried a few times and always got the same error. I followed all your steps, and after running the command python code/duration_extractor.py I got this error (as you can see from the log, the model is sent to cuda):

ubuntu@ip-172-31-68-24:~/speedyspeech$ python code/duration_extractor.py
Model sent to cuda
13000/13000: [===============================>] - ETA 1.6sss
Epoch 1 | Train - l1: 0.09392118094296291, guided_att: 0.00031112270836037095| Valid - l1: 0.3166225552558899, guided_att: 0.0004626042937161401|
13000/13000: [===============================>] - ETA 1.0sss
Epoch 2 | Train - l1: 0.06905996212231115, guided_att: 0.0002700031827905862| Valid - l1: 0.3054344058036804, guided_att: 0.00043494933925103396|
13000/13000: [===============================>] - ETA 1.0sss
Epoch 3 | Train - l1: 0.06594225224749796, guided_att: 0.00026452819020498836| Valid - l1: 0.32097506523132324, guided_att: 0.00046123971696943045|
13000/13000: [===============================>] - ETA 1.1sss
Epoch 4 | Train - l1: 0.06372856097341759, guided_att: 0.0002559272787021014| Valid - l1: 0.32438914477825165, guided_att: 0.00048450268513988703|
13000/13000: [===============================>] - ETA 1.0sss
Epoch 5 | Train - l1: 0.06199859332274921, guided_att: 0.0002551149550952669| Valid - l1: 0.3171471357345581, guided_att: 0.0004896632890449837|
13000/13000: [===============================>] - ETA 1.0sss
Epoch 6 | Train - l1: 0.06050542716322274, guided_att: 0.0002568568125380928| Valid - l1: 0.2853122800588608, guided_att: 0.00046930725511629134|
13000/13000: [===============================>] - ETA 1.0sss
Epoch 7 | Train - l1: 0.05929661129275566, guided_att: 0.0002494556063744859| Valid - l1: 0.25290364027023315, guided_att: 0.0005208489892538637|
13000/13000: [===============================>] - ETA 1.0sss
Epoch 8 | Train - l1: 0.05856953240160284, guided_att: 0.00024662175923448256| Valid - l1: 0.39512471854686737, guided_att: 0.0008473480411339551|
13000/13000: [===============================>] - ETA 1.0sss
Epoch 9 | Train - l1: 0.05783513459959641, guided_att: 0.00024235204981612455| Valid - l1: 0.32342180609703064, guided_att: 0.0010448592656757683|
13000/13000: [===============================>] - ETA 1.0sss
Traceback (most recent call last):
  File "code/duration_extractor.py", line 534, in <module>
    logdir=logdir
  File "code/duration_extractor.py", line 390, in fit
    valid_losses = self._validate(valid_loader)
  File "code/duration_extractor.py", line 461, in _validate
    sound, length = self.collate.stft.spec2wav(spec.transpose(1, 2), slen[-1:])
  File "/home/ubuntu/speedyspeech/code/stft.py", line 119, in spec2wav
    magnitudes = self.mel2linear(magnitudes)
  File "/home/ubuntu/speedyspeech/code/stft.py", line 137, in mel2linear
    return nnls(self.mel_basis, mel)
  File "/home/ubuntu/speedyspeech/code/stft.py", line 46, in nnls
    torch.nn.utils.clip_grad_norm_(X, 1)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/nn/utils/clip_grad.py", line 30, in clip_grad_norm_
    total_norm = torch.norm(torch.stack([torch.norm(p.grad.detach(), norm_type) for p in parameters]), norm_type)
RuntimeError: stack expects a non-empty TensorList

Nov 26 '20 12:11 adnan-mehremic

@adnan-mehremic Thanks for the info, I will try to replicate this during the weekend

Nov 26 '20 13:11 janvainer

@janvainer: Following https://github.com/pytorch/pytorch/issues/38605, I moved to torch==1.5.1 and the issue no longer occurs. I still have to read up to understand what is going on.

Feb 27 '21 13:02 dsplog

Thanks for the link. My problem with this issue is that I am not able to reproduce it, even with a clean setup and reinstalled dependencies; everything works for me even with torch==1.5.0. One possible cause is that the requirements installation failed the last time I tried, and I had to install numpy and some other numeric packages separately. Could you please check that your installed dependencies are exactly the same as in the requirements? Or just post them here and I will check. There may be a dependency version conflict that arises when the packages are installed all at once.
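One way to do that check without reading `pip freeze` by eye is sketched below. It assumes Python 3.8+ for `importlib.metadata` (on 3.7 the `importlib_metadata` backport offers the same calls) and only understands simple `name==version` pins. The function names and the pins in the example are hypothetical, not the contents of this repo's requirements file.

```python
from importlib import metadata


def parse_pins(lines):
    """Collect 'name==version' pins, ignoring comments and blank lines."""
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Map each package whose installed version differs to (wanted, installed)."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


# Illustrative pins only; in practice pass the lines of the requirements file.
pins = parse_pins(["torch==1.5.1  # example pin", "", "# a comment"])
for name, (wanted, installed) in find_mismatches(pins).items():
    print(f"{name}: wanted {wanted}, installed {installed}")
```

Posting the printed mismatches in the thread would give the same information as a full dependency list, in a shorter form.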

Feb 27 '21 18:02 janvainer

Thank you for the awesome project! I had the same problem when training the model for another language, and moving to torch==1.5.1 fixed it for me. All the packages matched the ones in the requirements.

Here is some info on the tensors from the nnls function:

mel_basis:  torch.Tensor of size [80, 513]
 mel_spec:  torch.Tensor of size [1, 80, 1128]
        X:  torch.Tensor of size [1, 513, 1128]

In both torch versions the tensors are the same. However, with 1.5.0 torch.nn.utils.clip_grad_norm_ seems to fail with the error mentioned above.
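The traceback is consistent with `clip_grad_norm_` being called while the tensor's `.grad` is still `None`: line 30 of `clip_grad.py` stacks one norm per available gradient, and `torch.stack` rejects an empty list. A minimal sketch of a guard for that case follows. `safe_clip_grad_norm_` is a hypothetical helper, not part of torch or of this repo, and returning a zero norm when there is nothing to clip is an assumption about the desired behaviour.

```python
import torch


def safe_clip_grad_norm_(parameters, max_norm, norm_type=2.0):
    """Clip gradients, but return a zero norm if no gradient exists yet."""
    if isinstance(parameters, torch.Tensor):
        parameters = [parameters]
    with_grad = [p for p in parameters if p.grad is not None]
    if not with_grad:
        # Nothing to clip: avoid handing an empty list to torch.stack.
        return torch.tensor(0.0)
    return torch.nn.utils.clip_grad_norm_(with_grad, max_norm, norm_type)


X = torch.zeros(2, 3, requires_grad=True)
print(safe_clip_grad_norm_(X, 1))  # no backward pass yet, so no gradient

X.sum().backward()                 # now X.grad exists
print(safe_clip_grad_norm_(X, 1))  # total norm before clipping
```

A guard like this only hides the symptom. If the gradient is missing because clipping runs before the backward pass, the clipping call has no effect either way.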

Apr 03 '21 20:04 pmunaretto

Thanks for trying this out! I will check if version 1.5.1 works for me and will bump up the requirement.

Apr 04 '21 14:04 janvainer

Hi all.

Just to report: I had the same problem, and updating to torch==1.5.1 did indeed solve it. However, I saw another solution in a different project: https://github.com/audio-captioning/dcase-2020-baseline/issues/7. The solution there was to run the backward pass before clipping the gradient. I notice that your code uses the same order as in that issue: first clip, then backward. Perhaps changing the order of these calls could solve this problem for good?
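A minimal sketch of the suggested order (backward first, then clip, then the optimizer step) on a small non-negative least-squares problem. This is an illustrative stand-in, not the repo's `nnls` function: the matrices, learning rate, step count and the use of SGD are all assumptions.

```python
import torch

torch.manual_seed(0)
A = torch.rand(4, 6)                      # hypothetical basis matrix
B = torch.rand(4, 5)                      # hypothetical targets
X = torch.zeros(6, 5, requires_grad=True)
optimizer = torch.optim.SGD([X], lr=0.1)

losses = []
for step in range(50):
    optimizer.zero_grad()
    loss = ((A @ X - B) ** 2).mean()
    loss.backward()                            # 1. populate X.grad
    torch.nn.utils.clip_grad_norm_([X], 1.0)   # 2. clip the existing gradient
    optimizer.step()                           # 3. apply the clipped gradient
    with torch.no_grad():
        X.clamp_(min=0)                        # keep the solution non-negative
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

With this order the tensor always has a gradient by the time `clip_grad_norm_` runs, so the empty-list case cannot arise on the first iteration.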

Oct 24 '21 19:10 DanielJean007
