FastSpeech2
Tensor size mismatch when training
I encountered this error when trying to train a new model for a different language. The lexicon, TextGrid, and .lab files are from MFA. Can you please take a look at this issue @ming024
Traceback (most recent call last):
  File "train.py", line 198, in <module>
    main(args, configs)
  File "train.py", line 82, in main
    output = model(*(batch[2:]))
  File "/fastspeech2/fs2-env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/fastspeech2/fs2-env/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 159, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/fastspeech2/fs2-env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/fastspeech2/FastSpeech2/model/fastspeech2.py", line 91, in forward
    d_control,
  File "/fastspeech2/fs2-env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/fastspeech2/FastSpeech2/model/modules.py", line 121, in forward
    x = x + pitch_embedding
RuntimeError: The size of tensor a (47) must match the size of tensor b (101) at non-singleton dimension 1
P.S.: I tried running the same train command a few times without changing anything; each time the sizes of tensor a and tensor b were different.
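For what it's worth, the failing add can be reproduced in isolation. A minimal sketch with the shapes from the traceback (the hidden size of 256 is an assumption):

```python
import torch

# The variance adaptor adds a per-position pitch embedding to the encoder
# output, so dimension 1 (the sequence length) must match exactly:
# 47 phoneme positions vs. 101 pitch entries cannot broadcast.
x = torch.zeros(1, 47, 256)                 # (batch, text_len, hidden)
pitch_embedding = torch.zeros(1, 101, 256)  # (batch, pitch_len, hidden)

try:
    x = x + pitch_embedding
    err = None
except RuntimeError as exc:
    err = str(exc)

print(err)  # → the same message as in the traceback above
```

The varying sizes between runs are consistent with shuffled batches: each batch contains different utterances, so the mismatched lengths differ.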
@EuphoriaCelestial Could you please print out the values of these tensors, or give more information?
This looks like the same error as https://github.com/ming024/FastSpeech2/issues/66
oh yes, it was the same error
I just checked my preprocess.yaml; everything is running at phoneme level.
If not, maybe you should check whether the length mismatch is caused by incorrect padding.
I am not sure how to check this
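For reference, a sketch of the padding invariant being suggested (my interpretation, not the repo's actual collate code): within a batch, the text sequence and its per-phoneme pitch row must be padded to the same max length, otherwise the later `x + pitch_embedding` add sees two different sequence lengths.

```python
import numpy as np

def pad_1d(seqs, pad_value=0):
    """Pad a list of 1-D sequences to a shared max length (hypothetical helper)."""
    max_len = max(len(s) for s in seqs)
    return np.stack([np.pad(np.asarray(s, dtype=float),
                            (0, max_len - len(s)),
                            constant_values=pad_value)
                     for s in seqs])

texts = [np.ones(5), np.ones(3)]
pitches = [np.ones(5), np.ones(3)]
print(pad_1d(texts).shape == pad_1d(pitches).shape)  # → True (aligned padding)
```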
yes I can provide any information needed, where can I find those tensors to print its value?
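A hypothetical helper for the question above (`report_mismatch` is not part of the repo): you would paste a couple of prints, or this function, into model/modules.py right before the line the traceback points at (`x = x + pitch_embedding`). NumPy stand-ins are used here so the sketch runs anywhere; in the real code both arguments are torch tensors.

```python
import numpy as np

def report_mismatch(x, pitch_embedding):
    """Print the shapes feeding the failing add; return whether they align."""
    print("x:", tuple(x.shape))
    print("pitch_embedding:", tuple(pitch_embedding.shape))
    return x.shape[1] == pitch_embedding.shape[1]

# Shapes from the reported error:
report_mismatch(np.zeros((1, 47, 256)), np.zeros((1, 101, 256)))  # → False
```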
The best solution is to run the "mfa train xxx" command to generate the TextGrid files again and then run preprocess.py (even running "mfa align xxx" with an align model trained on another dataset with the same lexicon may also not work).
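My reading of why regeneration helps, sketched as the invariant the regenerated files should satisfy (the helper below is illustrative, not repo code): at phoneme level, the duration/pitch/energy arrays written by preprocess.py should each have one entry per phoneme, and the durations should sum to the mel frame count.

```python
import numpy as np

def utterance_ok(phones, duration, pitch, energy, n_mel_frames):
    """Check the per-utterance alignment invariant (hypothetical helper)."""
    n = len(phones)
    return (len(duration) == n and len(pitch) == n and len(energy) == n
            and int(np.sum(duration)) == n_mel_frames)

# Synthetic example: 4 phonemes, durations summing to 100 mel frames.
print(utterance_ok(["HH", "AH", "L", "OW"], [25, 25, 25, 25],
                   np.ones(4), np.ones(4), 100))  # → True
```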
Does that mean generating the lexicon, TextGrid, and lab files all over again?
I just found out I have some files listed in unaligned.txt in the TextGrid folder; should I delete them?
It doesn't matter, just keep them. The lexicon and lab files don't need to change; just use "mfa train xxx" to generate the TextGrids again, and then generate the pitch/mel/energy/duration files again.
okay
If we don't change anything, what is the point of running that command again? The generated TextGrid and pitch/mel/energy/duration files will be the same as before, right? I used pretrained MFA models.
I haven't found the reason yet, but using MFA models trained on another lexicon, or on the same lexicon but another dataset, does indeed cause this error.
Wait, let me recall all the steps I have done:
I used mfa_g2p to generate my own lexicon from my dataset (the pretrained G2P model from the MFA website was used)
then I used that lexicon to generate the TextGrids, using the mfa_train command
and then prepare_align.py to generate the raw_data folder
after that I ran preprocess.py
finally, train.py, and I got this error
Are those the correct steps? If so, which step should I redo now?
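The steps above can be sanity-checked after preprocess.py. This audit script is a sketch under assumptions about the preprocessed layout (the `basename|speaker|{phones}|text` line format and the `duration/SPEAKER-duration-BASENAME.npy` naming are my guesses; adjust them to your config):

```python
import numpy as np
from pathlib import Path

def audit(preprocessed_dir):
    """List utterances whose duration array length differs from the phoneme
    count -- exactly the kind of mismatch the traceback reports."""
    bad = []
    for line in Path(preprocessed_dir, "train.txt").read_text().splitlines():
        basename, speaker, phones, _ = line.split("|", 3)
        n_phones = len(phones.strip("{}").split())
        dur_file = Path(preprocessed_dir, "duration",
                        f"{speaker}-duration-{basename}.npy")
        n_dur = len(np.load(dur_file))
        if n_dur != n_phones:
            bad.append((basename, n_phones, n_dur))
    return bad
```

An empty list means the text and duration files agree; any entries point at the utterances to regenerate.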
Oh, I don't have a g2p step, since I just use the author's Mandarin lexicon.
As for your situation, maybe another reason caused the error. You are training another language; did you add a symbols file in text/yourlanguage.py?
I tried changing the symbols list, but there are some parts I don't fully understand, like arpabet and pinyin.
So I just changed the _letters list.
Are you using your own symbols? Check issue #66, because I had the same problem of shape mismatch. The MFA part is good; the problem is that some processing is incorrectly applied to the text in the dataset.
Yes, I am using my own symbols, but so far I have only changed the _letters list in /text/symbols.py. I don't know how I should change the arpabet list.
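For readers in the same spot, a minimal sketch (assumptions, not the repository's actual file) of what a new-language symbols list might look like, following the same general pattern as text/symbols.py. The "@" prefix keeps phones from colliding with raw letters; `_my_phones` and its contents are placeholders to be replaced with the phone set your MFA lexicon actually uses, instead of the English arpabet list.

```python
# Hypothetical symbols module for a new language (names and contents are
# illustrative; match them to your lexicon's phone inventory).
_pad = "_"
_punctuation = "!'(),.:;? "
_letters = "abcdefghijklmnopqrstuvwxyz"
_silences = ["@sp", "@spn", "@sil"]
_my_phones = ["@" + p for p in ["a", "b", "ch", "ny"]]  # placeholder phones

symbols = [_pad] + list(_punctuation) + list(_letters) + _my_phones + _silences
```

Every phone emitted by MFA into the TextGrids must appear in this list, or the text-to-id mapping will silently drop or misalign symbols.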
I have commented on your issue; please answer those questions.