StyleTTS2
RuntimeError: shape '[540672, 1]' is invalid for input of size 655360
Trying to train from scratch, experimenting with a small dataset to understand the training flow.
- First stage training finishes successfully.
- Second stage training dies with a ZeroDivisionError exception:
Traceback (most recent call last):
File "/workspace/tts/train_second.py", line 789, in <module>
main()
File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/workspace/tts/train_second.py", line 676, in main
logger.info('Validation loss: %.3f, Dur loss: %.3f, F0 loss: %.3f' % (loss_test / iters_test, loss_align / iters_test, loss_f / iters_test) + '\n\n\n')
ZeroDivisionError: division by zero
make: *** [Makefile:8: train_2] Error 1
full output
(abc) root@0e786d10e0c3:/workspace/tts# make train_2
python train_second.py --config_path ./Configs/config.yml
Loading the first stage model at /workspace/small_1208/first_stage.pth ...
decoder loaded
text_encoder loaded
style_encoder loaded
text_aligner loaded
pitch_extractor loaded
Some weights of the model checkpoint at microsoft/wavlm-base-plus were not used when initializing WavLMModel: ['encoder.pos_conv_embed.conv.weight_g', 'encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing WavLMModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing WavLMModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of WavLMModel were not initialized from the model checkpoint at microsoft/wavlm-base-plus and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original1', 'encoder.pos_conv_embed.conv.parametrizations.weight.original0']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
BERT AdamW (
Parameter Group 0
amsgrad: False
base_momentum: 0.85
betas: (0.9, 0.99)
capturable: False
differentiable: False
eps: 1e-09
foreach: None
fused: None
initial_lr: 1e-05
lr: 1e-05
max_lr: 2e-05
max_momentum: 0.95
maximize: False
min_lr: 0
weight_decay: 0.01
)
decoder AdamW (
Parameter Group 0
amsgrad: False
base_momentum: 0.85
betas: (0.0, 0.99)
capturable: False
differentiable: False
eps: 1e-09
foreach: None
fused: None
initial_lr: 1e-05
lr: 1e-05
max_lr: 2e-05
max_momentum: 0.95
maximize: False
min_lr: 0
weight_decay: 0.0001
)
`` ˌʌndɚstˈʊd , '' nˈɪkələs nˈɑːdz slˈaɪtli . …
`` ˌʌndɚstˈʊd , '' nˈɪkələs nˈɑːdz slˈaɪtli . …
`` ˌʌndɚstˈʊd , '' nˈɪkələs nˈɑːdz slˈaɪtli . …
`` ˌʌndɚstˈʊd , '' nˈɪkələs nˈɑːdz slˈaɪtli . …
`` juː wɜːɹ ˈɔːlweɪz bɹˈaɪt , kwˈɪn . stˈʌbɚn , ʃˈʊɹ , bˌʌt bɹˈaɪt æz hˈɛl . aɪ ɹɪmˈɛmbɚ juː ˈɑːɹɡjuːɪŋ wɪð mˌiː ɐbˌaʊt ˈɛvɹi lˈɪɾəl θˈɪŋ ɪn hˈɪstɚɹi klˈæs . dɹˈoʊv mˌiː nˈʌts sˈʌmtaɪmz , '' zˈiːnə lˈæfd , ðə sˈaʊnd ɹˈɪtʃ wɪð fˈɑːndnəs . …
`` juː wɜːɹ ˈɔːlweɪz bɹˈaɪt , kwˈɪn . stˈʌbɚn , ʃˈʊɹ , bˌʌt bɹˈaɪt æz hˈɛl . aɪ ɹɪmˈɛmbɚ juː ˈɑːɹɡjuːɪŋ wɪð mˌiː ɐbˌaʊt ˈɛvɹi lˈɪɾəl θˈɪŋ ɪn hˈɪstɚɹi klˈæs . dɹˈoʊv mˌiː nˈʌts sˈʌmtaɪmz , '' zˈiːnə lˈæfd , ðə sˈaʊnd ɹˈɪtʃ wɪð fˈɑːndnəs . …
tˈaɪm lˈuːps ? aɪ hæv tʊ ɐdmˈɪt , nˈɛvɚ wˈɑːtʃt `` pˈæɹədˌɑːks θˈiəɹɪz '' ... jˈɛt ! bˌʌt aɪ kˈænt wˈeɪt tə hˈɪɹ jʊɹ tˈeɪk ˈɑːn ɪt , mˈaɪkəl . bˌiːtˌiːdˈʌbəljˌuː , ɪz ɪt wˈʌn ʌv ðoʊz mˈaɪndbˈɛndɪŋ sˈaɪfˌaɪ ʃˈoʊz ? …
tˈaɪm lˈuːps ? aɪ hæv tʊ ɐdmˈɪt , nˈɛvɚ wˈɑːtʃt `` pˈæɹədˌɑːks θˈiəɹɪz '' ... jˈɛt ! bˌʌt aɪ kˈænt wˈeɪt tə hˈɪɹ jʊɹ tˈeɪk ˈɑːn ɪt , mˈaɪkəl . bˌiːtˌiːdˈʌbəljˌuː , ɪz ɪt wˈʌn ʌv ðoʊz mˈaɪndbˈɛndɪŋ sˈaɪfˌaɪ ʃˈoʊz ? …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` jˈɛh , ænd skɹˈiːmɪŋ æt mˌiː ɪnðə mˈɪdəl əvən ɐsˈɛmbli ɪz ɡˌənə fˈɪks ɪt ? fˈeɪs ɪt , ðɪs ˈɪʃuː ɪz bˈɪɡɚ ðɐn bˈoʊθ ʌv ˌʌs , ænd ɡˌɛɾɪŋ pˈɪst æt ˈiːtʃ ˈʌðɚ sˈɑːlvz nˈʌθɪŋ , '' juːlˈeɪliə ɹɪplˈaɪd , ɛɡzˈæspɚɹˌeɪɾᵻd . …
`` jˈɛh , ænd skɹˈiːmɪŋ æt mˌiː ɪnðə mˈɪdəl əvən ɐsˈɛmbli ɪz ɡˌənə fˈɪks ɪt ? fˈeɪs ɪt , ðɪs ˈɪʃuː ɪz bˈɪɡɚ ðɐn bˈoʊθ ʌv ˌʌs , ænd ɡˌɛɾɪŋ pˈɪst æt ˈiːtʃ ˈʌðɚ sˈɑːlvz nˈʌθɪŋ , '' juːlˈeɪliə ɹɪplˈaɪd , ɛɡzˈæspɚɹˌeɪɾᵻd . …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
tˈaɪm lˈuːps ? aɪ hæv tʊ ɐdmˈɪt , nˈɛvɚ wˈɑːtʃt `` pˈæɹədˌɑːks θˈiəɹɪz '' ... jˈɛt ! bˌʌt aɪ kˈænt wˈeɪt tə hˈɪɹ jʊɹ tˈeɪk ˈɑːn ɪt , mˈaɪkəl . bˌiːtˌiːdˈʌbəljˌuː , ɪz ɪt wˈʌn ʌv ðoʊz mˈaɪndbˈɛndɪŋ sˈaɪfˌaɪ ʃˈoʊz ? …
tˈaɪm lˈuːps ? aɪ hæv tʊ ɐdmˈɪt , nˈɛvɚ wˈɑːtʃt `` pˈæɹədˌɑːks θˈiəɹɪz '' ... jˈɛt ! bˌʌt aɪ kˈænt wˈeɪt tə hˈɪɹ jʊɹ tˈeɪk ˈɑːn ɪt , mˈaɪkəl . bˌiːtˌiːdˈʌbəljˌuː , ɪz ɪt wˈʌn ʌv ðoʊz mˈaɪndbˈɛndɪŋ sˈaɪfˌaɪ ʃˈoʊz ? …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
tˈaɪm lˈuːps ? aɪ hæv tʊ ɐdmˈɪt , nˈɛvɚ wˈɑːtʃt `` pˈæɹədˌɑːks θˈiəɹɪz '' ... jˈɛt ! bˌʌt aɪ kˈænt wˈeɪt tə hˈɪɹ jʊɹ tˈeɪk ˈɑːn ɪt , mˈaɪkəl . bˌiːtˌiːdˈʌbəljˌuː , ɪz ɪt wˈʌn ʌv ðoʊz mˈaɪndbˈɛndɪŋ sˈaɪfˌaɪ ʃˈoʊz ? …
tˈaɪm lˈuːps ? aɪ hæv tʊ ɐdmˈɪt , nˈɛvɚ wˈɑːtʃt `` pˈæɹədˌɑːks θˈiəɹɪz '' ... jˈɛt ! bˌʌt aɪ kˈænt wˈeɪt tə hˈɪɹ jʊɹ tˈeɪk ˈɑːn ɪt , mˈaɪkəl . bˌiːtˌiːdˈʌbəljˌuː , ɪz ɪt wˈʌn ʌv ðoʊz mˈaɪndbˈɛndɪŋ sˈaɪfˌaɪ ʃˈoʊz ? …
`` snˈeɪp ɪz ˌɪndˈiːd ɐ tʃˈælɪndʒ , bˌʌt aɪ mˈɛnt sˈʌmθɪŋ mˈoːɹ dˈɔːntɪŋ , lˈaɪk ðə ɹˈaɪz ʌv dˈɑːɹk mˈædʒɪk ænd hˌaʊ wɪɹ dʒˈʌst stˈuːdənts tɹˈaɪɪŋ tə dˈiːl wɪð ɪt ˈɔːl , '' θiːədˈoːɹə sˈɛd , hɜː vˈɔɪs tˈeɪkɪŋ ˌɑːn ɐ ɡɹˈeɪv tˈoʊn ðæt mˌeɪd ɹˈaɪli lˈʊk æt hɜː mˈoːɹ ɪntˈɛntli . …
`` snˈeɪp ɪz ˌɪndˈiːd ɐ tʃˈælɪndʒ , bˌʌt aɪ mˈɛnt sˈʌmθɪŋ mˈoːɹ dˈɔːntɪŋ , lˈaɪk ðə ɹˈaɪz ʌv dˈɑːɹk mˈædʒɪk ænd hˌaʊ wɪɹ dʒˈʌst stˈuːdənts tɹˈaɪɪŋ tə dˈiːl wɪð ɪt ˈɔːl , '' θiːədˈoːɹə sˈɛd , hɜː vˈɔɪs tˈeɪkɪŋ ˌɑːn ɐ ɡɹˈeɪv tˈoʊn ðæt mˌeɪd ɹˈaɪli lˈʊk æt hɜː mˈoːɹ ɪntˈɛntli . …
`` snˈeɪp ɪz ˌɪndˈiːd ɐ tʃˈælɪndʒ , bˌʌt aɪ mˈɛnt sˈʌmθɪŋ mˈoːɹ dˈɔːntɪŋ , lˈaɪk ðə ɹˈaɪz ʌv dˈɑːɹk mˈædʒɪk ænd hˌaʊ wɪɹ dʒˈʌst stˈuːdənts tɹˈaɪɪŋ tə dˈiːl wɪð ɪt ˈɔːl , '' θiːədˈoːɹə sˈɛd , hɜː vˈɔɪs tˈeɪkɪŋ ˌɑːn ɐ ɡɹˈeɪv tˈoʊn ðæt mˌeɪd ɹˈaɪli lˈʊk æt hɜː mˈoːɹ ɪntˈɛntli . …
`` snˈeɪp ɪz ˌɪndˈiːd ɐ tʃˈælɪndʒ , bˌʌt aɪ mˈɛnt sˈʌmθɪŋ mˈoːɹ dˈɔːntɪŋ , lˈaɪk ðə ɹˈaɪz ʌv dˈɑːɹk mˈædʒɪk ænd hˌaʊ wɪɹ dʒˈʌst stˈuːdənts tɹˈaɪɪŋ tə dˈiːl wɪð ɪt ˈɔːl , '' θiːədˈoːɹə sˈɛd , hɜː vˈɔɪs tˈeɪkɪŋ ˌɑːn ɐ ɡɹˈeɪv tˈoʊn ðæt mˌeɪd ɹˈaɪli lˈʊk æt hɜː mˈoːɹ ɪntˈɛntli . …
`` juː wɜːɹ ˈɔːlweɪz bɹˈaɪt , kwˈɪn . stˈʌbɚn , ʃˈʊɹ , bˌʌt bɹˈaɪt æz hˈɛl . aɪ ɹɪmˈɛmbɚ juː ˈɑːɹɡjuːɪŋ wɪð mˌiː ɐbˌaʊt ˈɛvɹi lˈɪɾəl θˈɪŋ ɪn hˈɪstɚɹi klˈæs . dɹˈoʊv mˌiː nˈʌts sˈʌmtaɪmz , '' zˈiːnə lˈæfd , ðə sˈaʊnd ɹˈɪtʃ wɪð fˈɑːndnəs . …
`` juː wɜːɹ ˈɔːlweɪz bɹˈaɪt , kwˈɪn . stˈʌbɚn , ʃˈʊɹ , bˌʌt bɹˈaɪt æz hˈɛl . aɪ ɹɪmˈɛmbɚ juː ˈɑːɹɡjuːɪŋ wɪð mˌiː ɐbˌaʊt ˈɛvɹi lˈɪɾəl θˈɪŋ ɪn hˈɪstɚɹi klˈæs . dɹˈoʊv mˌiː nˈʌts sˈʌmtaɪmz , '' zˈiːnə lˈæfd , ðə sˈaʊnd ɹˈɪtʃ wɪð fˈɑːndnəs . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
Epochs: 1
Traceback (most recent call last):
File "/workspace/tts/train_second.py", line 789, in <module>
main()
File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/workspace/tts/train_second.py", line 676, in main
logger.info('Validation loss: %.3f, Dur loss: %.3f, F0 loss: %.3f' % (loss_test / iters_test, loss_align / iters_test, loss_f / iters_test) + '\n\n\n')
ZeroDivisionError: division by zero
make: *** [Makefile:8: train_2] Error 1
The division by zero happens here: https://github.com/yl4579/StyleTTS2/blob/1ece0a38a569c675bc79d9bd42b02bbe6cab8615/train_second.py#L676
An exception is raised inside the torch loop, so https://github.com/yl4579/StyleTTS2/blob/1ece0a38a569c675bc79d9bd42b02bbe6cab8615/train_second.py#L671 is never reached and https://github.com/yl4579/StyleTTS2/blob/1ece0a38a569c675bc79d9bd42b02bbe6cab8615/train_second.py#L569 stays 0.
That exception is invisible because of the bare try/except block: https://github.com/yl4579/StyleTTS2/blob/1ece0a38a569c675bc79d9bd42b02bbe6cab8615/train_second.py#L672-L673
I added:
except Exception as e:
    import traceback
    traceback.print_exc()
    continue
which surfaces the underlying exception:
exception
Traceback (most recent call last):
File "/workspace/tts/train_second.py", line 612, in main
d, p = model.predictor(d_en, s,
File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 185, in forward
outputs = self.parallel_apply(replicas, inputs, module_kwargs)
File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 200, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 110, in parallel_apply
output.reraise()
File "/workspace/tts/abc/lib/python3.10/site-packages/torch/_utils.py", line 694, in reraise
raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in _worker
output = module(*input, **kwargs)
File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/workspace/tts/models.py", line 469, in forward
d = self.text_encoder(texts, style, text_lengths, m)
File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/workspace/tts/models.py", line 558, in forward
x, _ = block(x)
File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 882, in forward
result = _VF.lstm(input, batch_sizes, hx, self._flat_weights, self.bias,
RuntimeError: shape '[540672, 1]' is invalid for input of size 655360
full output
(abc) root@0e786d10e0c3:/workspace/tts# make train_2
python train_second.py --config_path ./Configs/config.yml
Loading the first stage model at /workspace/small_1208/first_stage.pth ...
decoder loaded
text_encoder loaded
style_encoder loaded
text_aligner loaded
pitch_extractor loaded
Some weights of the model checkpoint at microsoft/wavlm-base-plus were not used when initializing WavLMModel: ['encoder.pos_conv_embed.conv.weight_v', 'encoder.pos_conv_embed.conv.weight_g']
- This IS expected if you are initializing WavLMModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing WavLMModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of WavLMModel were not initialized from the model checkpoint at microsoft/wavlm-base-plus and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
BERT AdamW (
Parameter Group 0
amsgrad: False
base_momentum: 0.85
betas: (0.9, 0.99)
capturable: False
differentiable: False
eps: 1e-09
foreach: None
fused: None
initial_lr: 1e-05
lr: 1e-05
max_lr: 2e-05
max_momentum: 0.95
maximize: False
min_lr: 0
weight_decay: 0.01
)
decoder AdamW (
Parameter Group 0
amsgrad: False
base_momentum: 0.85
betas: (0.0, 0.99)
capturable: False
differentiable: False
eps: 1e-09
foreach: None
fused: None
initial_lr: 1e-05
lr: 1e-05
max_lr: 2e-05
max_momentum: 0.95
maximize: False
min_lr: 0
weight_decay: 0.0001
)
`` jˈɛh , ænd skɹˈiːmɪŋ æt mˌiː ɪnðə mˈɪdəl əvən ɐsˈɛmbli ɪz ɡˌənə fˈɪks ɪt ? fˈeɪs ɪt , ðɪs ˈɪʃuː ɪz bˈɪɡɚ ðɐn bˈoʊθ ʌv ˌʌs , ænd ɡˌɛɾɪŋ pˈɪst æt ˈiːtʃ ˈʌðɚ sˈɑːlvz nˈʌθɪŋ , '' juːlˈeɪliə ɹɪplˈaɪd , ɛɡzˈæspɚɹˌeɪɾᵻd . …
`` jˈɛh , ænd skɹˈiːmɪŋ æt mˌiː ɪnðə mˈɪdəl əvən ɐsˈɛmbli ɪz ɡˌənə fˈɪks ɪt ? fˈeɪs ɪt , ðɪs ˈɪʃuː ɪz bˈɪɡɚ ðɐn bˈoʊθ ʌv ˌʌs , ænd ɡˌɛɾɪŋ pˈɪst æt ˈiːtʃ ˈʌðɚ sˈɑːlvz nˈʌθɪŋ , '' juːlˈeɪliə ɹɪplˈaɪd , ɛɡzˈæspɚɹˌeɪɾᵻd . …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
tˈaɪm lˈuːps ? aɪ hæv tʊ ɐdmˈɪt , nˈɛvɚ wˈɑːtʃt `` pˈæɹədˌɑːks θˈiəɹɪz '' ... jˈɛt ! bˌʌt aɪ kˈænt wˈeɪt tə hˈɪɹ jʊɹ tˈeɪk ˈɑːn ɪt , mˈaɪkəl . bˌiːtˌiːdˈʌbəljˌuː , ɪz ɪt wˈʌn ʌv ðoʊz mˈaɪndbˈɛndɪŋ sˈaɪfˌaɪ ʃˈoʊz ? …
tˈaɪm lˈuːps ? aɪ hæv tʊ ɐdmˈɪt , nˈɛvɚ wˈɑːtʃt `` pˈæɹədˌɑːks θˈiəɹɪz '' ... jˈɛt ! bˌʌt aɪ kˈænt wˈeɪt tə hˈɪɹ jʊɹ tˈeɪk ˈɑːn ɪt , mˈaɪkəl . bˌiːtˌiːdˈʌbəljˌuː , ɪz ɪt wˈʌn ʌv ðoʊz mˈaɪndbˈɛndɪŋ sˈaɪfˌaɪ ʃˈoʊz ? …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` ˌʌndɚstˈʊd , '' nˈɪkələs nˈɑːdz slˈaɪtli . …
`` ˌʌndɚstˈʊd , '' nˈɪkələs nˈɑːdz slˈaɪtli . …
`` ˌʌndɚstˈʊd , '' nˈɪkələs nˈɑːdz slˈaɪtli . …
`` ˌʌndɚstˈʊd , '' nˈɪkələs nˈɑːdz slˈaɪtli . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
tˈaɪm lˈuːps ? aɪ hæv tʊ ɐdmˈɪt , nˈɛvɚ wˈɑːtʃt `` pˈæɹədˌɑːks θˈiəɹɪz '' ... jˈɛt ! bˌʌt aɪ kˈænt wˈeɪt tə hˈɪɹ jʊɹ tˈeɪk ˈɑːn ɪt , mˈaɪkəl . bˌiːtˌiːdˈʌbəljˌuː , ɪz ɪt wˈʌn ʌv ðoʊz mˈaɪndbˈɛndɪŋ sˈaɪfˌaɪ ʃˈoʊz ? …
tˈaɪm lˈuːps ? aɪ hæv tʊ ɐdmˈɪt , nˈɛvɚ wˈɑːtʃt `` pˈæɹədˌɑːks θˈiəɹɪz '' ... jˈɛt ! bˌʌt aɪ kˈænt wˈeɪt tə hˈɪɹ jʊɹ tˈeɪk ˈɑːn ɪt , mˈaɪkəl . bˌiːtˌiːdˈʌbəljˌuː , ɪz ɪt wˈʌn ʌv ðoʊz mˈaɪndbˈɛndɪŋ sˈaɪfˌaɪ ʃˈoʊz ? …
`` snˈeɪp ɪz ˌɪndˈiːd ɐ tʃˈælɪndʒ , bˌʌt aɪ mˈɛnt sˈʌmθɪŋ mˈoːɹ dˈɔːntɪŋ , lˈaɪk ðə ɹˈaɪz ʌv dˈɑːɹk mˈædʒɪk ænd hˌaʊ wɪɹ dʒˈʌst stˈuːdənts tɹˈaɪɪŋ tə dˈiːl wɪð ɪt ˈɔːl , '' θiːədˈoːɹə sˈɛd , hɜː vˈɔɪs tˈeɪkɪŋ ˌɑːn ɐ ɡɹˈeɪv tˈoʊn ðæt mˌeɪd ɹˈaɪli lˈʊk æt hɜː mˈoːɹ ɪntˈɛntli . …
`` snˈeɪp ɪz ˌɪndˈiːd ɐ tʃˈælɪndʒ , bˌʌt aɪ mˈɛnt sˈʌmθɪŋ mˈoːɹ dˈɔːntɪŋ , lˈaɪk ðə ɹˈaɪz ʌv dˈɑːɹk mˈædʒɪk ænd hˌaʊ wɪɹ dʒˈʌst stˈuːdənts tɹˈaɪɪŋ tə dˈiːl wɪð ɪt ˈɔːl , '' θiːədˈoːɹə sˈɛd , hɜː vˈɔɪs tˈeɪkɪŋ ˌɑːn ɐ ɡɹˈeɪv tˈoʊn ðæt mˌeɪd ɹˈaɪli lˈʊk æt hɜː mˈoːɹ ɪntˈɛntli . …
`` snˈeɪp ɪz ˌɪndˈiːd ɐ tʃˈælɪndʒ , bˌʌt aɪ mˈɛnt sˈʌmθɪŋ mˈoːɹ dˈɔːntɪŋ , lˈaɪk ðə ɹˈaɪz ʌv dˈɑːɹk mˈædʒɪk ænd hˌaʊ wɪɹ dʒˈʌst stˈuːdənts tɹˈaɪɪŋ tə dˈiːl wɪð ɪt ˈɔːl , '' θiːədˈoːɹə sˈɛd , hɜː vˈɔɪs tˈeɪkɪŋ ˌɑːn ɐ ɡɹˈeɪv tˈoʊn ðæt mˌeɪd ɹˈaɪli lˈʊk æt hɜː mˈoːɹ ɪntˈɛntli . …
`` snˈeɪp ɪz ˌɪndˈiːd ɐ tʃˈælɪndʒ , bˌʌt aɪ mˈɛnt sˈʌmθɪŋ mˈoːɹ dˈɔːntɪŋ , lˈaɪk ðə ɹˈaɪz ʌv dˈɑːɹk mˈædʒɪk ænd hˌaʊ wɪɹ dʒˈʌst stˈuːdənts tɹˈaɪɪŋ tə dˈiːl wɪð ɪt ˈɔːl , '' θiːədˈoːɹə sˈɛd , hɜː vˈɔɪs tˈeɪkɪŋ ˌɑːn ɐ ɡɹˈeɪv tˈoʊn ðæt mˌeɪd ɹˈaɪli lˈʊk æt hɜː mˈoːɹ ɪntˈɛntli . …
`` snˈeɪp ɪz ˌɪndˈiːd ɐ tʃˈælɪndʒ , bˌʌt aɪ mˈɛnt sˈʌmθɪŋ mˈoːɹ dˈɔːntɪŋ , lˈaɪk ðə ɹˈaɪz ʌv dˈɑːɹk mˈædʒɪk ænd hˌaʊ wɪɹ dʒˈʌst stˈuːdənts tɹˈaɪɪŋ tə dˈiːl wɪð ɪt ˈɔːl , '' θiːədˈoːɹə sˈɛd , hɜː vˈɔɪs tˈeɪkɪŋ ˌɑːn ɐ ɡɹˈeɪv tˈoʊn ðæt mˌeɪd ɹˈaɪli lˈʊk æt hɜː mˈoːɹ ɪntˈɛntli . …
`` snˈeɪp ɪz ˌɪndˈiːd ɐ tʃˈælɪndʒ , bˌʌt aɪ mˈɛnt sˈʌmθɪŋ mˈoːɹ dˈɔːntɪŋ , lˈaɪk ðə ɹˈaɪz ʌv dˈɑːɹk mˈædʒɪk ænd hˌaʊ wɪɹ dʒˈʌst stˈuːdənts tɹˈaɪɪŋ tə dˈiːl wɪð ɪt ˈɔːl , '' θiːədˈoːɹə sˈɛd , hɜː vˈɔɪs tˈeɪkɪŋ ˌɑːn ɐ ɡɹˈeɪv tˈoʊn ðæt mˌeɪd ɹˈaɪli lˈʊk æt hɜː mˈoːɹ ɪntˈɛntli . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` jˈɛh , ænd skɹˈiːmɪŋ æt mˌiː ɪnðə mˈɪdəl əvən ɐsˈɛmbli ɪz ɡˌənə fˈɪks ɪt ? fˈeɪs ɪt , ðɪs ˈɪʃuː ɪz bˈɪɡɚ ðɐn bˈoʊθ ʌv ˌʌs , ænd ɡˌɛɾɪŋ pˈɪst æt ˈiːtʃ ˈʌðɚ sˈɑːlvz nˈʌθɪŋ , '' juːlˈeɪliə ɹɪplˈaɪd , ɛɡzˈæspɚɹˌeɪɾᵻd . …
`` jˈɛh , ænd skɹˈiːmɪŋ æt mˌiː ɪnðə mˈɪdəl əvən ɐsˈɛmbli ɪz ɡˌənə fˈɪks ɪt ? fˈeɪs ɪt , ðɪs ˈɪʃuː ɪz bˈɪɡɚ ðɐn bˈoʊθ ʌv ˌʌs , ænd ɡˌɛɾɪŋ pˈɪst æt ˈiːtʃ ˈʌðɚ sˈɑːlvz nˈʌθɪŋ , '' juːlˈeɪliə ɹɪplˈaɪd , ɛɡzˈæspɚɹˌeɪɾᵻd . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
`` juː wɜːɹ ˈɔːlweɪz bɹˈaɪt , kwˈɪn . stˈʌbɚn , ʃˈʊɹ , bˌʌt bɹˈaɪt æz hˈɛl . aɪ ɹɪmˈɛmbɚ juː ˈɑːɹɡjuːɪŋ wɪð mˌiː ɐbˌaʊt ˈɛvɹi lˈɪɾəl θˈɪŋ ɪn hˈɪstɚɹi klˈæs . dɹˈoʊv mˌiː nˈʌts sˈʌmtaɪmz , '' zˈiːnə lˈæfd , ðə sˈaʊnd ɹˈɪtʃ wɪð fˈɑːndnəs . …
`` juː wɜːɹ ˈɔːlweɪz bɹˈaɪt , kwˈɪn . stˈʌbɚn , ʃˈʊɹ , bˌʌt bɹˈaɪt æz hˈɛl . aɪ ɹɪmˈɛmbɚ juː ˈɑːɹɡjuːɪŋ wɪð mˌiː ɐbˌaʊt ˈɛvɹi lˈɪɾəl θˈɪŋ ɪn hˈɪstɚɹi klˈæs . dɹˈoʊv mˌiː nˈʌts sˈʌmtaɪmz , '' zˈiːnə lˈæfd , ðə sˈaʊnd ɹˈɪtʃ wɪð fˈɑːndnəs . …
`` juː wɜːɹ ˈɔːlweɪz bɹˈaɪt , kwˈɪn . stˈʌbɚn , ʃˈʊɹ , bˌʌt bɹˈaɪt æz hˈɛl . aɪ ɹɪmˈɛmbɚ juː ˈɑːɹɡjuːɪŋ wɪð mˌiː ɐbˌaʊt ˈɛvɹi lˈɪɾəl θˈɪŋ ɪn hˈɪstɚɹi klˈæs . dɹˈoʊv mˌiː nˈʌts sˈʌmtaɪmz , '' zˈiːnə lˈæfd , ðə sˈaʊnd ɹˈɪtʃ wɪð fˈɑːndnəs . …
`` juː wɜːɹ ˈɔːlweɪz bɹˈaɪt , kwˈɪn . stˈʌbɚn , ʃˈʊɹ , bˌʌt bɹˈaɪt æz hˈɛl . aɪ ɹɪmˈɛmbɚ juː ˈɑːɹɡjuːɪŋ wɪð mˌiː ɐbˌaʊt ˈɛvɹi lˈɪɾəl θˈɪŋ ɪn hˈɪstɚɹi klˈæs . dɹˈoʊv mˌiː nˈʌts sˈʌmtaɪmz , '' zˈiːnə lˈæfd , ðə sˈaʊnd ɹˈɪtʃ wɪð fˈɑːndnəs . …
`` juː wɜːɹ ˈɔːlweɪz bɹˈaɪt , kwˈɪn . stˈʌbɚn , ʃˈʊɹ , bˌʌt bɹˈaɪt æz hˈɛl . aɪ ɹɪmˈɛmbɚ juː ˈɑːɹɡjuːɪŋ wɪð mˌiː ɐbˌaʊt ˈɛvɹi lˈɪɾəl θˈɪŋ ɪn hˈɪstɚɹi klˈæs . dɹˈoʊv mˌiː nˈʌts sˈʌmtaɪmz , '' zˈiːnə lˈæfd , ðə sˈaʊnd ɹˈɪtʃ wɪð fˈɑːndnəs . …
`` juː wɜːɹ ˈɔːlweɪz bɹˈaɪt , kwˈɪn . stˈʌbɚn , ʃˈʊɹ , bˌʌt bɹˈaɪt æz hˈɛl . aɪ ɹɪmˈɛmbɚ juː ˈɑːɹɡjuːɪŋ wɪð mˌiː ɐbˌaʊt ˈɛvɹi lˈɪɾəl θˈɪŋ ɪn hˈɪstɚɹi klˈæs . dɹˈoʊv mˌiː nˈʌts sˈʌmtaɪmz , '' zˈiːnə lˈæfd , ðə sˈaʊnd ɹˈɪtʃ wɪð fˈɑːndnəs . …
Traceback (most recent call last):
File "/workspace/tts/train_second.py", line 612, in main
d, p = model.predictor(d_en, s,
File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 185, in forward
outputs = self.parallel_apply(replicas, inputs, module_kwargs)
File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 200, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 110, in parallel_apply
output.reraise()
File "/workspace/tts/abc/lib/python3.10/site-packages/torch/_utils.py", line 694, in reraise
raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in _worker
output = module(*input, **kwargs)
File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/workspace/tts/models.py", line 469, in forward
d = self.text_encoder(texts, style, text_lengths, m)
File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/workspace/tts/models.py", line 558, in forward
x, _ = block(x)
File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 882, in forward
result = _VF.lstm(input, batch_sizes, hx, self._flat_weights, self.bias,
RuntimeError: shape '[540672, 1]' is invalid for input of size 655360
Epochs: 1
Traceback (most recent call last):
File "/workspace/tts/train_second.py", line 791, in <module>
main()
File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/workspace/tts/train_second.py", line 678, in main
logger.info('Validation loss: %.3f, Dur loss: %.3f, F0 loss: %.3f' % (loss_test / iters_test, loss_align / iters_test, loss_f / iters_test) + '\n\n\n')
ZeroDivisionError: division by zero
make: *** [Makefile:8: train_2] Error 1
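For the RuntimeError in the replica traceback, both sizes in the message happen to divide evenly by 512, which is `hidden_dim` in the config below. A quick arithmetic check (a diagnostic sketch only, not a fix; the width 512 is my assumption, and the DataParallel connection is a guess based on the "replica 0" wording):

```python
# Both numbers from "shape '[540672, 1]' is invalid for input of size 655360"
# factor cleanly at width 512 (assumed to be hidden_dim from config.yml),
# i.e. the LSTM received 1280 rows of width 512 where 1056 were expected.
# One possible trigger is uneven padded-batch splitting under nn.DataParallel.
expected_numel, actual_numel = 540672, 655360
width = 512  # assumption: hidden_dim from config.yml

print(expected_numel % width, actual_numel % width)    # remainder check
print(expected_numel // width, actual_numel // width)  # rows expected vs received
```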
Note: I'm just experimenting with a minimal setup to get familiar with the training flow, hence the low number of epochs, small max_len, etc.
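The ZeroDivisionError itself looks like a symptom rather than the root cause: `iters_test` is still 0 when the validation losses are averaged, meaning the validation loop never completed a batch. A minimal guard for the logging line (the names `loss_test` / `iters_test` come from the traceback; the helper `safe_mean` is my own, not part of StyleTTS2):

```python
def safe_mean(total, count):
    """Average a running loss, returning NaN instead of raising
    ZeroDivisionError when the validation loop produced no batches."""
    return total / count if count > 0 else float("nan")

# Usage sketch, assuming the variables from train_second.py:
# logger.info('Validation loss: %.3f' % safe_mean(loss_test, iters_test))
```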
Configs/config.yml
log_dir: "/workspace/small_1208"
first_stage_path: "first_stage.pth"
save_freq: 2
log_interval: 10
device: "cuda"
epochs_1st: 2 # number of epochs for first stage training (pre-training)
epochs_2nd: 2 # number of epochs for second stage training (joint training)
batch_size: 40
max_len: 100 # maximum number of frames
pretrained_model: ""
second_stage_load_pretrained: true # set to true if the pre-trained model is for 2nd stage
load_only_params: false # set to true if do not want to load epoch numbers and optimizer parameters
F0_path: "Utils/JDC/bst.t7"
ASR_config: "Utils/ASR/config.yml"
ASR_path: "Utils/ASR/epoch_00080.pth"
PLBERT_dir: 'Utils/PLBERT/'
data_params:
train_data: "/workspace/ds/train_list.txt"
val_data: "/workspace/ds/val_list.txt"
root_path: "/workspace/ds/wavs"
OOD_data: "Data/OOD_texts.txt"
min_length: 50 # keep sampling OOD texts until one of at least this length is obtained
preprocess_params:
sr: 24000
spect_params:
n_fft: 2048
win_length: 1200
hop_length: 300
model_params:
multispeaker: true
dim_in: 64
hidden_dim: 512
max_conv_dim: 512
n_layer: 3
n_mels: 80
n_token: 178 # number of phoneme tokens
max_dur: 50 # maximum duration of a single phoneme
style_dim: 128 # style vector size
dropout: 0.2
# config for decoder
decoder:
type: 'istftnet' # either hifigan or istftnet
resblock_kernel_sizes: [3,7,11]
upsample_rates : [10, 6]
upsample_initial_channel: 512
resblock_dilation_sizes: [[1,3,5], [1,3,5], [1,3,5]]
upsample_kernel_sizes: [20, 12]
gen_istft_n_fft: 20
gen_istft_hop_size: 5
# speech language model config
slm:
model: 'microsoft/wavlm-base-plus'
sr: 16000 # sampling rate of SLM
hidden: 768 # hidden size of SLM
nlayers: 13 # number of layers of SLM
initial_channel: 64 # initial channels of SLM discriminator head
# style diffusion model config
diffusion:
embedding_mask_proba: 0.1
# transformer config
transformer:
num_layers: 3
num_heads: 8
head_features: 64
multiplier: 2
# diffusion distribution config
dist:
sigma_data: 0.2 # placeholder for estimate_sigma_data set to false
estimate_sigma_data: true # estimate sigma_data from the current batch if set to true
mean: -3.0
std: 1.0
loss_params:
lambda_mel: 5. # mel reconstruction loss
lambda_gen: 1. # generator loss
lambda_slm: 1. # slm feature matching loss
lambda_mono: 1. # monotonic alignment loss (1st stage, TMA)
lambda_s2s: 1. # sequence-to-sequence loss (1st stage, TMA)
TMA_epoch: 50 # TMA starting epoch (1st stage)
lambda_F0: 1. # F0 reconstruction loss (2nd stage)
lambda_norm: 1. # norm reconstruction loss (2nd stage)
lambda_dur: 1. # duration loss (2nd stage)
lambda_ce: 20. # duration predictor probability output CE loss (2nd stage)
lambda_sty: 1. # style reconstruction loss (2nd stage)
lambda_diff: 1. # score matching loss (2nd stage)
diff_epoch: 20 # style diffusion starting epoch (2nd stage)
joint_epoch: 50 # joint training starting epoch (2nd stage)
optimizer_params:
lr: 0.0001 # general learning rate
bert_lr: 0.00001 # learning rate for PLBERT
ft_lr: 0.00001 # learning rate for acoustic modules
slmadv_params:
min_len: 400 # minimum length of samples
max_len: 500 # maximum length of samples
batch_percentage: 0.5 # to prevent out of memory, only use half of the original batch size
iter: 10 # update the discriminator once every this many generator updates
thresh: 5 # gradient norm above which the gradient is scaled
scale: 0.01 # gradient scaling factor for predictors from SLM discriminators
sig: 1.5 # sigma for differentiable duration modeling
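One interaction worth checking with this config: `batch_size: 40` against a small validation list. If the validation DataLoader drops the last partial batch (PyTorch's `drop_last=True` behavior), a val set smaller than the batch size yields zero batches, which would leave `iters_test` at 0 and produce exactly the division by zero above. A sketch of the arithmetic (the 30-clip val set is a hypothetical number, not from the report):

```python
import math

def num_batches(n_samples, batch_size, drop_last=True):
    """Number of batches a DataLoader yields: with drop_last=True the
    final partial batch is discarded, so n_samples < batch_size gives 0."""
    if drop_last:
        return n_samples // batch_size
    return math.ceil(n_samples / batch_size)

# With batch_size 40 from the config and, say, a 30-clip val set:
# num_batches(30, 40) == 0 -> the validation loop never runs
```

Lowering `batch_size` (or growing the val set) so at least one full batch fits would rule this cause in or out.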