
RuntimeError: shape '[540672, 1]' is invalid for input of size 655360

Open devidw opened this issue 6 months ago • 7 comments

Trying to train from scratch, experimenting with a small dataset to understand the training flow.

  • First-stage training finishes successfully.
  • Second-stage training dies with a ZeroDivisionError.

exception

Traceback (most recent call last):
  File "/workspace/tts/train_second.py", line 789, in <module>
    main()
  File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/workspace/tts/train_second.py", line 676, in main
    logger.info('Validation loss: %.3f, Dur loss: %.3f, F0 loss: %.3f' % (loss_test / iters_test, loss_align / iters_test, loss_f / iters_test) + '\n\n\n')
ZeroDivisionError: division by zero
make: *** [Makefile:8: train_2] Error 1

full output

(abc) root@0e786d10e0c3:/workspace/tts# make train_2
python train_second.py --config_path ./Configs/config.yml
Loading the first stage model at /workspace/small_1208/first_stage.pth ...
decoder loaded
text_encoder loaded
style_encoder loaded
text_aligner loaded
pitch_extractor loaded
Some weights of the model checkpoint at microsoft/wavlm-base-plus were not used when initializing WavLMModel: ['encoder.pos_conv_embed.conv.weight_g', 'encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing WavLMModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing WavLMModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of WavLMModel were not initialized from the model checkpoint at microsoft/wavlm-base-plus and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original1', 'encoder.pos_conv_embed.conv.parametrizations.weight.original0']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
BERT AdamW (
Parameter Group 0
    amsgrad: False
    base_momentum: 0.85
    betas: (0.9, 0.99)
    capturable: False
    differentiable: False
    eps: 1e-09
    foreach: None
    fused: None
    initial_lr: 1e-05
    lr: 1e-05
    max_lr: 2e-05
    max_momentum: 0.95
    maximize: False
    min_lr: 0
    weight_decay: 0.01
)
decoder AdamW (
Parameter Group 0
    amsgrad: False
    base_momentum: 0.85
    betas: (0.0, 0.99)
    capturable: False
    differentiable: False
    eps: 1e-09
    foreach: None
    fused: None
    initial_lr: 1e-05
    lr: 1e-05
    max_lr: 2e-05
    max_momentum: 0.95
    maximize: False
    min_lr: 0
    weight_decay: 0.0001
)
`` ˌʌndɚstˈʊd , '' nˈɪkələs nˈɑːdz slˈaɪtli . …
`` ˌʌndɚstˈʊd , '' nˈɪkələs nˈɑːdz slˈaɪtli . …
`` ˌʌndɚstˈʊd , '' nˈɪkələs nˈɑːdz slˈaɪtli . …
`` ˌʌndɚstˈʊd , '' nˈɪkələs nˈɑːdz slˈaɪtli . …
`` juː wɜːɹ ˈɔːlweɪz bɹˈaɪt , kwˈɪn . stˈʌbɚn , ʃˈʊɹ , bˌʌt bɹˈaɪt æz hˈɛl . aɪ ɹɪmˈɛmbɚ juː ˈɑːɹɡjuːɪŋ wɪð mˌiː ɐbˌaʊt ˈɛvɹi lˈɪɾəl θˈɪŋ ɪn hˈɪstɚɹi klˈæs . dɹˈoʊv mˌiː nˈʌts sˈʌmtaɪmz , '' zˈiːnə lˈæfd , ðə sˈaʊnd ɹˈɪtʃ wɪð fˈɑːndnəs . …
`` juː wɜːɹ ˈɔːlweɪz bɹˈaɪt , kwˈɪn . stˈʌbɚn , ʃˈʊɹ , bˌʌt bɹˈaɪt æz hˈɛl . aɪ ɹɪmˈɛmbɚ juː ˈɑːɹɡjuːɪŋ wɪð mˌiː ɐbˌaʊt ˈɛvɹi lˈɪɾəl θˈɪŋ ɪn hˈɪstɚɹi klˈæs . dɹˈoʊv mˌiː nˈʌts sˈʌmtaɪmz , '' zˈiːnə lˈæfd , ðə sˈaʊnd ɹˈɪtʃ wɪð fˈɑːndnəs . …
tˈaɪm lˈuːps ? aɪ hæv tʊ ɐdmˈɪt , nˈɛvɚ wˈɑːtʃt `` pˈæɹədˌɑːks θˈiəɹɪz '' ... jˈɛt ! bˌʌt aɪ kˈænt wˈeɪt tə hˈɪɹ jʊɹ tˈeɪk ˈɑːn ɪt , mˈaɪkəl . bˌiːtˌiːdˈʌbəljˌuː , ɪz ɪt wˈʌn ʌv ðoʊz mˈaɪndbˈɛndɪŋ sˈaɪfˌaɪ ʃˈoʊz ? …
tˈaɪm lˈuːps ? aɪ hæv tʊ ɐdmˈɪt , nˈɛvɚ wˈɑːtʃt `` pˈæɹədˌɑːks θˈiəɹɪz '' ... jˈɛt ! bˌʌt aɪ kˈænt wˈeɪt tə hˈɪɹ jʊɹ tˈeɪk ˈɑːn ɪt , mˈaɪkəl . bˌiːtˌiːdˈʌbəljˌuː , ɪz ɪt wˈʌn ʌv ðoʊz mˈaɪndbˈɛndɪŋ sˈaɪfˌaɪ ʃˈoʊz ? …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` jˈɛh , ænd skɹˈiːmɪŋ æt mˌiː ɪnðə mˈɪdəl əvən ɐsˈɛmbli ɪz ɡˌənə fˈɪks ɪt ? fˈeɪs ɪt , ðɪs ˈɪʃuː ɪz bˈɪɡɚ ðɐn bˈoʊθ ʌv ˌʌs , ænd ɡˌɛɾɪŋ pˈɪst æt ˈiːtʃ ˈʌðɚ sˈɑːlvz nˈʌθɪŋ , '' juːlˈeɪliə ɹɪplˈaɪd , ɛɡzˈæspɚɹˌeɪɾᵻd . …
`` jˈɛh , ænd skɹˈiːmɪŋ æt mˌiː ɪnðə mˈɪdəl əvən ɐsˈɛmbli ɪz ɡˌənə fˈɪks ɪt ? fˈeɪs ɪt , ðɪs ˈɪʃuː ɪz bˈɪɡɚ ðɐn bˈoʊθ ʌv ˌʌs , ænd ɡˌɛɾɪŋ pˈɪst æt ˈiːtʃ ˈʌðɚ sˈɑːlvz nˈʌθɪŋ , '' juːlˈeɪliə ɹɪplˈaɪd , ɛɡzˈæspɚɹˌeɪɾᵻd . …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
tˈaɪm lˈuːps ? aɪ hæv tʊ ɐdmˈɪt , nˈɛvɚ wˈɑːtʃt `` pˈæɹədˌɑːks θˈiəɹɪz '' ... jˈɛt ! bˌʌt aɪ kˈænt wˈeɪt tə hˈɪɹ jʊɹ tˈeɪk ˈɑːn ɪt , mˈaɪkəl . bˌiːtˌiːdˈʌbəljˌuː , ɪz ɪt wˈʌn ʌv ðoʊz mˈaɪndbˈɛndɪŋ sˈaɪfˌaɪ ʃˈoʊz ? …
tˈaɪm lˈuːps ? aɪ hæv tʊ ɐdmˈɪt , nˈɛvɚ wˈɑːtʃt `` pˈæɹədˌɑːks θˈiəɹɪz '' ... jˈɛt ! bˌʌt aɪ kˈænt wˈeɪt tə hˈɪɹ jʊɹ tˈeɪk ˈɑːn ɪt , mˈaɪkəl . bˌiːtˌiːdˈʌbəljˌuː , ɪz ɪt wˈʌn ʌv ðoʊz mˈaɪndbˈɛndɪŋ sˈaɪfˌaɪ ʃˈoʊz ? …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
tˈaɪm lˈuːps ? aɪ hæv tʊ ɐdmˈɪt , nˈɛvɚ wˈɑːtʃt `` pˈæɹədˌɑːks θˈiəɹɪz '' ... jˈɛt ! bˌʌt aɪ kˈænt wˈeɪt tə hˈɪɹ jʊɹ tˈeɪk ˈɑːn ɪt , mˈaɪkəl . bˌiːtˌiːdˈʌbəljˌuː , ɪz ɪt wˈʌn ʌv ðoʊz mˈaɪndbˈɛndɪŋ sˈaɪfˌaɪ ʃˈoʊz ? …
tˈaɪm lˈuːps ? aɪ hæv tʊ ɐdmˈɪt , nˈɛvɚ wˈɑːtʃt `` pˈæɹədˌɑːks θˈiəɹɪz '' ... jˈɛt ! bˌʌt aɪ kˈænt wˈeɪt tə hˈɪɹ jʊɹ tˈeɪk ˈɑːn ɪt , mˈaɪkəl . bˌiːtˌiːdˈʌbəljˌuː , ɪz ɪt wˈʌn ʌv ðoʊz mˈaɪndbˈɛndɪŋ sˈaɪfˌaɪ ʃˈoʊz ? …
`` snˈeɪp ɪz ˌɪndˈiːd ɐ tʃˈælɪndʒ , bˌʌt aɪ mˈɛnt sˈʌmθɪŋ mˈoːɹ dˈɔːntɪŋ , lˈaɪk ðə ɹˈaɪz ʌv dˈɑːɹk mˈædʒɪk ænd hˌaʊ wɪɹ dʒˈʌst stˈuːdənts tɹˈaɪɪŋ tə dˈiːl wɪð ɪt ˈɔːl , '' θiːədˈoːɹə sˈɛd , hɜː vˈɔɪs tˈeɪkɪŋ ˌɑːn ɐ ɡɹˈeɪv tˈoʊn ðæt mˌeɪd ɹˈaɪli lˈʊk æt hɜː mˈoːɹ ɪntˈɛntli . …
`` snˈeɪp ɪz ˌɪndˈiːd ɐ tʃˈælɪndʒ , bˌʌt aɪ mˈɛnt sˈʌmθɪŋ mˈoːɹ dˈɔːntɪŋ , lˈaɪk ðə ɹˈaɪz ʌv dˈɑːɹk mˈædʒɪk ænd hˌaʊ wɪɹ dʒˈʌst stˈuːdənts tɹˈaɪɪŋ tə dˈiːl wɪð ɪt ˈɔːl , '' θiːədˈoːɹə sˈɛd , hɜː vˈɔɪs tˈeɪkɪŋ ˌɑːn ɐ ɡɹˈeɪv tˈoʊn ðæt mˌeɪd ɹˈaɪli lˈʊk æt hɜː mˈoːɹ ɪntˈɛntli . …
`` snˈeɪp ɪz ˌɪndˈiːd ɐ tʃˈælɪndʒ , bˌʌt aɪ mˈɛnt sˈʌmθɪŋ mˈoːɹ dˈɔːntɪŋ , lˈaɪk ðə ɹˈaɪz ʌv dˈɑːɹk mˈædʒɪk ænd hˌaʊ wɪɹ dʒˈʌst stˈuːdənts tɹˈaɪɪŋ tə dˈiːl wɪð ɪt ˈɔːl , '' θiːədˈoːɹə sˈɛd , hɜː vˈɔɪs tˈeɪkɪŋ ˌɑːn ɐ ɡɹˈeɪv tˈoʊn ðæt mˌeɪd ɹˈaɪli lˈʊk æt hɜː mˈoːɹ ɪntˈɛntli . …
`` snˈeɪp ɪz ˌɪndˈiːd ɐ tʃˈælɪndʒ , bˌʌt aɪ mˈɛnt sˈʌmθɪŋ mˈoːɹ dˈɔːntɪŋ , lˈaɪk ðə ɹˈaɪz ʌv dˈɑːɹk mˈædʒɪk ænd hˌaʊ wɪɹ dʒˈʌst stˈuːdənts tɹˈaɪɪŋ tə dˈiːl wɪð ɪt ˈɔːl , '' θiːədˈoːɹə sˈɛd , hɜː vˈɔɪs tˈeɪkɪŋ ˌɑːn ɐ ɡɹˈeɪv tˈoʊn ðæt mˌeɪd ɹˈaɪli lˈʊk æt hɜː mˈoːɹ ɪntˈɛntli . …
`` juː wɜːɹ ˈɔːlweɪz bɹˈaɪt , kwˈɪn . stˈʌbɚn , ʃˈʊɹ , bˌʌt bɹˈaɪt æz hˈɛl . aɪ ɹɪmˈɛmbɚ juː ˈɑːɹɡjuːɪŋ wɪð mˌiː ɐbˌaʊt ˈɛvɹi lˈɪɾəl θˈɪŋ ɪn hˈɪstɚɹi klˈæs . dɹˈoʊv mˌiː nˈʌts sˈʌmtaɪmz , '' zˈiːnə lˈæfd , ðə sˈaʊnd ɹˈɪtʃ wɪð fˈɑːndnəs . …
`` juː wɜːɹ ˈɔːlweɪz bɹˈaɪt , kwˈɪn . stˈʌbɚn , ʃˈʊɹ , bˌʌt bɹˈaɪt æz hˈɛl . aɪ ɹɪmˈɛmbɚ juː ˈɑːɹɡjuːɪŋ wɪð mˌiː ɐbˌaʊt ˈɛvɹi lˈɪɾəl θˈɪŋ ɪn hˈɪstɚɹi klˈæs . dɹˈoʊv mˌiː nˈʌts sˈʌmtaɪmz , '' zˈiːnə lˈæfd , ðə sˈaʊnd ɹˈɪtʃ wɪð fˈɑːndnəs . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
Epochs: 1
Traceback (most recent call last):
  File "/workspace/tts/train_second.py", line 789, in <module>
    main()
  File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/workspace/tts/train_second.py", line 676, in main
    logger.info('Validation loss: %.3f, Dur loss: %.3f, F0 loss: %.3f' % (loss_test / iters_test, loss_align / iters_test, loss_f / iters_test) + '\n\n\n')
ZeroDivisionError: division by zero
make: *** [Makefile:8: train_2] Error 1

The division by zero happens here: https://github.com/yl4579/StyleTTS2/blob/1ece0a38a569c675bc79d9bd42b02bbe6cab8615/train_second.py#L676

An exception is raised inside the validation loop, so https://github.com/yl4579/StyleTTS2/blob/1ece0a38a569c675bc79d9bd42b02bbe6cab8615/train_second.py#L671 is never reached and https://github.com/yl4579/StyleTTS2/blob/1ece0a38a569c675bc79d9bd42b02bbe6cab8615/train_second.py#L569 stays 0.

The exception is invisible because of the try/except block: https://github.com/yl4579/StyleTTS2/blob/1ece0a38a569c675bc79d9bd42b02bbe6cab8615/train_second.py#L672-L673
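Until the underlying exception is fixed, the crash itself could be avoided by guarding the division. A minimal, self-contained sketch (variable names are taken from the traceback above; the `logger` here is a stand-in for the script's own logger):

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Values as they end up when every validation batch raised an exception:
# the loss accumulators and the batch counter are all still zero.
loss_test, loss_align, loss_f = 0.0, 0.0, 0.0
iters_test = 0

# Only divide when at least one validation batch actually contributed.
if iters_test > 0:
    logger.info('Validation loss: %.3f, Dur loss: %.3f, F0 loss: %.3f'
                % (loss_test / iters_test,
                   loss_align / iters_test,
                   loss_f / iters_test))
else:
    logger.warning('No validation batch completed; skipping loss logging.')
```

This only papers over the symptom, of course; the real problem is that every validation batch fails.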

I added:

except Exception as e:
    import traceback
    traceback.print_exc()
    continue

which surfaces the underlying exception:

exception

Traceback (most recent call last):
  File "/workspace/tts/train_second.py", line 612, in main
    d, p = model.predictor(d_en, s,
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 185, in forward
    outputs = self.parallel_apply(replicas, inputs, module_kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 200, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 110, in parallel_apply
    output.reraise()
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/_utils.py", line 694, in reraise
    raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in _worker
    output = module(*input, **kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/tts/models.py", line 469, in forward
    d = self.text_encoder(texts, style, text_lengths, m)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/tts/models.py", line 558, in forward
    x, _ = block(x)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 882, in forward
    result = _VF.lstm(input, batch_sizes, hx, self._flat_weights, self.bias,
RuntimeError: shape '[540672, 1]' is invalid for input of size 655360
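For reference, this error message comes from a tensor reshape whose target element count doesn't match the input. A standalone reproduction of the same error class (the sizes are copied from the traceback, not derived from the model):

```python
import torch

x = torch.randn(655360)   # flat tensor with 655360 elements
try:
    x.view(540672, 1)     # 540672 != 655360, so the reshape is rejected
except RuntimeError as e:
    print(e)              # shape '[540672, 1]' is invalid for input of size 655360
```

So somewhere in the LSTM forward pass, the flat weight/input buffer has more elements than the shape it is being viewed into expects.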

full output

(abc) root@0e786d10e0c3:/workspace/tts# make train_2
python train_second.py --config_path ./Configs/config.yml
Loading the first stage model at /workspace/small_1208/first_stage.pth ...
decoder loaded
text_encoder loaded
style_encoder loaded
text_aligner loaded
pitch_extractor loaded
Some weights of the model checkpoint at microsoft/wavlm-base-plus were not used when initializing WavLMModel: ['encoder.pos_conv_embed.conv.weight_v', 'encoder.pos_conv_embed.conv.weight_g']
- This IS expected if you are initializing WavLMModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing WavLMModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of WavLMModel were not initialized from the model checkpoint at microsoft/wavlm-base-plus and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
BERT AdamW (
Parameter Group 0
    amsgrad: False
    base_momentum: 0.85
    betas: (0.9, 0.99)
    capturable: False
    differentiable: False
    eps: 1e-09
    foreach: None
    fused: None
    initial_lr: 1e-05
    lr: 1e-05
    max_lr: 2e-05
    max_momentum: 0.95
    maximize: False
    min_lr: 0
    weight_decay: 0.01
)
decoder AdamW (
Parameter Group 0
    amsgrad: False
    base_momentum: 0.85
    betas: (0.0, 0.99)
    capturable: False
    differentiable: False
    eps: 1e-09
    foreach: None
    fused: None
    initial_lr: 1e-05
    lr: 1e-05
    max_lr: 2e-05
    max_momentum: 0.95
    maximize: False
    min_lr: 0
    weight_decay: 0.0001
)
`` jˈɛh , ænd skɹˈiːmɪŋ æt mˌiː ɪnðə mˈɪdəl əvən ɐsˈɛmbli ɪz ɡˌənə fˈɪks ɪt ? fˈeɪs ɪt , ðɪs ˈɪʃuː ɪz bˈɪɡɚ ðɐn bˈoʊθ ʌv ˌʌs , ænd ɡˌɛɾɪŋ pˈɪst æt ˈiːtʃ ˈʌðɚ sˈɑːlvz nˈʌθɪŋ , '' juːlˈeɪliə ɹɪplˈaɪd , ɛɡzˈæspɚɹˌeɪɾᵻd . …
`` jˈɛh , ænd skɹˈiːmɪŋ æt mˌiː ɪnðə mˈɪdəl əvən ɐsˈɛmbli ɪz ɡˌənə fˈɪks ɪt ? fˈeɪs ɪt , ðɪs ˈɪʃuː ɪz bˈɪɡɚ ðɐn bˈoʊθ ʌv ˌʌs , ænd ɡˌɛɾɪŋ pˈɪst æt ˈiːtʃ ˈʌðɚ sˈɑːlvz nˈʌθɪŋ , '' juːlˈeɪliə ɹɪplˈaɪd , ɛɡzˈæspɚɹˌeɪɾᵻd . …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
tˈaɪm lˈuːps ? aɪ hæv tʊ ɐdmˈɪt , nˈɛvɚ wˈɑːtʃt `` pˈæɹədˌɑːks θˈiəɹɪz '' ... jˈɛt ! bˌʌt aɪ kˈænt wˈeɪt tə hˈɪɹ jʊɹ tˈeɪk ˈɑːn ɪt , mˈaɪkəl . bˌiːtˌiːdˈʌbəljˌuː , ɪz ɪt wˈʌn ʌv ðoʊz mˈaɪndbˈɛndɪŋ sˈaɪfˌaɪ ʃˈoʊz ? …
tˈaɪm lˈuːps ? aɪ hæv tʊ ɐdmˈɪt , nˈɛvɚ wˈɑːtʃt `` pˈæɹədˌɑːks θˈiəɹɪz '' ... jˈɛt ! bˌʌt aɪ kˈænt wˈeɪt tə hˈɪɹ jʊɹ tˈeɪk ˈɑːn ɪt , mˈaɪkəl . bˌiːtˌiːdˈʌbəljˌuː , ɪz ɪt wˈʌn ʌv ðoʊz mˈaɪndbˈɛndɪŋ sˈaɪfˌaɪ ʃˈoʊz ? …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` ˌʌndɚstˈʊd , '' nˈɪkələs nˈɑːdz slˈaɪtli . …
`` ˌʌndɚstˈʊd , '' nˈɪkələs nˈɑːdz slˈaɪtli . …
`` ˌʌndɚstˈʊd , '' nˈɪkələs nˈɑːdz slˈaɪtli . …
`` ˌʌndɚstˈʊd , '' nˈɪkələs nˈɑːdz slˈaɪtli . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
tˈaɪm lˈuːps ? aɪ hæv tʊ ɐdmˈɪt , nˈɛvɚ wˈɑːtʃt `` pˈæɹədˌɑːks θˈiəɹɪz '' ... jˈɛt ! bˌʌt aɪ kˈænt wˈeɪt tə hˈɪɹ jʊɹ tˈeɪk ˈɑːn ɪt , mˈaɪkəl . bˌiːtˌiːdˈʌbəljˌuː , ɪz ɪt wˈʌn ʌv ðoʊz mˈaɪndbˈɛndɪŋ sˈaɪfˌaɪ ʃˈoʊz ? …
tˈaɪm lˈuːps ? aɪ hæv tʊ ɐdmˈɪt , nˈɛvɚ wˈɑːtʃt `` pˈæɹədˌɑːks θˈiəɹɪz '' ... jˈɛt ! bˌʌt aɪ kˈænt wˈeɪt tə hˈɪɹ jʊɹ tˈeɪk ˈɑːn ɪt , mˈaɪkəl . bˌiːtˌiːdˈʌbəljˌuː , ɪz ɪt wˈʌn ʌv ðoʊz mˈaɪndbˈɛndɪŋ sˈaɪfˌaɪ ʃˈoʊz ? …
`` snˈeɪp ɪz ˌɪndˈiːd ɐ tʃˈælɪndʒ , bˌʌt aɪ mˈɛnt sˈʌmθɪŋ mˈoːɹ dˈɔːntɪŋ , lˈaɪk ðə ɹˈaɪz ʌv dˈɑːɹk mˈædʒɪk ænd hˌaʊ wɪɹ dʒˈʌst stˈuːdənts tɹˈaɪɪŋ tə dˈiːl wɪð ɪt ˈɔːl , '' θiːədˈoːɹə sˈɛd , hɜː vˈɔɪs tˈeɪkɪŋ ˌɑːn ɐ ɡɹˈeɪv tˈoʊn ðæt mˌeɪd ɹˈaɪli lˈʊk æt hɜː mˈoːɹ ɪntˈɛntli . …
`` snˈeɪp ɪz ˌɪndˈiːd ɐ tʃˈælɪndʒ , bˌʌt aɪ mˈɛnt sˈʌmθɪŋ mˈoːɹ dˈɔːntɪŋ , lˈaɪk ðə ɹˈaɪz ʌv dˈɑːɹk mˈædʒɪk ænd hˌaʊ wɪɹ dʒˈʌst stˈuːdənts tɹˈaɪɪŋ tə dˈiːl wɪð ɪt ˈɔːl , '' θiːədˈoːɹə sˈɛd , hɜː vˈɔɪs tˈeɪkɪŋ ˌɑːn ɐ ɡɹˈeɪv tˈoʊn ðæt mˌeɪd ɹˈaɪli lˈʊk æt hɜː mˈoːɹ ɪntˈɛntli . …
`` snˈeɪp ɪz ˌɪndˈiːd ɐ tʃˈælɪndʒ , bˌʌt aɪ mˈɛnt sˈʌmθɪŋ mˈoːɹ dˈɔːntɪŋ , lˈaɪk ðə ɹˈaɪz ʌv dˈɑːɹk mˈædʒɪk ænd hˌaʊ wɪɹ dʒˈʌst stˈuːdənts tɹˈaɪɪŋ tə dˈiːl wɪð ɪt ˈɔːl , '' θiːədˈoːɹə sˈɛd , hɜː vˈɔɪs tˈeɪkɪŋ ˌɑːn ɐ ɡɹˈeɪv tˈoʊn ðæt mˌeɪd ɹˈaɪli lˈʊk æt hɜː mˈoːɹ ɪntˈɛntli . …
`` snˈeɪp ɪz ˌɪndˈiːd ɐ tʃˈælɪndʒ , bˌʌt aɪ mˈɛnt sˈʌmθɪŋ mˈoːɹ dˈɔːntɪŋ , lˈaɪk ðə ɹˈaɪz ʌv dˈɑːɹk mˈædʒɪk ænd hˌaʊ wɪɹ dʒˈʌst stˈuːdənts tɹˈaɪɪŋ tə dˈiːl wɪð ɪt ˈɔːl , '' θiːədˈoːɹə sˈɛd , hɜː vˈɔɪs tˈeɪkɪŋ ˌɑːn ɐ ɡɹˈeɪv tˈoʊn ðæt mˌeɪd ɹˈaɪli lˈʊk æt hɜː mˈoːɹ ɪntˈɛntli . …
`` snˈeɪp ɪz ˌɪndˈiːd ɐ tʃˈælɪndʒ , bˌʌt aɪ mˈɛnt sˈʌmθɪŋ mˈoːɹ dˈɔːntɪŋ , lˈaɪk ðə ɹˈaɪz ʌv dˈɑːɹk mˈædʒɪk ænd hˌaʊ wɪɹ dʒˈʌst stˈuːdənts tɹˈaɪɪŋ tə dˈiːl wɪð ɪt ˈɔːl , '' θiːədˈoːɹə sˈɛd , hɜː vˈɔɪs tˈeɪkɪŋ ˌɑːn ɐ ɡɹˈeɪv tˈoʊn ðæt mˌeɪd ɹˈaɪli lˈʊk æt hɜː mˈoːɹ ɪntˈɛntli . …
`` snˈeɪp ɪz ˌɪndˈiːd ɐ tʃˈælɪndʒ , bˌʌt aɪ mˈɛnt sˈʌmθɪŋ mˈoːɹ dˈɔːntɪŋ , lˈaɪk ðə ɹˈaɪz ʌv dˈɑːɹk mˈædʒɪk ænd hˌaʊ wɪɹ dʒˈʌst stˈuːdənts tɹˈaɪɪŋ tə dˈiːl wɪð ɪt ˈɔːl , '' θiːədˈoːɹə sˈɛd , hɜː vˈɔɪs tˈeɪkɪŋ ˌɑːn ɐ ɡɹˈeɪv tˈoʊn ðæt mˌeɪd ɹˈaɪli lˈʊk æt hɜː mˈoːɹ ɪntˈɛntli . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` jˈɛh , ænd skɹˈiːmɪŋ æt mˌiː ɪnðə mˈɪdəl əvən ɐsˈɛmbli ɪz ɡˌənə fˈɪks ɪt ? fˈeɪs ɪt , ðɪs ˈɪʃuː ɪz bˈɪɡɚ ðɐn bˈoʊθ ʌv ˌʌs , ænd ɡˌɛɾɪŋ pˈɪst æt ˈiːtʃ ˈʌðɚ sˈɑːlvz nˈʌθɪŋ , '' juːlˈeɪliə ɹɪplˈaɪd , ɛɡzˈæspɚɹˌeɪɾᵻd . …
`` jˈɛh , ænd skɹˈiːmɪŋ æt mˌiː ɪnðə mˈɪdəl əvən ɐsˈɛmbli ɪz ɡˌənə fˈɪks ɪt ? fˈeɪs ɪt , ðɪs ˈɪʃuː ɪz bˈɪɡɚ ðɐn bˈoʊθ ʌv ˌʌs , ænd ɡˌɛɾɪŋ pˈɪst æt ˈiːtʃ ˈʌðɚ sˈɑːlvz nˈʌθɪŋ , '' juːlˈeɪliə ɹɪplˈaɪd , ɛɡzˈæspɚɹˌeɪɾᵻd . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
`` juː wɜːɹ ˈɔːlweɪz bɹˈaɪt , kwˈɪn . stˈʌbɚn , ʃˈʊɹ , bˌʌt bɹˈaɪt æz hˈɛl . aɪ ɹɪmˈɛmbɚ juː ˈɑːɹɡjuːɪŋ wɪð mˌiː ɐbˌaʊt ˈɛvɹi lˈɪɾəl θˈɪŋ ɪn hˈɪstɚɹi klˈæs . dɹˈoʊv mˌiː nˈʌts sˈʌmtaɪmz , '' zˈiːnə lˈæfd , ðə sˈaʊnd ɹˈɪtʃ wɪð fˈɑːndnəs . …
`` juː wɜːɹ ˈɔːlweɪz bɹˈaɪt , kwˈɪn . stˈʌbɚn , ʃˈʊɹ , bˌʌt bɹˈaɪt æz hˈɛl . aɪ ɹɪmˈɛmbɚ juː ˈɑːɹɡjuːɪŋ wɪð mˌiː ɐbˌaʊt ˈɛvɹi lˈɪɾəl θˈɪŋ ɪn hˈɪstɚɹi klˈæs . dɹˈoʊv mˌiː nˈʌts sˈʌmtaɪmz , '' zˈiːnə lˈæfd , ðə sˈaʊnd ɹˈɪtʃ wɪð fˈɑːndnəs . …
`` juː wɜːɹ ˈɔːlweɪz bɹˈaɪt , kwˈɪn . stˈʌbɚn , ʃˈʊɹ , bˌʌt bɹˈaɪt æz hˈɛl . aɪ ɹɪmˈɛmbɚ juː ˈɑːɹɡjuːɪŋ wɪð mˌiː ɐbˌaʊt ˈɛvɹi lˈɪɾəl θˈɪŋ ɪn hˈɪstɚɹi klˈæs . dɹˈoʊv mˌiː nˈʌts sˈʌmtaɪmz , '' zˈiːnə lˈæfd , ðə sˈaʊnd ɹˈɪtʃ wɪð fˈɑːndnəs . …
`` juː wɜːɹ ˈɔːlweɪz bɹˈaɪt , kwˈɪn . stˈʌbɚn , ʃˈʊɹ , bˌʌt bɹˈaɪt æz hˈɛl . aɪ ɹɪmˈɛmbɚ juː ˈɑːɹɡjuːɪŋ wɪð mˌiː ɐbˌaʊt ˈɛvɹi lˈɪɾəl θˈɪŋ ɪn hˈɪstɚɹi klˈæs . dɹˈoʊv mˌiː nˈʌts sˈʌmtaɪmz , '' zˈiːnə lˈæfd , ðə sˈaʊnd ɹˈɪtʃ wɪð fˈɑːndnəs . …
`` juː wɜːɹ ˈɔːlweɪz bɹˈaɪt , kwˈɪn . stˈʌbɚn , ʃˈʊɹ , bˌʌt bɹˈaɪt æz hˈɛl . aɪ ɹɪmˈɛmbɚ juː ˈɑːɹɡjuːɪŋ wɪð mˌiː ɐbˌaʊt ˈɛvɹi lˈɪɾəl θˈɪŋ ɪn hˈɪstɚɹi klˈæs . dɹˈoʊv mˌiː nˈʌts sˈʌmtaɪmz , '' zˈiːnə lˈæfd , ðə sˈaʊnd ɹˈɪtʃ wɪð fˈɑːndnəs . …
`` juː wɜːɹ ˈɔːlweɪz bɹˈaɪt , kwˈɪn . stˈʌbɚn , ʃˈʊɹ , bˌʌt bɹˈaɪt æz hˈɛl . aɪ ɹɪmˈɛmbɚ juː ˈɑːɹɡjuːɪŋ wɪð mˌiː ɐbˌaʊt ˈɛvɹi lˈɪɾəl θˈɪŋ ɪn hˈɪstɚɹi klˈæs . dɹˈoʊv mˌiː nˈʌts sˈʌmtaɪmz , '' zˈiːnə lˈæfd , ðə sˈaʊnd ɹˈɪtʃ wɪð fˈɑːndnəs . …
Traceback (most recent call last):
  File "/workspace/tts/train_second.py", line 612, in main
    d, p = model.predictor(d_en, s,
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 185, in forward
    outputs = self.parallel_apply(replicas, inputs, module_kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 200, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 110, in parallel_apply
    output.reraise()
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/_utils.py", line 694, in reraise
    raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in _worker
    output = module(*input, **kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/tts/models.py", line 469, in forward
    d = self.text_encoder(texts, style, text_lengths, m)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/tts/models.py", line 558, in forward
    x, _ = block(x)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 882, in forward
    result = _VF.lstm(input, batch_sizes, hx, self._flat_weights, self.bias,
RuntimeError: shape '[540672, 1]' is invalid for input of size 655360

Epochs: 1
Traceback (most recent call last):
  File "/workspace/tts/train_second.py", line 791, in <module>
    main()
  File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/workspace/tts/train_second.py", line 678, in main
    logger.info('Validation loss: %.3f, Dur loss: %.3f, F0 loss: %.3f' % (loss_test / iters_test, loss_align / iters_test, loss_f / iters_test) + '\n\n\n')
ZeroDivisionError: division by zero
make: *** [Makefile:8: train_2] Error 1

Note: I'm just experimenting with a minimal setup to get familiar with the training flow, hence the low number of epochs, small max_len, etc.
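For what it's worth, the ZeroDivisionError at the end looks like a secondary symptom: every validation batch hits the RuntimeError above, so iters_test is still 0 when the summary line divides by it. A minimal sketch of a guard for that logging call (log_validation is a hypothetical helper I made up to mirror the line in train_second.py, not code from the repo):

```python
# Hypothetical helper mirroring the logging line in train_second.py.
# iters_test counts successfully processed validation batches; when every
# batch fails (as the shape RuntimeError above makes them), it stays 0 and
# the bare division raises ZeroDivisionError, masking the real error.
def log_validation(loss_test, loss_align, loss_f, iters_test):
    if iters_test == 0:
        return "Validation produced no usable batches; see earlier errors."
    return "Validation loss: %.3f, Dur loss: %.3f, F0 loss: %.3f" % (
        loss_test / iters_test,
        loss_align / iters_test,
        loss_f / iters_test,
    )
```

With a guard like this, the run would still fail, but on the actual shape error instead of the misleading division by zero.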

Configs/config.yml

log_dir: "/workspace/small_1208"
first_stage_path: "first_stage.pth"
save_freq: 2
log_interval: 10
device: "cuda"
epochs_1st: 2 # number of epochs for first stage training (pre-training)
epochs_2nd: 2 # number of epochs for second stage training (joint training)
batch_size: 40
max_len: 100 # maximum number of frames
pretrained_model: ""
second_stage_load_pretrained: true # set to true if the pre-trained model is for 2nd stage
load_only_params: false # set to true if do not want to load epoch numbers and optimizer parameters

F0_path: "Utils/JDC/bst.t7"
ASR_config: "Utils/ASR/config.yml"
ASR_path: "Utils/ASR/epoch_00080.pth"
PLBERT_dir: 'Utils/PLBERT/'

data_params:
  train_data: "/workspace/ds/train_list.txt"
  val_data: "/workspace/ds/val_list.txt"
  root_path: "/workspace/ds/wavs"
  OOD_data: "Data/OOD_texts.txt"
  min_length: 50 # sample until texts with this size are obtained for OOD texts

preprocess_params:
  sr: 24000
  spect_params:
    n_fft: 2048
    win_length: 1200
    hop_length: 300

model_params:
  multispeaker: true

  dim_in: 64 
  hidden_dim: 512
  max_conv_dim: 512
  n_layer: 3
  n_mels: 80

  n_token: 178 # number of phoneme tokens
  max_dur: 50 # maximum duration of a single phoneme
  style_dim: 128 # style vector size
  
  dropout: 0.2

  # config for decoder
  decoder: 
      type: 'istftnet' # either hifigan or istftnet
      resblock_kernel_sizes: [3,7,11]
      upsample_rates :  [10, 6]
      upsample_initial_channel: 512
      resblock_dilation_sizes: [[1,3,5], [1,3,5], [1,3,5]]
      upsample_kernel_sizes: [20, 12]
      gen_istft_n_fft: 20
      gen_istft_hop_size: 5
      
  # speech language model config
  slm:
      model: 'microsoft/wavlm-base-plus'
      sr: 16000 # sampling rate of SLM
      hidden: 768 # hidden size of SLM
      nlayers: 13 # number of layers of SLM
      initial_channel: 64 # initial channels of SLM discriminator head
  
  # style diffusion model config
  diffusion:
    embedding_mask_proba: 0.1
    # transformer config
    transformer:
      num_layers: 3
      num_heads: 8
      head_features: 64
      multiplier: 2

    # diffusion distribution config
    dist:
      sigma_data: 0.2 # placeholder for estimate_sigma_data set to false
      estimate_sigma_data: true # estimate sigma_data from the current batch if set to true
      mean: -3.0
      std: 1.0
  
loss_params:
    lambda_mel: 5. # mel reconstruction loss
    lambda_gen: 1. # generator loss
    lambda_slm: 1. # slm feature matching loss
    
    lambda_mono: 1. # monotonic alignment loss (1st stage, TMA)
    lambda_s2s: 1. # sequence-to-sequence loss (1st stage, TMA)
    TMA_epoch: 50 # TMA starting epoch (1st stage)

    lambda_F0: 1. # F0 reconstruction loss (2nd stage)
    lambda_norm: 1. # norm reconstruction loss (2nd stage)
    lambda_dur: 1. # duration loss (2nd stage)
    lambda_ce: 20. # duration predictor probability output CE loss (2nd stage)
    lambda_sty: 1. # style reconstruction loss (2nd stage)
    lambda_diff: 1. # score matching loss (2nd stage)
    
    diff_epoch: 20 # style diffusion starting epoch (2nd stage)
    joint_epoch: 50 # joint training starting epoch (2nd stage)

optimizer_params:
  lr: 0.0001 # general learning rate
  bert_lr: 0.00001 # learning rate for PLBERT
  ft_lr: 0.00001 # learning rate for acoustic modules
  
slmadv_params:
  min_len: 400 # minimum length of samples
  max_len: 500 # maximum length of samples
  batch_percentage: 0.5 # to prevent out of memory, only use half of the original batch size
  iter: 10 # update the discriminator every this iterations of generator update
  thresh: 5 # gradient norm above which the gradient is scaled
  scale: 0.01 # gradient scaling factor for predictors from SLM discriminators
  sig: 1.5 # sigma for differentiable duration modeling
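One thing I noticed while sanity-checking the config: with these spectrogram settings, max_len: 100 frames is only about 1.25 s of audio, and both numbers in the shape error factor cleanly by hidden_dim = 512 (540672 = 1056 × 512 vs 655360 = 1280 × 512), which looks like a time-dimension mismatch rather than a channel one. This is just my own back-of-envelope arithmetic (assuming frames = samples / hop_length), not anything from the repo:

```python
sr = 24000          # preprocess_params.sr
hop_length = 300    # preprocess_params.spect_params.hop_length
max_len = 100       # maximum number of mel frames per training segment

# Duration of one training segment in seconds.
seconds = max_len * hop_length / sr
print(seconds)  # 1.25

# Both sizes from the RuntimeError are clean multiples of hidden_dim=512,
# differing only in the implied time dimension (1056 vs 1280 steps).
hidden_dim = 512
print(540672 // hidden_dim, 655360 // hidden_dim)  # 1056 1280
```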

devidw avatar Dec 08 '23 14:12 devidw