
G2P Trainer - Request to update or revisit code

ariikamusic opened this issue 1 month ago

Acknowledgement

  • [x] I have read Getting-Started and FAQ

Description of the new feature / enhancement

For the past few weeks, I have been trying to train a Dutch G2P model using the Trainer, experimenting with countless hyperparameter settings, but I consistently run into the same issues regardless of dictionary size, data quality, or configuration.

  • NaN loss and exploding gradients: Training frequently halts due to NaN loss after a few epochs. I have tried grad_clip=1.0, but this does not prevent the issue entirely (a generic clipping/NaN-guard sketch is included after this list for reference).

  • Poor OOV performance: Out-of-vocabulary (OOV) words are predicted as gibberish, no matter which dictionary I use (10K, 40K, and 430K entries).
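
As a point of reference, below is a minimal PyTorch sketch of how gradient clipping with grad_clip=1.0 and a NaN guard can be wired into a generic training step. This is not the OpenUtau trainer's actual code; model, batch, loss_fn, and optimizer are placeholders.

```python
import math
import torch

def train_step(model, batch, loss_fn, optimizer, grad_clip=1.0):
    """One training step with a NaN guard and global-norm gradient clipping.

    Generic sketch only; `model`, `batch`, and `loss_fn` are placeholders,
    not names from the OpenUtau G2P trainer.
    """
    optimizer.zero_grad()
    loss = loss_fn(model(batch["inputs"]), batch["targets"])

    # Skip the update if the loss has already blown up; clipping the
    # gradients afterwards cannot repair a NaN/Inf loss value.
    if not math.isfinite(loss.item()):
        return None

    loss.backward()
    # Clip the global gradient norm (the post uses grad_clip=1.0).
    torch.nn.utils.clip_grad_norm_(model.parameters(), grad_clip)
    optimizer.step()
    return loss.item()
```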

Here are some examples of Dutch OOV words and their incorrect predictions using a model trained on 430K words:

  • Grapheme: hyperconcentratiemodus Phonemes: [zj z oo ts n d o j t ieU ts s]

  • Grapheme: vriesdroogstofje Phonemes: [uY ieU djh s t o eeU ts]

  • Grapheme: snurkdynamiek Phonemes: [s ts ts tsj ng oo O ieU tsj]

  • Grapheme: yuzu-extract Phonemes: [oY ts oo ts djh z oo R t]

  • Grapheme: quasi-logisch Phonemes: [R ts w s ieU oo n o]

  • Grapheme: kweekslangetje Phonemes: [eU f m sj n oo ng h u a djh]

  • Grapheme: retro-vibey Phonemes: [h aU u R o Y ieU aU ts j oY oY]

For this model, I modified the default parameters as follows (an illustrative sketch of what these parameters typically control follows the list):

  • num_layers: 4
  • dropout: 0.5
  • d_model: 256
  • d_hidden: 512
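
For readers unfamiliar with these knobs, the sketch below illustrates what num_layers, dropout, d_model, and d_hidden typically control in a recurrent encoder. The actual OpenUtau G2P architecture may differ; the vocabulary size and layer choices here are hypothetical.

```python
import torch.nn as nn

# Illustration only: how hyperparameters like these commonly map onto a
# recurrent G2P encoder. The real trainer's architecture may differ.
NUM_GRAPHEMES = 64  # hypothetical grapheme vocabulary size
d_model, d_hidden, num_layers, dropout = 256, 512, 4, 0.5

embedding = nn.Embedding(NUM_GRAPHEMES, d_model)  # grapheme embeddings of width d_model
encoder = nn.LSTM(
    input_size=d_model,
    hidden_size=d_hidden,   # width of each recurrent layer
    num_layers=num_layers,  # number of stacked recurrent layers
    dropout=dropout,        # applied between stacked layers during training
    batch_first=True,
    bidirectional=True,
)
```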

It appears the current G2P trainer may not be well suited to languages with complex pronunciation and phonetic rules: the model fails to generalize beyond the training set, even with large, high-quality dictionaries.

Any advice would be greatly appreciated.

Proposed technical implementation details

I request that the G2P Trainer be revisited and the issues mentioned above be identified and resolved, for example by implementing different LR schedulers and optimizers, and possibly revisiting the model architecture for a more robust and capable G2P model.
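
As an illustration of the kind of change requested, here is a minimal PyTorch sketch of swapping in AdamW together with a warmup-plus-decay LR schedule. The names and values are generic and not taken from the current trainer.

```python
import torch

# Sketch only: AdamW combined with linear warmup and inverse-sqrt decay.
# The tiny Linear module stands in for whatever network the trainer builds.
model = torch.nn.Linear(8, 8)  # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)

warmup_steps = 1000

def lr_lambda(step):
    # Linear warmup for the first `warmup_steps`, then inverse-sqrt decay.
    if step < warmup_steps:
        return (step + 1) / warmup_steps
    return (warmup_steps / (step + 1)) ** 0.5

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# In the training loop, call scheduler.step() after each optimizer.step().
```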

ariikamusic · Nov 16 '25

The issue you are seeing goes beyond a limitation of the model's capability: it is very clear that something is broken, rather than merely "not good enough". I suppose you are using the rnntloss on GPU? It simply does not work; I used the CPU version for a reason.

stakira · Nov 25 '25

> The issue you are seeing goes beyond a limitation of the model's capability: it is very clear that something is broken, rather than merely "not good enough". I suppose you are using the rnntloss on GPU? It simply does not work; I used the CPU version for a reason.

Correct. I was running the loss function on GPU. After switching back to CPU, the NaN losses largely stopped appearing. That said, the evaluation-loss stagnation persisted. I plan to continue training a final model with the best parameters available and will patiently wait for future improvements. Thank you!
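
For anyone else hitting the same NaN issue, the sketch below shows the general pattern of keeping the model on GPU while evaluating the transducer loss on CPU. I do not know which RNN-T loss implementation the trainer actually uses; torchaudio's rnnt_loss is used here purely as a stand-in, and the shapes and blank index are illustrative.

```python
import torch
from torchaudio.functional import rnnt_loss  # stand-in for the trainer's own loss

def cpu_rnnt_loss(logits_gpu, targets, logit_lengths, target_lengths, blank=0):
    """Evaluate an RNN-T loss on CPU while the model stays on GPU.

    Moving the logits to CPU is recorded by autograd, so loss.backward()
    still reaches the GPU-resident model parameters.
    """
    logits_cpu = logits_gpu.float().cpu()
    return rnnt_loss(
        logits_cpu,                              # (batch, time, target_len + 1, classes)
        targets.to(torch.int32).cpu(),           # (batch, target_len)
        logit_lengths.to(torch.int32).cpu(),
        target_lengths.to(torch.int32).cpu(),
        blank=blank,
    )
```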

ariikamusic · Dec 15 '25