Am I correct in saying that the training code in customtokenizer trains on only one (X, Y) pair at a time instead of a whole batch at once? Are there any plans...
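To make the distinction in the question concrete, here is a toy sketch that contrasts per-pair updates with mini-batch updates on a one-parameter linear model `y = w * x`. It is not the customtokenizer training code, which isn't shown here; the function names, learning rate, and data are hypothetical and chosen only for illustration.

```python
# Hypothetical illustration only: per-pair SGD vs mini-batch SGD on y = w * x.
# This is NOT the customtokenizer training loop; names and data are made up.

def grad(w, x, y):
    # Derivative with respect to w of the squared error (w*x - y)^2
    return 2.0 * (w * x - y) * x

def train_per_pair(pairs, w=0.0, lr=0.01, epochs=1):
    """One weight update per (x, y) pair."""
    updates = 0
    for _ in range(epochs):
        for x, y in pairs:
            w -= lr * grad(w, x, y)
            updates += 1
    return w, updates

def train_batched(pairs, w=0.0, lr=0.01, epochs=1, batch_size=4):
    """One weight update per batch, using the mean gradient over the batch."""
    updates = 0
    for _ in range(epochs):
        for i in range(0, len(pairs), batch_size):
            batch = pairs[i:i + batch_size]  # last batch may be smaller
            g = sum(grad(w, x, y) for x, y in batch) / len(batch)
            w -= lr * g
            updates += 1
    return w, updates

# 8 synthetic pairs generated from a true weight of 3
pairs = [(float(x), 3.0 * x) for x in range(1, 9)]
```

With these 8 pairs, `train_per_pair(pairs)` makes 8 weight updates per epoch, while `train_batched(pairs, batch_size=4)` makes 2, each driven by an averaged gradient. In a real framework the batched version also lets the forward and backward passes run on all samples of a batch in parallel, which is usually where the speedup comes from.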
(Took me way too long to realize this, and it just goes to show that most of us are point-and-click kind of fellas who don't really understand...
All trained HiFi-GAN models come out sounding like this: they just generate straight mel spectrogram bands.