hifi-gan

[Question] Dataset preprocessing

Open Kreevoz opened this issue 4 years ago • 5 comments

I've attempted to preprocess my dataset to meet the mel-spectrogram requirements but I either wind up with incorrectly packed spectrogram files, a wrong header, or wrong data. Don't think any of the tacotron2 implementations I can get my hands on will output the data in the required format, or I'm overlooking something obvious (which is equally likely 😌).

Could one of you helpful people provide a link to a working piece of code that takes care of this properly or could this repository be fleshed out more so that there is a working preprocessor for training datasets? 🤔

Kreevoz avatar Dec 03 '20 11:12 Kreevoz

Preprocessing for generating spectrograms from audio is implemented in meldataset.py. Posting details with the error log will be helpful to find a solution.

jik876 avatar Dec 04 '20 01:12 jik876

Ah, I should probably have been more specific. Also, thank you for taking the time to respond, jik!

I was specifically curious about fine-tuning. The readme mentions that mel-spectrograms need to be generated with tacotron-2 with teacher-forcing - in the example provided there, they'd have been then placed in the ft_dataset folder.

That's where I'm not making progress. I took a look through the meldataset.py and it doesn't include any functions to interface with tacotron2 to facilitate the generation of the required mels. So how could that be done?

Kreevoz avatar Dec 04 '20 10:12 Kreevoz

Just arrived at the same question: the preprocessing in the nvidia taco2 repo is a bit different. Did you finetune on their pretrained LJ model or train a new one with your preprocessing?

m-toman avatar Dec 04 '20 10:12 m-toman

@Vozeek

You can generate the spectrograms for fine tuning using the forward operation of Tacotron2. After saving spectrograms generated by Tacotron2 using numpy.save(), set the fine_tuning command line option and start training.
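For anyone unsure what the saving step might look like, here is a minimal sketch, not the repository's own code. It assumes you already have some way to run Tacotron2 in teacher-forcing mode; that part is represented by `predict_mel`, a hypothetical callable you would write yourself around your Tacotron2 checkpoint. The function name `save_mels_for_finetuning` is also made up for illustration. The only parts taken from this thread and the readme are `numpy.save()`, the `ft_dataset` folder, and the rule that each mel file is named after its audio file with a `.npy` extension.

```python
import os

import numpy as np


def save_mels_for_finetuning(wav_paths, predict_mel, out_dir="ft_dataset"):
    """Save one teacher-forced mel per audio file as <basename>.npy in out_dir.

    predict_mel is a hypothetical callable (not part of hifi-gan or Tacotron2).
    It should run your Tacotron2 forward pass with the ground-truth mel as the
    decoder input and return an array of shape (n_mels, n_frames), already
    trimmed of any batch padding.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for wav_path in wav_paths:
        mel = np.asarray(predict_mel(wav_path), dtype=np.float32)
        if mel.ndim != 2:
            raise ValueError(f"expected (n_mels, n_frames), got {mel.shape}")
        # Name the mel after the audio file, e.g. LJ001-0001.wav -> LJ001-0001.npy
        stem = os.path.splitext(os.path.basename(wav_path))[0]
        out_path = os.path.join(out_dir, stem + ".npy")
        np.save(out_path, mel)
        saved.append(out_path)
    return saved
```

If you batch several utterances through Tacotron2 at once, the outputs are usually padded to the longest item, so each mel should be cut back to its true frame count before it is passed on; otherwise the saved spectrogram will not line up with its audio.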

jik876 avatar Dec 10 '20 02:12 jik876

Yeah, I get the rough idea of what you're saying. I don't understand Tacotron2 well enough to do that. 😌 I'm a novice looking into this stuff as a hobby, didn't study any of it professionally.

If anyone else reading this knows of a fork that has the necessary preprocessing integrated, let me know. That would be of tremendous help. In the meantime I'll stick to using waveglow, even though that doesn't sound nearly as nice and smooth as hifi-gan.

Kreevoz avatar Dec 15 '20 19:12 Kreevoz