
Inconsistency between the model parameters in the paper and the implementation on GitHub

Open Alexey322 opened this issue 3 years ago • 2 comments

@jik876 Hi.

I would like to know why you are not using the same parameters (for the V1 configuration) as given in the paper. Your code sets the following parameters:
"resblock_kernel_sizes": [3,7,11],
"resblock_dilation_sizes": [[1,3,5], [1,3,5], [1,3,5]]

But the paper says that you used:
"resblock_kernel_sizes": [3,7,11],
"resblock_dilation_sizes": [[1, 1], [3, 1], [5, 1]]

Also, it's not entirely clear to me why the MSD uses three discriminators with 1x, 4x, 4x AvgPool1d (in the code, two discriminators are the same, which completely confused me) instead of 1x, 2x, 4x as the paper says.

I would be very glad to hear your answer.

Alexey322 avatar Jun 06 '21 22:06 Alexey322

Thanks for your interest. We excluded the fixed part (dilation=1) from the hyperparameter settings. Referring to the figures in our paper, you can see that the convolution blocks are repeated. Please check our code again for the avgpool part.
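
To make the point about the excluded fixed part concrete, here is a minimal sketch of how the configured dilations map onto the pairs listed in the paper. It assumes each configured dilation is followed by one convolution with dilation=1 inside the residual block. The names `expand_dilations` and `FIXED_DILATION` are made up for this illustration and do not come from the repository.

```python
# Values from the V1 configuration quoted earlier in this thread.
resblock_kernel_sizes = [3, 7, 11]
resblock_dilation_sizes = [[1, 3, 5], [1, 3, 5], [1, 3, 5]]

# Assumption: the "fixed part" left out of the config is a second
# convolution with dilation=1 that follows every configured dilation.
FIXED_DILATION = 1


def expand_dilations(configured):
    """Pair each configured dilation with the fixed dilation=1 convolution."""
    return [[d, FIXED_DILATION] for d in configured]


# One entry per residual block, keyed by its kernel size.
paper_view = {
    kernel_size: expand_dilations(dilations)
    for kernel_size, dilations in zip(resblock_kernel_sizes, resblock_dilation_sizes)
}

for kernel_size, pairs in paper_view.items():
    print(kernel_size, pairs)
```

Under that assumption, every kernel size ends up with the pairs [[1, 1], [3, 1], [5, 1]], which matches the notation used in the paper.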

jik876 avatar Jun 08 '21 06:06 jik876

@jik876 Indeed, I looked at the code inattentively, thanks for your answer!
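
For anyone else who trips over the avgpool part: as far as I can tell, the two pooling layers are identical (kernel 4, stride 2, padding 2) but are applied one after another, so the second discriminator sees the input pooled once and the third sees it pooled twice. The sketch below only computes the resulting sequence lengths with the standard AvgPool1d output-length formula. The helper names and the input length of 8192 are my own choices for illustration, and the pooling parameters are my reading of the code rather than something stated in this thread.

```python
def avg_pool1d_out_len(length, kernel_size=4, stride=2, padding=2):
    """Output length of a 1D average pool (floor mode, no dilation)."""
    return (length + 2 * padding - kernel_size) // stride + 1


def msd_input_lengths(length, num_discriminators=3):
    """Input length seen by each sub-discriminator.

    Assumption: the first discriminator sees the raw input, and each
    later one sees the previous input pooled once more (cumulative).
    """
    lengths = [length]
    for _ in range(num_discriminators - 1):
        length = avg_pool1d_out_len(length)
        lengths.append(length)
    return lengths


lengths = msd_input_lengths(8192)
print(lengths)  # [8192, 4097, 2049]
```

The lengths shrink by roughly 1x, 2x, and 4x, which is consistent with the paper even though both pooling layers have the same parameters.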

I have one more question. I am experimenting with small datasets (~1 hour of audio). When training HiFi-GAN with a fixed learning rate of 1e-4, I reach a loss of ~0.2 on the validation set, after which the model starts overfitting.

By contrast, with a learning rate of 5e-5, the model begins to overfit at a loss of 0.17-0.18. Both experiments were carried out on the same dataset.

Do you have any idea how this is possible?

Alexey322 avatar Jun 09 '21 12:06 Alexey322