hifi-gan
Inconsistency between the model parameters in the paper and the implementation on GitHub
@jik876 Hi.
I would like to know why you are not using the same parameters (for the V1 configuration) as indicated in the paper.
Your code has set the following parameters:
"resblock_kernel_sizes": [3,7,11],
"resblock_dilation_sizes": [[1,3,5], [1,3,5], [1,3,5]]
But the paper says you used:
"resblock_kernel_sizes": [3,7,11],
"resblock_dilation_sizes": [[1, 1], [3, 1], [5, 1]]
Also, it's not entirely clear to me why the MSD uses three discriminators with 1x, 4x, 4x AvgPool1d (in the code, two discriminators are the same, which completely confused me) instead of 1x, 2x, 4x as the paper says.
I would be very glad to hear your answer.
Thanks for your interest. We excluded the fixed part (dilation=1) from the hyperparameter settings. Referring to the figures in our paper, you can see that the convolution blocks are repeated. Please check our code again for the avgpool part.
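For readers following along, here is a minimal sketch of both points in the reply. It is not the repository's code. The first part pairs each configured dilation with the fixed dilation=1 convolution mentioned above, which gives the paper's notation. The second part assumes the MSD pooling layers are AvgPool1d with kernel_size=4, stride=2, padding=2, applied one after another (please check models.py to confirm), and computes the resulting input lengths with the standard pooling output-length formula. The helper names and the 8192-sample segment length are made up for illustration.

```python
# Repo-style V1 setting: only the varying dilations are listed.
resblock_dilation_sizes = [[1, 3, 5], [1, 3, 5], [1, 3, 5]]


def paper_style_dilations(dilations, fixed=1):
    """Pair each configured dilation with the fixed dilation=1 convolution."""
    return [[d, fixed] for d in dilations]


# One entry per resblock kernel size (3, 7, 11).
paper_view = [paper_style_dilations(ds) for ds in resblock_dilation_sizes]


def avgpool1d_out_len(length, kernel_size=4, stride=2, padding=2):
    """Output length of a 1D average pool with ceil_mode=False.

    The default arguments are an assumption about the repo's MSD pooling.
    """
    return (length + 2 * padding - kernel_size) // stride + 1


def msd_input_lengths(length, num_discriminators=3):
    """Input length seen by each sub-discriminator when pooling is cumulative."""
    lengths = [length]
    for _ in range(num_discriminators - 1):
        lengths.append(avgpool1d_out_len(lengths[-1]))
    return lengths


lengths = msd_input_lengths(8192)  # hypothetical segment length

print(paper_view[0])
print(lengths)
```

Under these assumptions the 4 is the pooling kernel size, while the stride of 2 sets the downsampling factor. Two identical pooling layers applied in sequence therefore give roughly 1x, 2x, 4x scales, as in the paper.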
@jik876 Indeed, I looked at the code inattentively, thanks for your answer!
I'll ask one more question right away. I am experimenting with small datasets (~1 hour of audio). When training HiFi-GAN with a fixed learning rate of 1e-4, I reach a loss of ~0.2 on the validation set, after which the model starts overfitting.
At the same time, with a learning rate of 5e-5, the model begins to overfit at a loss of 0.17-0.18. The experiments were carried out on the same dataset.
Do you have any idea how this is possible?