
Batch size = 16?

Open chapter544 opened this issue 5 years ago • 8 comments

Hi, thank you for your nice implementation. I have a question about the batch size selection. The network looks small enough for a larger batch size, e.g. 32 or 64, on a GTX 1080Ti. Is the batch size of 16 a kind of regularization? Another question is about the G/D updates: for your generated samples, are you using a 1:1 ratio? Thanks.

chapter544 avatar Oct 26 '19 11:10 chapter544

Hi, @chapter544

  • The original authors noted that selecting the batch size is important, so I didn’t increase it even though I could have used a larger one. I’m currently testing whether a larger batch size is harmful or not.
  • I used 1:1 (a quick sketch of that update schedule is below).
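
For anyone unfamiliar with the terminology, a minimal sketch of what a 1:1 G/D update schedule looks like in PyTorch (toy stand-in modules and a hinge loss; this is illustrative, not the actual code of this repo):

```python
import torch
import torch.nn.functional as F

# Toy stand-ins; the real MelGAN G/D are convolutional audio networks.
G = torch.nn.Linear(16, 32)   # "generator"
D = torch.nn.Linear(32, 1)    # "discriminator"
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)

for step in range(1000):
    z = torch.randn(16, 16)      # noise batch (batch size 16, as discussed)
    real = torch.randn(16, 32)   # placeholder for real data

    # --- one discriminator update ---
    opt_d.zero_grad()
    fake = G(z).detach()
    loss_d = F.relu(1 - D(real)).mean() + F.relu(1 + D(fake)).mean()
    loss_d.backward()
    opt_d.step()

    # --- one generator update per discriminator update (hence 1:1) ---
    opt_g.zero_grad()
    loss_g = -D(G(z)).mean()
    loss_g.backward()
    opt_g.step()
```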

seungwonpark avatar Oct 26 '19 11:10 seungwonpark

@seungwonpark thank you for the information. I'll leave this issue open a little longer so that we can confirm the batch size selection.

chapter544 avatar Oct 27 '19 02:10 chapter544

Training loss curves on an internal multi-speaker dataset, with batch size 16 (orange) and 64 (blue). I can't tell yet whether 64 is okay...

[Image: training loss curves for batch size 16 (orange) vs. 64 (blue)]

seungwonpark avatar Oct 28 '19 05:10 seungwonpark

@seungwonpark Sorry, but I couldn't find a note in the original paper saying that the batch size was carefully chosen. Also, I've been thinking that if we use a multi-speaker training scheme with a larger batch, each batch can cover more modes, which may help training (this is also discussed in https://arxiv.org/abs/1809.11096, though I'm not sure since it may depend on the domain).

wade3han avatar Oct 28 '19 05:10 wade3han

@wade3han Actually it wasn't noted in the paper, but there was a TeX comment like:

> %Batch size was an important hyper-parameter that required tuning to find optimal audio fidelity and faster training time. We used batch size 16 for all experiments.

You can see the LaTeX source of the original paper at: https://arxiv.org/format/1910.06711


seungwonpark avatar Oct 28 '19 06:10 seungwonpark

I tried batch sizes 32, 128, and 256 with a configuration similar to this repo's. At 220k training steps, batch size 32 was better than the others (32 > 128 > 256). I haven't tried batch size 16 yet.
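
For anyone reproducing this sweep, the batch size is just a DataLoader setting; a generic PyTorch sketch (the dataset below is a random placeholder, not real audio data):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset standing in for mel-spectrogram / audio pairs.
dataset = TensorDataset(torch.randn(1024, 80), torch.randn(1024, 1))

for batch_size in (32, 128, 256):
    loader = DataLoader(dataset, batch_size=batch_size,
                        shuffle=True, drop_last=True)
    # ...train each configuration to ~220k steps and compare losses...
```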

imdanboy avatar Oct 28 '19 10:10 imdanboy

Is it obvious that MelGAN works best at batch size 16? I was reminded of the authors' comment, and now it sounds like they recognized a trade-off between audio fidelity and training time; so if we spend more time training the model with a larger batch size, maybe we can get higher audio quality.

wade3han avatar Dec 05 '19 05:12 wade3han

I just experimentally found that batch size 16 was best, with learning rates of 4e-4 for the discriminator and 1e-4 for the generator (the GAN training technique called TTUR was beneficial), when the other hyperparameters are fixed.
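
For the record, TTUR (the "two time-scale update rule", https://arxiv.org/abs/1706.08500) just means giving the discriminator a larger learning rate than the generator. In PyTorch it's nothing more than two optimizers (the modules below are toy placeholders, and the betas are an assumption, not this repo's exact values):

```python
import torch
import torch.nn as nn

# Toy stand-ins for the real generator / discriminator networks.
generator = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 1))
discriminator = nn.Sequential(nn.Linear(1, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

# TTUR: discriminator lr 4e-4 > generator lr 1e-4, matching the values above.
opt_d = torch.optim.Adam(discriminator.parameters(), lr=4e-4, betas=(0.5, 0.9))
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.5, 0.9))
```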

imdanboy avatar Dec 06 '19 09:12 imdanboy