
Information On Batch Size And Learning Rate

Open mallorbc opened this issue 1 year ago • 6 comments

The discord link in the README does not work for me.

Do you have any information on what batch size or learning rate to use? I could only find the max learning rate that was used in the paper. Experimentally, I found that too small a batch size seems to cause issues.

What batch size and learning rate do you recommend and why?

mallorbc avatar May 24 '23 02:05 mallorbc

The discord link is active. You likely need to join the HuggingFace Discord server and then the ML-4-AUDIO channel within it to gain access. The discussion about learning rates can be found in the same thread. The max learning rates used in the paper are listed on its very last page.

Regarding batch size, it depends heavily on the GPUs being used and their memory capacity. While larger batch sizes are recommended, you need to experiment and settle on the batch size that maximizes GPU memory utilization without running out of memory. Note that the batch size in this repository's scripts refers to the per-GPU batch size.
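For anyone wiring this up themselves, here is a minimal sketch of how the per-GPU batch size relates to the effective (global) batch size, assuming a Hugging Face `Seq2SeqTrainingArguments`-based setup; the flag names and values are illustrative and may differ from this repository's own scripts:

```python
# Minimal sketch (assumes Hugging Face transformers' Seq2SeqTrainingArguments;
# flag names in this repository's scripts may differ).
from transformers import Seq2SeqTrainingArguments

per_device_batch = 16   # per-GPU batch size; lower it if you hit out-of-memory errors
grad_accum_steps = 2    # raise this to keep the effective batch large on fewer/smaller GPUs
num_gpus = 8            # GPUs visible to the launcher

# Effective (global) batch size seen by the optimizer per update step:
effective_batch = per_device_batch * grad_accum_steps * num_gpus
print(f"effective batch size: {effective_batch}")

training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-finetune-out",
    per_device_train_batch_size=per_device_batch,
    gradient_accumulation_steps=grad_accum_steps,
    learning_rate=1e-5,   # illustrative placeholder, not a recommendation
    warmup_steps=500,
    fp16=True,
)
```

The general idea is to push `per_device_train_batch_size` as high as memory allows, then use `gradient_accumulation_steps` to reach the effective batch size you want.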

vasistalodagala avatar May 24 '23 05:05 vasistalodagala

Thanks for the great work! Is the batch size mentioned here (https://huggingface.co/vasista22/whisper-kannada-medium) total or per GPU?

Theodotus1243 avatar Jun 27 '23 15:06 Theodotus1243

@Theodotus1243 , the batch size for the above-mentioned model is the per-GPU batch size.

vasistalodagala avatar Jun 27 '23 19:06 vasistalodagala

Thanks! How many GPUs did you use? I'd like to reproduce your setup properly.

Theodotus1243 avatar Jun 28 '23 00:06 Theodotus1243

@Theodotus1243 , 8 A100 GPUs were used for about a week to train the mentioned model.
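If you have fewer than 8 GPUs, a common way to approximate the original training recipe is to keep the effective batch size constant by raising gradient accumulation. A back-of-the-envelope sketch (the batch and accumulation numbers below are placeholders, not the values actually used for the Kannada model):

```python
# Keep the effective batch size constant when reproducing on fewer GPUs
# (all numbers are illustrative placeholders).
per_gpu_batch = 16

reference_gpus = 8
reference_accum = 2
target_effective = per_gpu_batch * reference_gpus * reference_accum  # 256

my_gpus = 2
needed_accum = target_effective // (per_gpu_batch * my_gpus)  # 8
print(f"use gradient_accumulation_steps={needed_accum} on {my_gpus} GPUs")
```

Expect proportionally longer wall-clock time, since the same number of examples is processed with less parallelism.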

vasistalodagala avatar Jun 28 '23 00:06 vasistalodagala

What would be the best parameters for fine-tuning large-v2 on an A100 machine?

vpssa avatar Mar 17 '24 10:03 vpssa