
Information On Batch Size And Learning Rate

Open mallorbc opened this issue 1 year ago • 6 comments

The discord link in the README does not work for me.

Do you have any information on what batch size or learning rate to use? I could only find the max learning rate that was used in the paper. Experimentally, I found that too small a batch size seems to cause issues.

What batch size and learning rate do you recommend and why?

mallorbc avatar May 24 '23 02:05 mallorbc

The discord link is active. You likely need to join the HuggingFace Discord server and then the ML-4-AUDIO channel within it to gain access. The discussion about learning rates can be found in the same thread. The max learning rates used in the paper are listed on its very last page.

Regarding batch size, it depends heavily on the GPUs being used and their memory capacity. While larger batch sizes are recommended, you need to experiment and settle on the batch size that maximizes GPU memory utilization without running out of memory. Note that the batch size in this repository's scripts refers to the per-GPU batch size.
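For anyone wiring this up themselves, here is a minimal sketch of how the per-GPU batch size relates to the effective (global) batch size, assuming a Hugging Face `Seq2SeqTrainingArguments`-based setup; the flag names and values are illustrative and may differ from this repository's own scripts:

```python
# Minimal sketch (assumes Hugging Face transformers' Seq2SeqTrainingArguments;
# flag names in this repository's scripts may differ).
from transformers import Seq2SeqTrainingArguments

per_device_batch = 16   # per-GPU batch size; lower it if you hit out-of-memory errors
grad_accum_steps = 2    # raise this to keep the effective batch large on fewer/smaller GPUs
num_gpus = 8            # GPUs visible to the launcher

# Effective (global) batch size seen by the optimizer per update step:
effective_batch = per_device_batch * grad_accum_steps * num_gpus
print(f"effective batch size: {effective_batch}")

training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-finetune-out",
    per_device_train_batch_size=per_device_batch,
    gradient_accumulation_steps=grad_accum_steps,
    learning_rate=1e-5,   # illustrative placeholder, not a recommendation
    warmup_steps=500,
    fp16=True,
)
```

The general idea is to push `per_device_train_batch_size` as high as memory allows, then use `gradient_accumulation_steps` to reach the effective batch size you want.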

vasistalodagala avatar May 24 '23 05:05 vasistalodagala

Thanks for the great work! Is the batch size mentioned here (https://huggingface.co/vasista22/whisper-kannada-medium) total or per GPU?

Theodotus1243 avatar Jun 27 '23 15:06 Theodotus1243

@Theodotus1243 , the batch size for the above-mentioned model is the per-GPU batch size.

vasistalodagala avatar Jun 27 '23 19:06 vasistalodagala

Thanks! How many GPUs did you use? I'd like to reproduce your setup properly.

Theodotus1243 avatar Jun 28 '23 00:06 Theodotus1243

@Theodotus1243 , 8 A100 GPUs were used for about a week to train the mentioned model.
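If you have fewer than 8 GPUs, a common way to approximate the original training recipe is to keep the effective batch size constant by raising gradient accumulation. A back-of-the-envelope sketch (the batch and accumulation numbers below are placeholders, not the values actually used for the Kannada model):

```python
# Keep the effective batch size constant when reproducing on fewer GPUs
# (all numbers are illustrative placeholders).
per_gpu_batch = 16

reference_gpus = 8
reference_accum = 2
target_effective = per_gpu_batch * reference_gpus * reference_accum  # 256

my_gpus = 2
needed_accum = target_effective // (per_gpu_batch * my_gpus)  # 8
print(f"use gradient_accumulation_steps={needed_accum} on {my_gpus} GPUs")
```

Expect proportionally longer wall-clock time, since the same number of examples is processed with less parallelism.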

vasistalodagala avatar Jun 28 '23 00:06 vasistalodagala

What would be the best parameters for fine-tuning large-v2 on an A100 machine?

vpssa avatar Mar 17 '24 10:03 vpssa