How to scale fine-tuning Whisper in English?
I'm attempting to fine-tune Whisper using the excellent Hugging Face tutorial: https://huggingface.co/blog/fine-tune-whisper. The delta between the tutorial's case and mine is that I'm using English, which has roughly 1M more examples (and since I have access to big GPUs, I'm using `whisper-large-v3`).
No matter how much compute I throw at the core data preparation step (specifically, the `num_proc` argument here):

```python
common_voice = common_voice.map(
    prepare_dataset,
    remove_columns=common_voice.column_names["train"],
    num_proc=108,
)
```
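For reference, my `prepare_dataset` is essentially the tutorial's, just pointed at `openai/whisper-large-v3` and English. Roughly this (a sketch, not my exact script):

```python
from transformers import WhisperFeatureExtractor, WhisperTokenizer

# Same setup as the tutorial, swapped to large-v3 / English
feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-large-v3")
tokenizer = WhisperTokenizer.from_pretrained(
    "openai/whisper-large-v3", language="English", task="transcribe"
)

def prepare_dataset(batch):
    # decode + resample the audio clip (Common Voice is 48 kHz, Whisper expects 16 kHz)
    audio = batch["audio"]

    # compute log-Mel input features from the raw waveform
    batch["input_features"] = feature_extractor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]

    # encode the transcript to label ids
    batch["labels"] = tokenizer(batch["sentence"]).input_ids
    return batch
```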
I still only prepare the data at about 30 examples/s. For ~1M examples that doesn't scale. My last test was on an 8-GPU, 112-vCPU instance and there was still no change: `htop` shows all 112 vCPUs engaged, yet the prep throughput stays flat across every compute type I've tried. The only thing I haven't tried is very fast storage like NVMe, which I'm going to do next, but I have a feeling the issue is either the `datasets` library configuration or something else entirely. I've never had problems with GPUs or Whisper before, so I'm a bit baffled as to what the issue could be. I've followed the tutorial to a 't' except for changing the language to `en`, the model to `whisper-large-v3`, and `num_proc` to higher parallelism. Any insight would be greatly appreciated!
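For what it's worth, this is roughly how I'm sanity-checking the per-example cost in a single process, to see whether audio decoding or feature extraction dominates (my own quick check, not from the tutorial; `n` is arbitrary):

```python
import time

n = 200  # small sample; purely illustrative
train = common_voice["train"]

# Time audio decoding alone (accessing the Audio column decodes + resamples)
t0 = time.perf_counter()
for i in range(n):
    _ = train[i]["audio"]["array"]
decode_s = time.perf_counter() - t0

# Time the full prepare_dataset step (decode + log-Mel features + tokenization)
t0 = time.perf_counter()
for i in range(n):
    _ = prepare_dataset(train[i])
prep_s = time.perf_counter() - t0

print(f"decode only: {n / decode_s:.1f} ex/s, full prepare: {n / prep_s:.1f} ex/s")
```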
Did you figure this out?