
How to scale fine-tuning Whisper on English?

Open · jsteinberg-rbi opened this issue 1 year ago • 1 comment

I'm attempting to fine-tune Whisper using the excellent Hugging Face tutorial: https://huggingface.co/blog/fine-tune-whisper. The difference between the tutorial's setup and mine is that I'm using English, which has about 1M more examples, and since I have access to big GPUs I'm using whisper-large-v3.

No matter how much compute I throw at the core data preparation step (note the num_proc argument):

common_voice = common_voice.map(prepare_dataset, remove_columns=common_voice.column_names["train"], num_proc=108)
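For reference, the per-example work this map fans out is roughly the tutorial's prepare_dataset, sketched here for whisper-large-v3; the audio and sentence column names follow the Common Voice setup in the blog post:

```python
from transformers import WhisperFeatureExtractor, WhisperTokenizer

# Adapted from the fine-tune-whisper blog post, swapping in whisper-large-v3 and English.
feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-large-v3")
tokenizer = WhisperTokenizer.from_pretrained(
    "openai/whisper-large-v3", language="English", task="transcribe"
)

def prepare_dataset(batch):
    # decode the audio clip (resampled to 16 kHz by the dataset's Audio feature)
    audio = batch["audio"]
    # compute log-Mel input features from the raw waveform (CPU-heavy)
    batch["input_features"] = feature_extractor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    # encode the transcription to label ids
    batch["labels"] = tokenizer(batch["sentence"]).input_ids
    return batch
```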

I still only prepare the data at about 30 examples/s, which doesn't scale to 1M examples. My last test was on an 8-GPU, 112-vCPU instance and the rate didn't change. htop shows all 112 vCPUs engaged, yet the prep speed stays flat across every compute type I've tried. The only thing left to try is very fast storage like NVMe, which I'm going to do, but I suspect the issue is in the datasets library configuration or something else entirely. I've never had problems with GPUs or Whisper before, so I'm a bit baffled as to what the issue could be. I've followed the tutorial to a 'T' except for changing the language to en, the model to whisper-large-v3, and num_proc to a higher value. Any insight would be greatly appreciated!
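One quick sanity check (a sketch, assuming common_voice and prepare_dataset are defined as in the tutorial) is to time a single-process pass over a few rows and compare it against the observed total rate divided by num_proc:

```python
import time

# Time prepare_dataset on a handful of rows in a single process.
sample = common_voice["train"].select(range(32))

start = time.perf_counter()
for row in sample:
    prepare_dataset(row)
elapsed = time.perf_counter() - start

print(f"{len(sample) / elapsed:.1f} examples/s on a single process")
```

If a single process already runs near 30 / num_proc examples/s, the map is genuinely CPU-bound in audio decoding and log-Mel extraction and more vCPUs should help; if a single process is much faster than that, the parallel map is losing time somewhere else (for example in storage I/O or in writing the processed Arrow files), and faster disks or tuning map parameters such as writer_batch_size would be the place to look.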

jsteinberg-rbi avatar Nov 22 '23 22:11 jsteinberg-rbi

Did you figure this out?

bfortuner avatar Mar 10 '24 06:03 bfortuner