
Whisper finetune

Open · Macsim2 opened this issue 1 year ago · 1 comment

Hi, I'm trying to fine-tune Whisper on multiple GPUs and I don't know what to set RANK to. I set WORLD_SIZE to the number of GPUs, MASTER_ADDR to localhost, and MASTER_PORT to an idle port. When WORLD_SIZE is more than 2 and RANK is set to 0, training hangs, probably in the torch.distributed.TCPStore() setup.

Has anyone solved this problem? Please let me know if you have any hints.

Macsim2 avatar Jun 20 '23 09:06 Macsim2
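
For context on the hang: torch.distributed's default env:// rendezvous blocks until all WORLD_SIZE ranks have connected to the TCPStore, so launching a single process with RANK=0 and WORLD_SIZE > 1 waits forever for the missing ranks. A minimal sketch of this (the port number and spawn setup are illustrative, not taken from the Whisper fine-tuning code), where one process per rank is spawned so RANK never has to be set by hand:

```python
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def worker(rank: int, world_size: int):
    # Every process must get a UNIQUE rank in [0, world_size).
    # init_process_group blocks at the TCPStore rendezvous until all
    # world_size ranks have joined; if only rank 0 is ever started,
    # it hangs here forever.
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"  # any idle port
    backend = "nccl" if torch.cuda.is_available() else "gloo"
    dist.init_process_group(backend, rank=rank, world_size=world_size)
    if torch.cuda.is_available():
        torch.cuda.set_device(rank)  # one GPU per process
    print(f"rank {rank}/{world_size} joined the process group")
    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 2  # number of GPUs
    # mp.spawn launches world_size processes and passes each its own rank.
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```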

Hey @Macsim2! You should just be able to launch multi-GPU training using torchrun, as shown here: https://github.com/huggingface/transformers/tree/main/examples/pytorch/speech-recognition#multi-gpu-whisper-training

Let me know if you encounter any difficulties!

sanchit-gandhi avatar Dec 07 '23 13:12 sanchit-gandhi
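
For anyone landing here with the same question: the point of torchrun is that you don't set RANK, WORLD_SIZE, MASTER_ADDR, or MASTER_PORT by hand. The launcher spawns one process per GPU and exports those variables for each process itself. A minimal sketch, assuming a hypothetical train.py (the exact Whisper command and script name are in the README linked above):

```python
# Launched with, e.g. (flags illustrative; see the linked README for the
# exact Whisper fine-tuning command):
#   torchrun --nproc_per_node=2 train.py
import os

import torch
import torch.distributed as dist

# torchrun exports RANK, LOCAL_RANK, WORLD_SIZE, MASTER_ADDR and
# MASTER_PORT for every process it spawns, so the default env://
# initialization reads them without any manual setup.
backend = "nccl" if torch.cuda.is_available() else "gloo"
dist.init_process_group(backend)

local_rank = int(os.environ["LOCAL_RANK"])
if torch.cuda.is_available():
    # Pin each process to its own GPU on this node.
    torch.cuda.set_device(local_rank)

print(f"rank {dist.get_rank()} of {dist.get_world_size()} ready")
dist.destroy_process_group()
```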