Whisper finetune
Hi, I'm trying to fine-tune Whisper with multiple GPUs, and I don't know what to set RANK to. I just set WORLD_SIZE to the number of GPUs, MASTER_ADDR to localhost, and MASTER_PORT to an idle port.
When WORLD_SIZE is 2 or more and RANK is set to 0, training hangs. It probably hangs in the torch.distributed.TCPStore() setup.
Has anyone solved this problem? Please let me know if you have a hint.
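For reference, this is roughly what the manual env:// initialization described above looks like (a minimal sketch, assuming torch.distributed with the NCCL backend). Each process must be given its own RANK from 0 to WORLD_SIZE-1; the rendezvous blocks until all WORLD_SIZE ranks have joined, so launching more than one process with RANK=0 hangs in exactly this way.

```python
import os
import torch.distributed as dist

def init_distributed():
    # Each of these must be set per process before this runs:
    #   RANK        - unique id of this process, 0 .. WORLD_SIZE-1
    #   WORLD_SIZE  - total number of processes (typically the number of GPUs)
    #   MASTER_ADDR - address of the rank-0 host ("localhost" on a single node)
    #   MASTER_PORT - a free port on the rank-0 host
    rank = int(os.environ["RANK"])
    world_size = int(os.environ["WORLD_SIZE"])

    # env:// rendezvous: this call blocks until all WORLD_SIZE ranks have
    # connected to the store, so duplicate RANK values cause a hang.
    dist.init_process_group(backend="nccl", init_method="env://",
                            rank=rank, world_size=world_size)
    return rank, world_size
```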
Hey @Macsim2! You should just be able to launch multi-GPU training using torchrun, as shown here: https://github.com/huggingface/transformers/tree/main/examples/pytorch/speech-recognition#multi-gpu-whisper-training
Let me know if you encounter any difficulties!
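For example (the script name comes from the linked examples directory, but the arguments below are illustrative rather than copied from the README), a single-node run on two GPUs could be launched like this; torchrun sets RANK, LOCAL_RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT for each worker automatically, so none of them need to be exported by hand:

```bash
# torchrun exports RANK / LOCAL_RANK / WORLD_SIZE / MASTER_ADDR / MASTER_PORT
# for every worker process it spawns.
torchrun --nproc_per_node=2 run_speech_recognition_seq2seq.py \
    --model_name_or_path openai/whisper-small \
    --output_dir ./whisper-finetuned \
    --do_train
```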