
How to start multi-gpus training in a single machine

Open kevinhuangxf opened this issue 11 months ago • 1 comment

Thanks for the excellent work!

I ran into a problem starting multi-GPU training. I have 8 GPUs, but each time I run the training command, only one GPU is used:


I am using this command:

python -m src.main +experiment=re10k data_loader.train.batch_size=14

Does this mean that even when training on a single node with multiple GPUs, I still need to use SLURM to launch multi-GPU training?

kevinhuangxf avatar Jan 26 '25 08:01 kevinhuangxf

Hi @kevinhuangxf, thanks for the kind words. Normally, the current setup should automatically use all available GPUs for training, so I'm not sure what is causing this issue. You could try explicitly specifying the training devices to use all GPUs by following the instructions here.
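As a sketch of what "explicitly specifying the training devices" could look like: assuming the project's Hydra config exposes a PyTorch Lightning-style `trainer` group (the override names below are assumptions; check the actual config files in the repo), you might try something like:

```shell
# NOTE: trainer.accelerator / trainer.devices / trainer.strategy are
# hypothetical Hydra override names -- verify them against the repo's configs.

# Ask Lightning to use every visible GPU with DDP:
python -m src.main +experiment=re10k \
    data_loader.train.batch_size=14 \
    trainer.accelerator=gpu \
    trainer.devices=-1 \
    trainer.strategy=ddp

# Alternatively, restrict which cards are visible at the environment level;
# this works regardless of the config layout:
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m src.main +experiment=re10k \
    data_loader.train.batch_size=14
```

If only one GPU is still used, it is also worth checking what `torch.cuda.device_count()` reports inside the training environment, since a stale `CUDA_VISIBLE_DEVICES` or a CPU-only PyTorch build would both silently reduce the visible device count to one or zero.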

donydchen avatar Feb 18 '25 02:02 donydchen