multi-GPU vs. single GPU - scvi 1.1.x branch
I've successfully installed scvi==1.1.x (main branch) and verified that I can train the model on a single GPU. However, when using multiple GPUs, here's the error I'm running into.
With batch_size = 512:
- 1 GPU: x.shape = (512, 1178)
- 2 GPUs: x.shape = (1, 512, 1178)
The extra leading dimension causes a shape mismatch that breaks almost everything downstream in the code, for example the one_hot function and FCLayers.
Is there a quick fix for this, or should I manually adjust the dimensions throughout the code (maybe an x.squeeze(0) in the outermost nn.Module) to make them match?
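For reference, here is a minimal sketch of the workaround I had in mind. `drop_leading_dim` is a hypothetical helper, not part of scvi-tools; it just strips the spurious leading singleton dimension that appears under 2 GPUs (numpy is used here as a stand-in for torch tensors, since the shape logic is identical):

```python
import numpy as np

def drop_leading_dim(x):
    """Drop a spurious leading singleton dimension, e.g. (1, 512, 1178) -> (512, 1178).

    Hypothetical helper (not in scvi-tools); leaves 2-D inputs untouched,
    mirroring what x.squeeze(0) would do in the outermost nn.Module.
    """
    if x.ndim == 3 and x.shape[0] == 1:
        return x.reshape(x.shape[1], x.shape[2])
    return x

# Shape seen with 2 GPUs: the batch comes in with an extra leading dim.
x_multi = np.zeros((1, 512, 1178))
print(drop_leading_dim(x_multi).shape)   # (512, 1178)

# Shape seen with 1 GPU: passes through unchanged.
x_single = np.zeros((512, 1178))
print(drop_leading_dim(x_single).shape)  # (512, 1178)
```

The downside is that this has to be applied before every module that assumes 2-D input, which is why I'm asking whether a cleaner fix exists upstream.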
Hi @zhenxingjian, what model are you using for this? I'll note that we have only tested multi-GPU training on scVI.
Hi @martinkim0, scVI works with multiple samples (e.g. n_samples_per_mc_run), which produce a structure similar to the multi-GPU one: (n_samples, n_batch, n_genes). Quite a few other models, such as scANVI, don't handle n_samples correctly (dimension errors), and adapting them would be major work. It confused me today as well, so I made scANVI work with n_samples=1 instead.
I'm following the setup of MultiVI. Since you've tested that scVI works with multi-GPU training, I can try modifying the code on my end to follow scVI's setup and see whether it can then support multi-GPU training.