multi-GPU vs. single GPU - scvi 1.1.x branch

Open zhenxingjian opened this issue 2 years ago • 3 comments

I've successfully installed scvi==1.1.x (main branch) and confirmed that I can train the model on 1 GPU. However, with multiple GPUs I run into the following error.

Batch size = 512. With 1 GPU: x.shape = (512, 1178). With 2 GPUs: x.shape = (1, 512, 1178).

Because of this dimension mismatch, almost nothing in the code can run, for example the one_hot function or FCLayers.

Do you have a quick fix for this, or should I manually change the dimension handling throughout the code (maybe x.squeeze(0) in the outermost nn.Module) to match it?
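To make the x.squeeze(0) idea concrete, here is a minimal sketch in plain PyTorch. It assumes the extra dimension is always a leading singleton, as in the shapes above. `squeeze_leading_dim` and `SqueezeInputs` are hypothetical names for illustration, not part of scvi-tools, and `nn.Linear` only stands in for the real model.

```python
import torch
from torch import nn


def squeeze_leading_dim(x: torch.Tensor) -> torch.Tensor:
    """Drop a leading singleton dimension, e.g. (1, 512, 1178) -> (512, 1178)."""
    if x.dim() == 3 and x.shape[0] == 1:
        return x.squeeze(0)
    return x


class SqueezeInputs(nn.Module):
    """Hypothetical wrapper that normalizes the input shape before the wrapped module."""

    def __init__(self, inner: nn.Module):
        super().__init__()
        self.inner = inner

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.inner(squeeze_leading_dim(x))


# nn.Linear is a stand-in for the outermost module of the real model.
layer = SqueezeInputs(nn.Linear(1178, 10))
single = layer(torch.zeros(512, 1178))     # single-GPU style input
multi = layer(torch.zeros(1, 512, 1178))   # multi-GPU style input from above
```

With this wrapper both inputs produce an output of the same 2-D shape. It only hides the symptom at the outermost module; it does not explain why the extra dimension appears in the multi-GPU setup.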

zhenxingjian avatar Dec 08 '23 21:12 zhenxingjian

Hi @zhenxingjian, what model are you using for this? I'll note that we have only tested multi-GPU training on scVI.

martinkim0 avatar Dec 08 '23 23:12 martinkim0

Hi @martinkim0, scVI works with multiple samples (like n_samples_per_mc_run). That layout looks similar to the multi-GPU structure (n_samples, n_batch, n_genes). Quite a few other components, like scANVI, do not handle n_samples correctly (dimension errors), and adapting them is major work. I was confused by this today and made scANVI work with n_samples=1 instead.

canergen avatar Dec 09 '23 01:12 canergen

> Hi @zhenxingjian, what model are you using for this? I'll note that we have only tested multi-GPU training on scVI.

I'm following the setup of multiVI. If you've confirmed that scVI works with multi-GPU training, I can try modifying the code on my end to follow the scVI setup and see whether that supports multi-GPU training.

zhenxingjian avatar Dec 11 '23 04:12 zhenxingjian