Multi-GPU training fails for unconditional generation on QM9

Open · Littletoyone opened this issue · 1 comment

Hello GRAPH-0, I'm glad to see more of your outstanding work after CDGS. I was quite lost when I realized that CDGS can't run on multiple GPUs, so I was genuinely happy to see that JODO can run on multiple devices: the README says "Set GPU_id with CUDA_VISIBLE_DEVICES, support multi GPUs." Following the README, I set CUDA_VISIBLE_DEVICES=0,1 to train the unconditional generation model on QM9:

CUDA_VISIBLE_DEVICES=0,1 python main.py --config configs/vpsde_qm9_uncond_jodo.py --mode train --workdir exp_uncond/vpsde_qm9_jodo

The run did not work. Instead, it failed with:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_addmm)

I traced the error back to the EquivariantMixBlock class in mol_gnn. I tried modifying it to move all of its networks to the device with .to(device), but none of my attempts worked. What should I do?

Apr 25 '24 13:04 Littletoyone
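When `.to(device)` seems to have no effect, it can help to list exactly which registered tensors are still on the wrong device. The helper below is a hypothetical sketch (`find_off_device_tensors` is not part of JODO or PyTorch) that only inspects registered parameters and buffers. Tensors created inside `forward` without a `device=` argument, plain tensor attributes that are not registered buffers, and submodules kept in a plain Python list instead of `nn.ModuleList` are not covered, and any of these can also produce a "cpu and cuda:0" mismatch.

```python
from typing import List

import torch
import torch.nn as nn


def find_off_device_tensors(module: nn.Module, device) -> List[str]:
    """List registered parameters and buffers that are not on `device`.

    Hypothetical debugging helper (not part of JODO). Tensors stored as plain
    attributes, or submodules held in a plain Python list instead of
    nn.ModuleList, are not registered: .to(device) does not move them, and
    they do not show up in this listing either.
    """
    target = torch.device(device)
    off_device = []
    named = list(module.named_parameters()) + list(module.named_buffers())
    for name, tensor in named:
        same_type = tensor.device.type == target.type
        # An index-less target such as "cuda" matches any index of that type.
        same_index = target.index is None or tensor.device.index == target.index
        if not (same_type and same_index):
            off_device.append(name)
    return off_device


# Usage, after calling model.to(device):
# print(find_off_device_tensors(model, "cuda:0"))
```

An empty list means every registered tensor is where you expect, which points the search toward tensors created at runtime or toward how the inputs are split across devices.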

The multi-GPU error is likely caused by the current code not reshaping edge_mask; the batch dimension should be its first dimension.

Nov 11 '24 15:11 GRAPH-0
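Why the batch dimension matters, assuming the training script wraps the model in `torch.nn.DataParallel` (the thread does not say which wrapper JODO uses): `DataParallel` splits every input tensor along dimension 0, one chunk per GPU. If a mask's first dimension is not the batch, its chunks no longer line up with the molecules each GPU receives. The sketch below uses NumPy's `array_split` as a rough stand-in for that dim-0 split; the sizes, the `(B, N, N, 1)` layout, and the helper `split_dim0` are illustrative assumptions, not JODO's actual code.

```python
import numpy as np


def split_dim0(x: np.ndarray, num_devices: int) -> list:
    # Hypothetical stand-in for nn.DataParallel's scatter: each input is
    # chunked along axis 0 only, one chunk per device.
    return np.array_split(x, num_devices, axis=0)


B, N, NUM_GPUS = 3, 4, 2                 # illustrative: 3 molecules of 4 atoms, 2 GPUs
node_feats = np.zeros((B, N, 8))         # batch-first node features
edge_mask = np.ones((B, N, N, 1))        # batch-first edge mask: one N x N block per molecule
flat_edge_mask = edge_mask.reshape(B * N * N, 1)  # flattened: axis 0 indexes edges, not molecules

node_chunks = split_dim0(node_feats, NUM_GPUS)
flat_chunks = split_dim0(flat_edge_mask, NUM_GPUS)
batch_first_chunks = split_dim0(edge_mask, NUM_GPUS)

for gpu, (nodes, flat, batch_first) in enumerate(
        zip(node_chunks, flat_chunks, batch_first_chunks)):
    mols = nodes.shape[0]
    # Compare the mask rows each device receives with what its molecules need.
    print(f"GPU {gpu}: {mols} molecule(s); flattened mask has {flat.shape[0]} rows "
          f"(needs {mols * N * N}); batch-first mask has shape {batch_first.shape}")
```

Here GPU 0 receives 2 molecules but only 24 of the 32 flattened mask rows they need, while the batch-first mask splits into `(2, 4, 4, 1)` and `(1, 4, 4, 1)`, exactly matching each device's molecules. A flattened mask only lines up when the split happens to fall on molecule boundaries.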