Multi-GPU Training Issue

Open ShenZheng2000 opened this issue 2 years ago • 3 comments

Hi, authors! I ran into an issue when training with 8 GPUs. The dataset is created and the model is initialized, but then a TypeError is raised, as shown below. Training works fine on a single GPU, though.

[image: screenshot of the TypeError]

Could you help me solve this issue? Thanks!

ShenZheng2000 avatar Feb 02 '23 05:02 ShenZheng2000

I also have an issue with multi-GPU training. The error I got was: "runtimeerror: non-empty 3d or 4d (batch mode) tensor expected for input, but got: [ torch.cuda.floattensor{0,256,64,64} ]"

RL-arch avatar Mar 01 '23 17:03 RL-arch

Hi, I have the same problem. Did you solve it?

jpzhai avatar Jul 09 '23 13:07 jpzhai

I found a solution by modifying train.py.

Replace this line:

            model.optimize_parameters()   # calculate loss functions, get gradients, update network weights

with these two lines, which pass the current batch to the model before each optimization step:

            model.set_input(data)
            model.optimize_parameters()   # calculate loss functions, get gradients, update network weights

ShenZheng2000 avatar Aug 04 '23 18:08 ShenZheng2000
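
For readers wondering why the extra `model.set_input(data)` call can matter, here is a minimal sketch of one plausible explanation. It is not the repository's code. It assumes the model follows a CUT-style pattern in which a data-dependent initialization step keeps only a per-GPU slice of the first batch, which becomes empty when the batch size is smaller than the number of GPUs (consistent with the zero-sized batch dimension in the RuntimeError quoted above). `ToyModel` and `train_one_epoch` are hypothetical stand-ins, and plain lists replace tensors.

```python
# Hypothetical stand-in for the repository's model class; only the call order matters.
class ToyModel:
    def __init__(self, num_gpus):
        self.num_gpus = num_gpus
        self.batch = None

    def set_input(self, data):
        # Store the full batch for the next optimization step.
        self.batch = list(data)

    def data_dependent_initialize(self, data):
        # Assumed CUT-style behaviour: initialize on a per-GPU slice of the first batch.
        self.set_input(data)
        per_gpu = len(data) // max(self.num_gpus, 1)
        self.batch = self.batch[:per_gpu]  # empty when len(data) < num_gpus

    def optimize_parameters(self):
        # Stand-in for the forward/backward pass; fails on an empty batch.
        if not self.batch:
            raise RuntimeError("empty batch")
        return len(self.batch)


def train_one_epoch(model, dataset, call_set_input=True):
    seen = []
    for i, data in enumerate(dataset):
        if i == 0:
            model.data_dependent_initialize(data)
        if call_set_input:
            model.set_input(data)  # the fix: reload the full batch every iteration
        seen.append(model.optimize_parameters())
    return seen


dataset = [[0, 1, 2, 3], [4, 5, 6, 7]]  # two batches of four samples
print(train_one_epoch(ToyModel(num_gpus=8), dataset))
```

Calling `train_one_epoch(ToyModel(num_gpus=8), dataset, call_set_input=False)` raises the `RuntimeError` in this toy setup, because the slice left behind by the initialization step is empty. Whether the real model behaves the same way depends on its own `data_dependent_initialize` implementation.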