motion-cosegmentation

Problem with loss function when training with 512x512 data

Open playmakerbugger opened this issue 3 years ago • 2 comments

I am training on a set of 512 x 512 data, using 4 GPUs. Training stalls and locks up while the loss is being calculated. It gives a UserWarning: "was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector."

Is there a way to split the loss function so that it runs on multiple GPUs with DataParallelWithCallback? I noticed that the loss is only being calculated on GPU 0.

playmakerbugger, Jan 10 '22 13:01

I guess the batch size is too large.

AliaksandrSiarohin, Jan 10 '22 14:01

I was using a batch size of 6. Should I lower it? The warning appears even with a batch size as low as 2.

playmakerbugger, Jan 10 '22 17:01
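
For readers with the same question about splitting the loss across GPUs, one common pattern is to compute the loss inside the module that gets wrapped for data parallelism, so each replica evaluates the loss on its own GPU. The sketch below is not the repository's code: ModelWithLoss, training_step, the toy Conv2d network and the L1 loss are hypothetical stand-ins, and it uses the standard torch.nn.DataParallel on the assumption that DataParallelWithCallback gathers outputs the same way. With more than one GPU, each replica returns a scalar loss, and gathering those scalars is what triggers the quoted warning. The warning itself is only informational, so it may not be the cause of the stall.

```python
import torch
import torch.nn as nn


class ModelWithLoss(nn.Module):
    """Hypothetical wrapper: runs the model and computes the loss in forward()."""

    def __init__(self, model: nn.Module):
        super().__init__()
        self.model = model

    def forward(self, x: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        prediction = self.model(x)
        # Each replica only sees its slice of the batch, so this loss is
        # computed on that replica's own device. The result is a 0-dim scalar.
        return torch.abs(prediction - target).mean()


def training_step(wrapped: nn.Module, x: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    per_replica_loss = wrapped(x, target)
    # With several GPUs, DataParallel gathers one scalar per replica into a
    # vector (hence the "unsqueeze and return a vector" warning).
    # mean() reduces it to a single scalar in either case.
    return per_replica_loss.mean()


if __name__ == "__main__":
    torch.manual_seed(0)
    model = nn.Conv2d(3, 3, kernel_size=3, padding=1)  # toy stand-in network
    wrapped = nn.DataParallel(ModelWithLoss(model))
    if torch.cuda.is_available():
        wrapped = wrapped.cuda()
    device = next(wrapped.parameters()).device

    x = torch.randn(2, 3, 64, 64, device=device)
    target = torch.randn(2, 3, 64, 64, device=device)

    loss = training_step(wrapped, x, target)
    loss.backward()
    print(loss.dim())
```

Averaging the per-replica losses matches the full-batch mean only when every GPU receives the same number of samples, so a batch size that divides evenly by the GPU count keeps the two equivalent.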