contrastive-unpaired-translation
Multi-GPU Testing fails
I tried to use the flag "--gpu_ids 0,1" in the command "python test.py --dataroot ./datasets/mscoco17/ --name model-name --CUT_mode CUT --phase train --load_size 786 --crop_size 786 --num_test 20 --gpu_ids 0,1".
With "--gpu_ids 0" or "--gpu_ids 1" alone it works properly, but with both GPUs I get the traceback appended below.
I downloaded the repository today, so it should include the multi-GPU changes that were pushed four days ago.
Traceback (most recent call last):
  File "test.py", line 56, in <module>
    model.data_dependent_initialize(data)
  File "/data/after-final-structure-217/cut-em2coco/cut-em2coco-mgpu/models/cut_model.py", line 105, in data_dependent_initialize
    self.forward()  # compute fake images: G(A)
  File "/model_folder/models/cut_model.py", line 154, in forward
    self.fake = self.netG(self.real)
  File "/home/user/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/model_folder/models/networks.py", line 1006, in forward
    fake = self.model(input)
  File "/home/user/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/user/anaconda3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/user/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/user/anaconda3/lib/python3.7/site-packages/torch/nn/modules/padding.py", line 170, in forward
    return F.pad(input, self.padding, 'reflect')
  File "/home/user/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 3569, in _pad
    return torch._C._nn.reflection_pad2d(input, pad)
RuntimeError: non-empty 3D or 4D (batch mode) tensor expected for input, but got: [ torch.cuda.FloatTensor{0,3,784,784} ]
It's because --gpu_ids 0,1 specifies that you will use 2 GPUs, but the batch size is still 1. You need to specify a batch size that can be split evenly across the two GPUs. For example, adding --batch_size 2 should resolve the issue.
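For reference, here is a minimal sketch of the failure mode (shapes are taken from the traceback; the slicing line is paraphrased from cut_model.py, so treat it as an illustration rather than a verbatim quote). data_dependent_initialize divides the incoming batch by the number of GPUs before the first forward pass, and with a batch of 1 on two GPUs that integer division leaves zero samples, which is the torch.cuda.FloatTensor{0,3,784,784} that reflection_pad2d rejects:

import torch

# Paraphrase of the per-GPU batch slicing done in data_dependent_initialize
# (cut_model.py). With --gpu_ids 0,1 and a batch size of 1, integer
# division yields zero samples per GPU, so the generator receives an
# empty tensor.
gpu_ids = [0, 1]
data_A = torch.randn(1, 3, 784, 784)                 # batch size 1

bs_per_gpu = data_A.size(0) // max(len(gpu_ids), 1)  # 1 // 2 == 0
real_A = data_A[:bs_per_gpu]
print(real_A.shape)  # torch.Size([0, 3, 784, 784]) -- the empty tensor
                     # that reflection_pad2d then rejects

bs_per_gpu = 2 // max(len(gpu_ids), 1)               # with --batch_size 2: 1 per GPU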
Thanks for your reply! I changed the batch size to 2, 4, 8, and 16, but I get the same traceback as before for all of those values. Do you have another idea where the problem originates?
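One thing worth checking here (an assumption on my part, not verified against this repo's current test.py): in the sibling pytorch-CycleGAN-and-pix2pix codebase, test.py hard-codes opt.batch_size = 1 with a comment that the test code only supports batch size 1. If CUT's test path does the same, the --batch_size flag is silently ignored and the per-GPU slice stays empty no matter what value you pass. A hypothetical one-off probe you could drop into test.py's loop to confirm what actually arrives:

# Hypothetical debugging probe for test.py: print the batch the model
# really receives, to check whether --batch_size is being honored.
for i, data in enumerate(dataset):
    print('opt.batch_size =', opt.batch_size,
          '| actual batch =', data['A'].size(0))
    break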
I'm facing the same problem when I run train.py; it cannot run in multi-GPU mode. Here is one of the error messages:

    result = self.forward(*input, **kwargs)
TypeError: forward() missing 1 required positional argument: 'input'
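For what it's worth, that TypeError is the generic message nn.DataParallel produces (at least on recent PyTorch versions) when the wrapped module ends up being invoked with no positional arguments: with nothing to scatter, DataParallel falls back to calling forward() bare. The sketch below reproduces the message against plain PyTorch; whether this is the exact path taken inside CUT's multi-GPU code would need the full traceback. It assumes a machine with two visible GPUs:

import torch
import torch.nn as nn

class Toy(nn.Module):
    def forward(self, input):
        return input * 2

net = nn.DataParallel(Toy().cuda(), device_ids=[0, 1])

x = torch.randn(4, 3).cuda()
net(x)  # fine: the batch of 4 is scattered across the two GPUs

net()   # TypeError: forward() missing 1 required positional
        # argument: 'input' -- with nothing to scatter, DataParallel
        # calls the replica's forward() with no arguments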
I have the same issue when trying to train CUT on more than two GPUs. Did you find a solution?
Hi there, did you find any solution?