
Multi GPU Training Issue

Open ShenZheng2000 opened this issue 2 years ago • 1 comment

Hello, authors! Thanks for your excellent work.

I have trouble with multi-GPU training. My command line looks like this:

python train.py --dataroot $dataset_path --name $model_name --gpu 0,1,2,3 --batch_size 1

And the error is below:

Traceback (most recent call last):
  File "/home/shen/Rain/Methods/Decent/train.py", line 49, in <module>
    model.data_dependent_initialize(data)
  File "/home/shen/Rain/Methods/Decent/models/decent_gan_model.py", line 99, in data_dependent_initialize
    self.compute_F_loss().backward()                   # calculate gradients for F
  File "/home/shen/Rain/Methods/Decent/models/decent_gan_model.py", line 189, in compute_F_loss
    assert len(log_prob_a) == self.opt.batch_size * self.opt.num_patches
AssertionError

I print the values below for debugging.

print(f"{len(log_prob_a)} != {self.opt.batch_size} * {self.opt.num_patches}")

which gives me

0 != 1 * 256

Since len(log_prob_a) is 0, log_prob_a must be an empty list in multi-GPU training.
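A possible explanation (just a guess on my side, not confirmed against the repo): if the model is wrapped in nn.DataParallel, the input batch is scattered along dim 0 across replicas, while self.opt.batch_size inside each replica still reports the global value. With batch_size=1 on 4 GPUs, three replicas would receive an empty shard. A minimal arithmetic sketch with illustrative variable names:

```python
# Hypothetical sketch of why the assertion could fail under nn.DataParallel.
# All names here are illustrative, not taken from the repo.
global_batch_size = 1   # --batch_size 1
num_gpus = 4            # --gpu 0,1,2,3
num_patches = 256       # opt.num_patches

# DataParallel scatters the input along dim 0, so each replica sees only
# its shard of the batch; with batch_size=1 and 4 GPUs, three replicas
# would receive an empty shard.
per_replica = [
    global_batch_size // num_gpus + (1 if i < global_batch_size % num_gpus else 0)
    for i in range(num_gpus)
]
print(per_replica)  # [1, 0, 0, 0]

# Inside each replica, opt.batch_size still holds the *global* value, so the
# assertion compares a per-shard length against the global expected count:
for shard in per_replica:
    shard_len = shard * num_patches          # what len(log_prob_a) would be
    expected = global_batch_size * num_patches
    print(shard_len == expected)             # True only on the non-empty replica
```

If that is indeed the cause, one workaround might be to use a batch size of at least the number of GPUs, but I am not sure whether that is what you intended.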

Did you encounter this issue when training your models? How can it be solved?

ShenZheng2000 avatar Jan 31 '23 03:01 ShenZheng2000