Multi GPU Training Issue
Hello, authors! Thanks for your excellent work.
I have trouble with multi-GPU training. My command line looks like this:
python train.py --dataroot $dataset_path --name $model_name --gpu 0,1,2,3 --batch_size 1
And the error is below:
Traceback (most recent call last):
  File "/home/shen/Rain/Methods/Decent/train.py", line 49, in <module>
    model.data_dependent_initialize(data)
  File "/home/shen/Rain/Methods/Decent/models/decent_gan_model.py", line 99, in data_dependent_initialize
    self.compute_F_loss().backward()  # calculate graidents for F
  File "/home/shen/Rain/Methods/Decent/models/decent_gan_model.py", line 189, in compute_F_loss
    assert len(log_prob_a) == self.opt.batch_size * self.opt.num_patches
AssertionError
I printed the values below for debugging:
print(f"{len(log_prob_a)} != {self.opt.batch_size} * {self.opt.num_patches}")
which gives me
0 != 1 * 256
Since len(log_prob_a) is 0, log_prob_a ends up as an empty list during multi-GPU training; a sketch of my guess at the cause is below.
Did you encounter this issue when training your models? How can it be solved?