stylegan2-pytorch
autograd error in g_path_regularize in multi-GPUs training
Hi, thanks for your efforts ;) I am trying to run the code on multiple GPUs on a single machine instead of using distributed training, to save time.
Then I hit an issue in g_path_regularize: RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
I tried several ways, but none of them worked well. I was wondering:
- why distributed training doesn't have this issue
- how to fix it. Should it be reduced to a single tensor, as in WGAN-GP? Is there another way? Thanks in advance for your response.
Did you use nn.DataParallel? g_path_regularize is not compatible with nn.DataParallel, because of the multi-GPU gather operations that nn.DataParallel performs on the module outputs.
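For context, the regularizer computes roughly the following (a simplified sketch based on g_path_regularize in this repo's train.py; `noise` here is the random image-space direction used to estimate the Jacobian norm). Under nn.DataParallel, `fake_img` and `latents` are gathered onto the main device as two separate outputs of the gather op, so the gathered `latents` is no longer an ancestor of the gathered `fake_img` in the autograd graph, which is exactly what the "appears to not have been used in the graph" error complains about.

```python
# Simplified sketch of the path-length regularizer (based on this repo's code).
import math
import torch
from torch import autograd


def g_path_regularize(fake_img, latents, mean_path_length, decay=0.01):
    # Random image-space direction y, used to estimate ||J^T y|| per sample.
    noise = torch.randn_like(fake_img) / math.sqrt(
        fake_img.shape[2] * fake_img.shape[3]
    )
    # This call fails under nn.DataParallel: the gathered `latents` did not
    # participate in producing the gathered `fake_img`, so autograd sees an
    # unused input.
    grad, = autograd.grad(
        outputs=(fake_img * noise).sum(), inputs=latents, create_graph=True
    )
    path_lengths = torch.sqrt(grad.pow(2).sum(2).mean(1))
    path_mean = mean_path_length + decay * (path_lengths.mean() - mean_path_length)
    path_penalty = (path_lengths - path_mean).pow(2).mean()
    return path_penalty, path_mean.detach(), path_lengths
```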
Hi @rosinality, thanks for the great repository. Would you have any hints on how to make g_path_regularize work with nn.DataParallel?
@beniz Maybe you can calculate g_path_regularize inside the generator forward, and only return the loss.
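A minimal sketch of that suggestion, under some assumptions: the autograd.grad call is moved inside forward so it runs on each replica before nn.DataParallel gathers anything. The wrapper name PathRegGenerator is made up for illustration, and it assumes the generator accepts return_latents=True as in this repo. As a small variant of "only return the loss", it returns the per-sample path lengths, so the penalty and the running path-length mean can be formed on the main device after the gather.

```python
import math
import torch
from torch import autograd, nn


class PathRegGenerator(nn.Module):
    """Hypothetical wrapper that computes per-sample path lengths inside forward."""

    def __init__(self, generator):
        super().__init__()
        self.generator = generator

    def forward(self, noise):
        # fake_img and latents stay on this replica, so the graph between them
        # is intact (no cross-device gather in the middle).
        fake_img, latents = self.generator(noise, return_latents=True)
        img_noise = torch.randn_like(fake_img) / math.sqrt(
            fake_img.shape[2] * fake_img.shape[3]
        )
        grad, = autograd.grad(
            outputs=(fake_img * img_noise).sum(), inputs=latents, create_graph=True
        )
        # Shape [batch_per_gpu]; nn.DataParallel gathers these along dim 0.
        path_lengths = torch.sqrt(grad.pow(2).sum(2).mean(1))
        return fake_img, path_lengths
```

The training loop would then do something like `fake_img, path_lengths = parallel_g(noise)` with `parallel_g = nn.DataParallel(PathRegGenerator(generator))`, update the running mean from `path_lengths.mean().item()`, and backpropagate `(path_lengths - mean_path_length).pow(2).mean()`; the backward pass flows through DataParallel's gather back to each replica's graph.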
Thanks a lot for your effort! I encountered exactly the same issue when calling g_path_regularize with DataParallel. I would like to know if you have an implementation of the suggestion above.
@lychenyoko I haven't tried it. Do you need DataParallel instead of DistributedDataParallel (because of Windows)?
@rosinality Never mind, I have managed to do it. Thanks for your advice.
@rosinality @lychenyoko Could you please let me know how you addressed that issue? I am using DistributedDataParallel right now but got stuck at the synchronization step (https://github.com/rosinality/stylegan2-pytorch/blob/bef283a1c24087da704d16c30abc8e36e63efa0e/train.py#L439).