I get the following error when training the model.

Open LIN-ALOHA opened this issue 1 year ago • 0 comments

Training epoch # 1 / 10
/opt/conda/envs/diffusionrig/lib/python3.8/site-packages/torch/autograd/__init__.py:173: UserWarning: Error detected in ReluBackward0. Traceback of forward call that caused the error:
  File "train.py", line 122, in <module>
    train(model, optimizer, dataloader, 0)
  File "train.py", line 86, in train
    I_tp_batch, L_sp_batch = model.forward(I_sbatch, L_tbatch, skip_count)
  File "/home/notebook/code/group/linyuzhou/relight/image-relighting-master/model/model.py", line 191, in forward
    feat, out_light = self.HG3(feat, target_light, 0, skip_count)
  File "/opt/conda/envs/diffusionrig/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/notebook/code/group/linyuzhou/relight/image-relighting-master/model/model.py", line 85, in forward
    out_lower, out_middle = self.middle(out_lower, light, count + 1, skip_count)
  File "/opt/conda/envs/diffusionrig/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/notebook/code/group/linyuzhou/relight/image-relighting-master/model/model.py", line 85, in forward
    out_lower, out_middle = self.middle(out_lower, light, count + 1, skip_count)
  File "/opt/conda/envs/diffusionrig/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/notebook/code/group/linyuzhou/relight/image-relighting-master/model/model.py", line 85, in forward
    out_lower, out_middle = self.middle(out_lower, light, count + 1, skip_count)
  File "/opt/conda/envs/diffusionrig/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/notebook/code/group/linyuzhou/relight/image-relighting-master/model/model.py", line 84, in forward
    out_lower = self.low1(out_lower)
  File "/opt/conda/envs/diffusionrig/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/notebook/code/group/linyuzhou/relight/image-relighting-master/model/model.py", line 49, in forward
    out = F.relu(out)
  File "/opt/conda/envs/diffusionrig/lib/python3.8/site-packages/torch/nn/functional.py", line 1442, in relu
    result = torch.relu(input)
 (Triggered internally at /opt/conda/conda-bld/pytorch_1646755903507/work/torch/csrc/autograd/python_anomaly_mode.cpp:104.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
Traceback (most recent call last):
  File "train.py", line 122, in <module>
    train(model, optimizer, dataloader, 0)
  File "train.py", line 96, in train
    loss.backward()
  File "/opt/conda/envs/diffusionrig/lib/python3.8/site-packages/torch/_tensor.py", line 363, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/opt/conda/envs/diffusionrig/lib/python3.8/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [10, 155, 8, 8]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
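
For reference, here is a minimal sketch of how this class of error usually arises. It is not taken from the repository's model.py; the tensor shapes and the function names `backward_fails_after_inplace_edit` and `backward_succeeds_out_of_place` are made up for illustration. ReLU keeps its output for the backward pass, so editing that output in place afterwards (for example with `+=`) changes its version counter and makes `backward()` fail with the same message. Whether that is what happens after line 49 of model.py is an assumption I have not verified.

```python
import torch
import torch.nn.functional as F


def backward_fails_after_inplace_edit() -> bool:
    """Return True if autograd rejects an in-place edit of a ReLU output."""
    x = torch.randn(4, 3, requires_grad=True)
    out = F.relu(x)   # ReLU saves its output tensor for the backward pass
    out += 1.0        # in-place edit: the saved tensor's version goes 0 -> 1
    try:
        out.sum().backward()
    except RuntimeError as err:
        # Same family of message as in the traceback above
        return "inplace operation" in str(err)
    return False


def backward_succeeds_out_of_place() -> bool:
    """Same computation, but the addition allocates a new tensor."""
    x = torch.randn(4, 3, requires_grad=True)
    out = F.relu(x)
    out = out + 1.0   # out-of-place: the ReLU output itself is left untouched
    out.sum().backward()
    return x.grad is not None


if __name__ == "__main__":
    print("in-place edit breaks backward:", backward_fails_after_inplace_edit())
    print("out-of-place edit works:", backward_succeeds_out_of_place())
```

If the model does something similar, replacing the in-place operation (`+=`, `*=`, or a later layer built with `inplace=True`) with its out-of-place form is the usual way to avoid the error.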

LIN-ALOHA · Jul 05 '23 03:07