Hi, I got an error while training on the web172 dataset: it says the gradient cannot be propagated back. I haven't modified the code (apart from some paths). How can I solve this?
[W reducer.cpp:1303] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
Traceback (most recent call last):
File "tools/train.py", line 161, in
main()
File "tools/train.py", line 151, in main
train_model(model,
File "/root/mmgeneration/mmgen/apis/train.py", line 207, in train_model
runner.run(data_loaders, cfg.workflow, cfg.total_iters)
File "/root/mmgeneration/mmgen/core/runners/dynamic_iterbased_runner.py", line 285, in run
iter_runner(iter_loaders[i], **kwargs)
File "/root/mmgeneration/mmgen/core/runners/dynamic_iterbased_runner.py", line 215, in train
outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
File "/root/mmgeneration/mmgen/core/ddp_wrapper.py", line 123, in train_step
output = self.module.train_step(*inputs[0], **kwargs[0])
File "/root/autodl-tmp/MMGEN-FaceStylor/agilegan/transfer.py", line 414, in train_step
loss_gen, log_vars_g, source_results = self.get_gen_loss(data_dict)
File "/root/autodl-tmp/MMGEN-FaceStylor/agilegan/transfer.py", line 121, in get_gen_loss
loss = loss_module(outputs_dict)
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/root/mmgeneration/mmgen/models/losses/gen_auxiliary_loss.py", line 264, in forward
path_penalty, self.mean_path_length, _ = gen_path_regularizer(
File "/root/mmgeneration/mmgen/models/losses/gen_auxiliary_loss.py", line 102, in gen_path_regularizer
grad = autograd.grad(
File "/root/miniconda3/lib/python3.8/site-packages/torch/autograd/init.py", line 234, in grad
return Variable._execution_engine.run_backward(
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3913) of binary: /root/miniconda3/bin/python
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/root/miniconda3/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/root/miniconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 193, in
main()
File "/root/miniconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/root/miniconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/root/miniconda3/lib/python3.8/site-packages/torch/distributed/run.py", line 710, in run
elastic_launch(
File "/root/miniconda3/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/root/miniconda3/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 259, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
I'm running on a cloud server; my mmcv-full and mmgen versions are 1.6.0 and 0.7.2. Could this issue be related to the versions?
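For reference, here is a minimal sketch (not the FaceStylor code, just an assumption about what triggers the message) showing when `torch.autograd.grad` raises this exact RuntimeError: it happens when one of the `inputs` tensors never contributed to `outputs`, which is the same condition `gen_path_regularizer` hits if the latents it differentiates against are not part of the generator's output graph. With `allow_unused=True` the unused input simply gets a `None` gradient instead of an error.

```python
import torch

x = torch.randn(4, requires_grad=True)
y = torch.randn(4, requires_grad=True)

out = (x * 2).sum()  # `y` is never used to compute `out`

# Mirrors the failing call: grad w.r.t. a tensor that is not in the graph.
try:
    torch.autograd.grad(out, (x, y))
except RuntimeError as e:
    print(e)  # "One of the differentiated Tensors appears to not have been used in the graph..."

# allow_unused=True returns None for the unused input instead of raising.
gx, gy = torch.autograd.grad(out, (x, y), allow_unused=True)
print(gx, gy)  # tensor([2., 2., 2., 2.]) None
```

Whether the right fix is passing `allow_unused=True` or making sure the latents actually feed the generator output depends on the path-regularizer logic, so this is only meant to illustrate the error mechanism.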

We recommend using English, or English together with Chinese, for issues so that we can have a broader discussion.
I have met the same issue. Have you solved it?