I am training the model on my own dataset with the following command:
!bash tools/dist_train.sh configs/agilegan/agile_transfer_photo2cartoon256x256_zplus_lpips0.5_freezeD5_ada_bs4x2_lr_2e-3_1600iter.py 1 --work-dir ./work_dirs/experiments/experiments_name
Traceback (most recent call last):
File "tools/train.py", line 161, in <module>
main()
File "tools/train.py", line 157, in main
meta=meta)
File "/content/mmgeneration/mmgen/apis/train.py", line 208, in train_model
runner.run(data_loaders, cfg.workflow, cfg.total_iters)
File "/content/mmgeneration/mmgen/core/runners/dynamic_iterbased_runner.py", line 285, in run
iter_runner(iter_loaders[i], **kwargs)
File "/content/mmgeneration/mmgen/core/runners/dynamic_iterbased_runner.py", line 215, in train
outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
File "/content/mmgeneration/mmgen/core/ddp_wrapper.py", line 123, in train_step
output = self.module.train_step(*inputs[0], **kwargs[0])
File "/content/MMGEN-FaceStylor/agilegan/transfer.py", line 414, in train_step
loss_gen, log_vars_g, source_results = self.get_gen_loss(data_dict)
File "/content/MMGEN-FaceStylor/agilegan/transfer.py", line 121, in get_gen_loss
loss = loss_module(outputs_dict)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/mmgeneration/mmgen/models/losses/gen_auxiliary_loss.py", line 261, in forward
**kwargs)
File "/content/mmgeneration/mmgen/models/losses/gen_auxiliary_loss.py", line 104, in gen_path_regularizer
only_inputs=True)[0]
File "/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py", line 192, in grad
inputs, allow_unused)
RuntimeError: module 'torch.backends.cudnn' has no attribute 'allow_tf32'
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 261, in <module>
main()
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 257, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '-u', 'tools/train.py', '--local_rank=0', 'configs/agilegan/agile_transfer_photo2cartoon256x256_zplus_lpips0.5_freezeD5_ada_bs4x2_lr_2e-3_1600iter.py', '--launcher', 'pytorch', '--work-dir', './work_dirs/experiments/experiments_name']' returned non-zero exit status 1.
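
A quick way to check whether the installed PyTorch exposes the flag named in the error is sketched below. This assumes, without confirmation from the traceback, that the error comes from a PyTorch build that predates `torch.backends.cudnn.allow_tf32` (as far as I know, the TF32 flags first appeared in PyTorch 1.7). The helper name `describe_attribute` is made up for illustration and is not part of mmgeneration or PyTorch.

```python
import importlib


def describe_attribute(module_name: str, attr: str) -> str:
    """Report whether `module_name` can be imported and exposes `attr`.

    Hypothetical helper for illustration only.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Also covers ModuleNotFoundError, e.g. when torch is not installed.
        return f"{module_name}: not importable"
    status = "present" if hasattr(module, attr) else "missing"
    return f"{module_name}.{attr}: {status}"


# Which PyTorch is installed at all?
print(describe_attribute("torch", "__version__"))
# Does this build expose the flag that the error message says is missing?
print(describe_attribute("torch.backends.cudnn", "allow_tf32"))
```

If the second line reports `missing`, the installed PyTorch is probably older than the version the mmgeneration code path expects, and comparing `torch.__version__` against the project's stated requirements would be the next thing to look at.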