RealBasicVSR

GPU num for training

Open Xiao-R-Y opened this issue 1 year ago • 0 comments

Thanks for your excellent work. I ran into a problem when training with only one GPU. Could you please give me some guidance on the commands for non-distributed training? Thank you.

The logs are as follows:

Training command is /home/zhangyang/envs/anaconda3/envs/realVSR/bin/python -m torch.distributed.launch --nproc_per_node=1 --master_port=21932 /home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/mmedit/.mim/tools/train.py configs/realbasicvsr_wogan_c64b20_2x30x8_lr1e-4_300k_reds.py --launcher pytorch.
/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/mmcv/__init__.py:21: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
  'On January 1, 2023, MMCV will release v2.0.0, in which it will remove '
/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/mmedit/utils/setup_env.py:33: UserWarning: Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
  f'Setting OMP_NUM_THREADS environment variable for each process '
/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/mmedit/utils/setup_env.py:43: UserWarning: Setting MKL_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
  f'Setting MKL_NUM_THREADS environment variable for each process '
Traceback (most recent call last):
  File "/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/mmedit/.mim/tools/train.py", line 171, in <module>
    main()
  File "/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/mmedit/.mim/tools/train.py", line 108, in main
    cfg.dump(osp.join(cfg.work_dir, osp.basename(args.config)))
  File "/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/mmcv/utils/config.py", line 596, in dump
    f.write(self.pretty_text)
  File "/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/mmcv/utils/config.py", line 508, in pretty_text
    text, _ = FormatCode(text, style_config=yapf_style, verify=True)
TypeError: FormatCode() got an unexpected keyword argument 'verify'
Traceback (most recent call last):
  File "/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/torch/distributed/launch.py", line 256, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/zhangyang/envs/anaconda3/envs/realVSR/bin/python', '-u', '/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/mmedit/.mim/tools/train.py', '--local_rank=0', 'configs/realbasicvsr_wogan_c64b20_2x30x8_lr1e-4_300k_reds.py', '--launcher', 'pytorch']' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/home/zhangyang/envs/anaconda3/envs/realVSR/bin/mim", line 8, in <module>
    sys.exit(cli())
  File "/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/mim/commands/train.py", line 111, in cli
    other_args=other_args)
  File "/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/mim/commands/train.py", line 262, in train
    cmd, env=dict(os.environ, MASTER_PORT=str(port)))
  File "/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/subprocess.py", line 328, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/home/zhangyang/envs/anaconda3/envs/realVSR/bin/python', '-m', 'torch.distributed.launch', '--nproc_per_node=1', '--master_port=21932', '/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/mmedit/.mim/tools/train.py', 'configs/realbasicvsr_wogan_c64b20_2x30x8_lr1e-4_300k_reds.py', '--launcher', 'pytorch']' returned non-zero exit status 1.
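Note on the log: the first traceback fails inside cfg.dump, at the call FormatCode(text, style_config=yapf_style, verify=True) in mmcv/utils/config.py, and the two CalledProcessError tracebacks only report that the child process exited with status 1. This suggests the installed yapf does not accept a verify argument, which would be separate from the number of GPUs. A minimal sketch to check this in the same environment follows; accepts_keyword and check_yapf are hypothetical helper names, and reading the error as a yapf version mismatch is an assumption rather than something the log confirms.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    if name in params and params[name].kind in keyword_kinds:
        return True
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def check_yapf():
    """Return True/False for whether yapf's FormatCode accepts `verify`,
    or None if yapf is not installed in this environment."""
    try:
        from yapf.yapflib.yapf_api import FormatCode
    except ImportError:
        return None
    return accepts_keyword(FormatCode, "verify")


if __name__ == "__main__":
    result = check_yapf()
    if result is None:
        print("yapf is not installed in this environment")
    elif result:
        print("FormatCode accepts 'verify'; the error likely has another cause")
    else:
        print("FormatCode does not accept 'verify'; yapf and mmcv look mismatched")
```

If the check reports a mismatch, trying an older yapf release or a newer mmcv is one option, but which versions work together is not shown in the log above and would need to be verified.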

Xiao-R-Y, Nov 07 '23 05:11