
Hello, thank you very much for the code you provided. I get the following error when running it. How can I solve it?

Open ThelilinNB opened this issue 1 year ago • 0 comments

[INFO: 2023-07-30 00:36:50,416] Model cotnet50 created, flops_count: 3.29 GMac, param count: 22.22 M
[INFO: 2023-07-30 00:36:50,474] AMP not enabled. Training in float32.
[INFO: 2023-07-30 00:36:50,474] Using native Torch DistributedDataParallel.
Traceback (most recent call last):
  File "/data/master21/lipl/CoTNet-master/train.py", line 379, in <module>
    main()
  File "/data/master21/lipl/CoTNet-master/train.py", line 321, in main
    loader_train, mixup_active, mixup_fn = setup_loader(data_config)
  File "/data/master21/lipl/CoTNet-master/train.py", line 145, in setup_loader
    assert os.path.exists(train_dir)
AssertionError
Traceback (most recent call last):
  File "/data/master21/lipl/CoTNet-master/train.py", line 379, in <module>
    main()
  File "/data/master21/lipl/CoTNet-master/train.py", line 321, in main
    loader_train, mixup_active, mixup_fn = setup_loader(data_config)
  File "/data/master21/lipl/CoTNet-master/train.py", line 145, in setup_loader
    assert os.path.exists(train_dir)
AssertionError
Traceback (most recent call last):
  File "/data/master21/lipl/CoTNet-master/train.py", line 379, in <module>
    main()
  File "/data/master21/lipl/CoTNet-master/train.py", line 321, in main
    loader_train, mixup_active, mixup_fn = setup_loader(data_config)
  File "/data/master21/lipl/CoTNet-master/train.py", line 145, in setup_loader
    assert os.path.exists(train_dir)
AssertionError
Traceback (most recent call last):
  File "/data/master21/lipl/CoTNet-master/train.py", line 379, in <module>
    main()
  File "/data/master21/lipl/CoTNet-master/train.py", line 321, in main
    loader_train, mixup_active, mixup_fn = setup_loader(data_config)
  File "/data/master21/lipl/CoTNet-master/train.py", line 145, in setup_loader
    assert os.path.exists(train_dir)
AssertionError
Traceback (most recent call last):
  File "/data/master21/lipl/CoTNet-master/train.py", line 379, in <module>
    main()
  File "/data/master21/lipl/CoTNet-master/train.py", line 321, in main
    loader_train, mixup_active, mixup_fn = setup_loader(data_config)
  File "/data/master21/lipl/CoTNet-master/train.py", line 145, in setup_loader
    assert os.path.exists(train_dir)
AssertionError
Traceback (most recent call last):
  File "/data/master21/lipl/CoTNet-master/train.py", line 379, in <module>
    main()
  File "/data/master21/lipl/CoTNet-master/train.py", line 321, in main
    loader_train, mixup_active, mixup_fn = setup_loader(data_config)
  File "/data/master21/lipl/CoTNet-master/train.py", line 145, in setup_loader
    assert os.path.exists(train_dir)
AssertionError
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 59360) of binary: /home/lipl/anaconda3/envs/dot/bin/python
Traceback (most recent call last):
  File "/home/lipl/anaconda3/envs/dot/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/lipl/anaconda3/envs/dot/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/lipl/anaconda3/envs/dot/lib/python3.9/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/home/lipl/anaconda3/envs/dot/lib/python3.9/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/home/lipl/anaconda3/envs/dot/lib/python3.9/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/home/lipl/anaconda3/envs/dot/lib/python3.9/site-packages/torch/distributed/run.py", line 710, in run
    elastic_launch(
  File "/home/lipl/anaconda3/envs/dot/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/lipl/anaconda3/envs/dot/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 259, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

train.py FAILED
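
In the log above, every worker stops at the bare `assert os.path.exists(train_dir)` on line 145 of train.py, which means the path held by `train_dir` did not exist on disk when the script ran. The log does not show how train.py builds `train_dir`, so the following is only a minimal sketch for checking a candidate path before launching. `describe_missing_dir` is a hypothetical helper and not part of CoTNet, and the path in the usage example is a placeholder.

```python
import os


def describe_missing_dir(path: str) -> str:
    """Report whether `path` is an existing directory.

    If it is not, name the deepest ancestor that does exist, which
    usually shows where the expected path and the real layout diverge.
    """
    path = os.path.abspath(path)
    if os.path.isdir(path):
        return f"OK: {path} exists"

    # Walk upward until an existing directory (or the filesystem root) is reached.
    ancestor = path
    while not os.path.isdir(ancestor):
        parent = os.path.dirname(ancestor)
        if parent == ancestor:  # reached the root, nothing further to try
            break
        ancestor = parent
    return f"Missing: {path} (deepest existing ancestor: {ancestor})"


if __name__ == "__main__":
    # Placeholder path: substitute the directory your config passes as the training set.
    print(describe_missing_dir("/path/to/dataset/train"))
```

Running this with the dataset path from your own configuration should show whether the directory is absent entirely or only one path component is misspelled or misplaced.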

ThelilinNB, Jul 29 '23 08:07