SAM-Adapter-PyTorch train not succeed

train not succeed

Open skycat88 opened this issue 2 years ago • 16 comments

size mismatch for image_encoder.blocks.23.mlp.lin1.weight: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([5120, 1280]).
size mismatch for image_encoder.blocks.23.mlp.lin1.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([5120]).
size mismatch for image_encoder.blocks.23.mlp.lin2.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
size mismatch for image_encoder.blocks.23.mlp.lin2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]).
size mismatch for image_encoder.neck.0.weight: copying a param with shape torch.Size([256, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 1280, 1, 1]).
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3207998) of binary: /home/syy/anaconda3/envs/SAM_Adapter/bin/python
Traceback (most recent call last):
  File "/home/syy/anaconda3/envs/SAM_Adapter/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/syy/anaconda3/envs/SAM_Adapter/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/syy/anaconda3/envs/SAM_Adapter/lib/python3.8/site-packages/torch/distributed/launch.py", line 195, in <module>
    main()
  File "/home/syy/anaconda3/envs/SAM_Adapter/lib/python3.8/site-packages/torch/distributed/launch.py", line 191, in main
    launch(args)
  File "/home/syy/anaconda3/envs/SAM_Adapter/lib/python3.8/site-packages/torch/distributed/launch.py", line 176, in launch
    run(args)
  File "/home/syy/anaconda3/envs/SAM_Adapter/lib/python3.8/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/home/syy/anaconda3/envs/SAM_Adapter/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/syy/anaconda3/envs/SAM_Adapter/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

============================================================
train.py FAILED

Failures:
[1]:
  time      : 2023-04-24_19:02:47
  host      : vip
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 3208003)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
  time      : 2023-04-24_19:02:47
  host      : vip
  rank      : 2 (local_rank: 2)
  exitcode  : 1 (pid: 3208005)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
  time      : 2023-04-24_19:02:47
  host      : vip
  rank      : 3 (local_rank: 3)
  exitcode  : 1 (pid: 3208011)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):
[0]:
  time      : 2023-04-24_19:02:47
  host      : vip
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 3207998)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

(SAM_Adapter) syy@vip:~/code/data_auto/SAM-Adapter-PyTorch$ python -m torch.distributed.launch --nnodes 1 --nproc_per_node 4 train.py --config configs/demo.yaml

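1. The environment versions are configured as required. The loadddptrain.py mentioned in the README does not exist, so I used train.py instead.
2. The data I downloaded is cmos. Are there any other requirements for data processing? The only data used for this training experiment is the camouflaged object detection data below, from which I prepared 1500 samples: CAMO-COCO-V.1.0-CVIU2019\Camouflage\Images GT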
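
The size mismatch messages in the log above pair a checkpoint width of 1024 with a model width of 1280. In the public SAM releases those widths correspond to the ViT-L and ViT-H image encoders, so one possible reading is that a ViT-L checkpoint is being loaded into a ViT-H model. The following is only a minimal sketch for checking that, assuming parameter shapes have already been collected into plain dictionaries; the helper names `find_shape_mismatches` and `guess_encoder` are hypothetical and are not part of SAM-Adapter-PyTorch.

```python
from typing import Dict, List, Mapping, Tuple

Shape = Tuple[int, ...]

# Embedding widths of the three public SAM image encoders.
SAM_EMBED_DIMS: Dict[int, str] = {768: "ViT-B", 1024: "ViT-L", 1280: "ViT-H"}


def find_shape_mismatches(
    ckpt_shapes: Mapping[str, Shape], model_shapes: Mapping[str, Shape]
) -> List[Tuple[str, Shape, Shape]]:
    """Return (name, checkpoint_shape, model_shape) for shared parameters whose shapes differ."""
    return [
        (name, tuple(shape), tuple(model_shapes[name]))
        for name, shape in ckpt_shapes.items()
        if name in model_shapes and tuple(shape) != tuple(model_shapes[name])
    ]


def guess_encoder(
    shapes: Mapping[str, Shape], key: str = "image_encoder.neck.0.weight"
) -> str:
    """Guess the encoder variant from the input channels of the first neck convolution."""
    shape = shapes.get(key)
    if shape is None or len(shape) < 2:
        return "unknown"
    return SAM_EMBED_DIMS.get(shape[1], "unknown")


# Shapes copied from the error message in this issue.
ckpt = {
    "image_encoder.blocks.23.mlp.lin1.weight": (4096, 1024),
    "image_encoder.neck.0.weight": (256, 1024, 1, 1),
}
model = {
    "image_encoder.blocks.23.mlp.lin1.weight": (5120, 1280),
    "image_encoder.neck.0.weight": (256, 1280, 1, 1),
}

for name, have, want in find_shape_mismatches(ckpt, model):
    print(f"{name}: checkpoint {have} vs model {want}")
print("checkpoint looks like", guess_encoder(ckpt))
print("model looks like", guess_encoder(model))
```

With PyTorch available, the two dictionaries could be built with something like `{k: tuple(v.shape) for k, v in state_dict.items()}`, assuming the checkpoint file holds a plain state dict. If the guess holds, either the checkpoint path or the encoder settings in configs/demo.yaml would need to change so that both refer to the same encoder size.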

Apr 25 '23 06:04 skycat88
