
I get the following error when running the program. How do I fix it?

Open ww249 opened this issue 2 years ago • 4 comments

Missing logger folder: outputs/aedet_lss_r50_256x704_128x128_24e_2key/lightning_logs
Restoring states from the checkpoint path at /home/ww/Coding/AeDet/data/nuscenes/nuscenes_12hz_infos_train.pkl
Traceback (most recent call last):
  File "/home/ww/Coding/AeDet/exps/aedet/aedet_lss_r50_256x704_128x128_24e_2key.py", line 109, in <module>
    run_cli()
  File "/home/ww/Coding/AeDet/exps/aedet/aedet_lss_r50_256x704_128x128_24e_2key.py", line 105, in run_cli
    main(args)
  File "/home/ww/Coding/AeDet/exps/aedet/aedet_lss_r50_256x704_128x128_24e_2key.py", line 75, in main
    trainer.fit(model, ckpt_path=args.ckpt_path)
  File "/home/ww/.conda/envs/aedet2/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 771, in fit
    self._call_and_handle_interrupt(
  File "/home/ww/.conda/envs/aedet2/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 722, in _call_and_handle_interrupt
    return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
  File "/home/ww/.conda/envs/aedet2/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
    return function(*args, **kwargs)
  File "/home/ww/.conda/envs/aedet2/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 812, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/home/ww/.conda/envs/aedet2/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1180, in _run
    self._restore_modules_and_callbacks(ckpt_path)
  File "/home/ww/.conda/envs/aedet2/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1140, in _restore_modules_and_callbacks
    self._checkpoint_connector.resume_start(checkpoint_path)
  File "/home/ww/.conda/envs/aedet2/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 84, in resume_start
    self._loaded_checkpoint = self._load_and_validate_checkpoint(checkpoint_path)
  File "/home/ww/.conda/envs/aedet2/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 88, in _load_and_validate_checkpoint
    loaded_checkpoint = self.trainer.strategy.load_checkpoint(checkpoint_path)
  File "/home/ww/.conda/envs/aedet2/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 316, in load_checkpoint
    return self.checkpoint_io.load_checkpoint(checkpoint_path)
  File "/home/ww/.conda/envs/aedet2/lib/python3.8/site-packages/pytorch_lightning/plugins/io/torch_plugin.py", line 85, in load_checkpoint
    return pl_load(path, map_location=map_location)
  File "/home/ww/.conda/envs/aedet2/lib/python3.8/site-packages/pytorch_lightning/utilities/cloud_io.py", line 47, in load
    return torch.load(f, map_location=map_location)
  File "/home/ww/.conda/envs/aedet2/lib/python3.8/site-packages/torch/serialization.py", line 608, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/ww/.conda/envs/aedet2/lib/python3.8/site-packages/torch/serialization.py", line 780, in _legacy_load
    raise RuntimeError("Invalid magic number; corrupt file?")
RuntimeError: Invalid magic number; corrupt file?

ww249, Sep 04 '23 02:09

What is your training script?

fcjian, Sep 04 '23 12:09

sudo /home/ww/.conda/envs/aedet2/bin/python /home/ww/Coding/AeDet/exps/aedet/aedet_lss_r101_512x1408_256x256_24e_2key.py --amp_backend native -b 8 --gpus 1 --ckpt_path /home/ww/Coding/AeDet/data/nuScenes/nuscenes_12hz_infos_train.pkl

ww249, Sep 11 '23 02:09

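--ckpt_path expects the path to a model checkpoint, but you passed the dataset info file (nuscenes_12hz_infos_train.pkl). That file is not a checkpoint, which is why torch.load fails with "Invalid magic number; corrupt file?". Remove the argument, namely: sudo /home/ww/.conda/envs/aedet2/bin/python /home/ww/Coding/AeDet/exps/aedet/aedet_lss_r101_512x1408_256x256_24e_2key.py --amp_backend native -b 8 --gpus 1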
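If you are ever unsure whether a file is a checkpoint or a data file, a quick check before passing it as --ckpt_path can help. The sketch below is only a heuristic and not part of AeDet: the function names are made up for illustration, and it assumes the checkpoint was written by torch.save in its default zip-based format (the default since PyTorch 1.6). Older legacy-format checkpoints would not be recognized by it.

```python
import pickle
import zipfile


def looks_like_torch_zip_checkpoint(path):
    """Heuristic (assumption): torch.save writes a zip archive by default."""
    return zipfile.is_zipfile(path)


def describe_file(path):
    """Return a short, human-readable guess about what the file at `path` is."""
    if looks_like_torch_zip_checkpoint(path):
        return "zip archive: plausibly a torch.save checkpoint"
    try:
        # Only unpickle files you trust; pickle can execute arbitrary code.
        with open(path, "rb") as f:
            obj = pickle.load(f)
    except Exception:
        return "neither a zip archive nor a plain pickle"
    return f"plain pickle holding a {type(obj).__name__}: likely data, not a checkpoint"
```

Calling describe_file on a dataset info .pkl that was written with plain pickle should report a plain pickle rather than a zip archive, which is a hint that the file does not belong in --ckpt_path.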

fcjian, Sep 11 '23 02:09

I really, really appreciate it! The problem is solved. From now on, I'll order my food delivery through Meituan. Thanks again!

ww249, Sep 11 '23 03:09