second.pytorch

Error when saving the model during training

Open • godspeed1989 opened this issue 6 years ago • 6 comments

I am trying to train a model with the command

python pytorch/train.py train --config_path=./configs/car_test.config --model_dir=./predicts

but I get the following error:

  File "pytorch/train.py", line 396, in train
    net.get_global_step())
  File "/mine/KITTI/second.pytorch.mine/torchplus/train/checkpoint.py", line 173, in save_models
    save(model_dir, model, name, global_step, max_to_keep, keep_latest)
  File "/mine/KITTI/second.pytorch.mine/torchplus/train/checkpoint.py", line 107, in save
    os.remove(str(Path(model_dir) / ckpt_to_delete))
FileNotFoundError: [Errno 2] No such file or directory: 'predicts/predicts/voxelnet-2487.tckpt'

The path seems to be incorrect, which makes the removal of the old checkpoint fail.

godspeed1989, Oct 22 '18
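
The doubled prefix in the traceback can be reproduced with pathlib alone. Judging from the error message, the entry being deleted already contained the directory (predicts/voxelnet-2487.tckpt), and line 107 of checkpoint.py joins model_dir onto it a second time. This is a minimal sketch of that join, not the checkpoint.py code itself; the ckpt_to_delete value is inferred from the error message rather than read from the source.

```python
from pathlib import Path

model_dir = "./predicts"  # value passed via --model_dir
# Inferred from the error message: the recorded entry already includes the directory
ckpt_to_delete = "predicts/voxelnet-2487.tckpt"

# Same kind of join as checkpoint.py line 107: model_dir is prepended again
doubled = Path(model_dir) / ckpt_to_delete
print(doubled.as_posix())  # predicts/predicts/voxelnet-2487.tckpt
```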

Does the directory ./predicts already exist? If you want to train a new model, make sure "/path/to/model_dir" does not exist. If model_dir does not exist, a new directory is created; otherwise, the checkpoints in it are loaded.

Benzlxs, Oct 23 '18
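
A minimal sketch of the model_dir behavior described above, assuming checkpoints are stored as .tckpt files directly inside model_dir. prepare_model_dir is a hypothetical helper written only for illustration; it is not second.pytorch's actual implementation.

```python
import tempfile
from pathlib import Path


def prepare_model_dir(model_dir):
    """Hypothetical helper, not second.pytorch's code.

    Creates model_dir when it is missing (fresh training run); otherwise
    returns the names of the checkpoint files already in it, sorted by name,
    so training could resume from them.
    """
    path = Path(model_dir)
    if not path.exists():
        path.mkdir(parents=True)
        return []
    # Existing directory: collect checkpoints that would be restored
    return sorted(p.name for p in path.glob("*.tckpt"))


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "predicts"
    first = prepare_model_dir(target)   # [] (directory was just created)
    (target / "voxelnet-2487.tckpt").touch()
    second = prepare_model_dir(target)  # ['voxelnet-2487.tckpt']
```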

The directory ./predicts is newly created by train.py, but the path predicts/predicts/voxelnet-2487.tckpt is incorrect. The correct path is predicts/voxelnet-2487.tckpt.

godspeed1989, Oct 23 '18

Try using an absolute path for model_dir; I will try to fix the relative-path problem later.

traveller59, Oct 23 '18
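
One plausible reason the absolute-path workaround helps: when the right-hand operand of pathlib's / operator is absolute, the left-hand operand is discarded. If the recorded checkpoint entry is built from an absolute model_dir (an assumption about the bookkeeping in checkpoint.py), the same join at line 107 then yields the correct file instead of a doubled path. A minimal sketch:

```python
from pathlib import Path

# Workaround: make model_dir absolute before training
model_dir = Path("./predicts").resolve()

# Assumption: the recorded entry is built from model_dir, so it is absolute too
ckpt_to_delete = str(model_dir / "voxelnet-2487.tckpt")

# pathlib ignores the left operand when the right operand is absolute,
# so this join no longer repeats the directory
joined = Path(model_dir) / ckpt_to_delete
print(joined == model_dir / "voxelnet-2487.tckpt")  # True
```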

Hello, have you solved the path problem? I have the same problem and no idea how to solve it. The path turned out to be "model16/model16/voxelnet-2220.tckpt", but the correct path is "model16/voxelnet-2220.tckpt".

gujiaqivadin, Jul 22 '19

Hi, I have the same problem. Have you solved it, and if so, how? Please let me know. Thanks!

zmlll, Jan 02 '21

I ran into the same problem and solved it; I am recording the fix here to help others.

Change line 100 of second.pytorch/torchplus/train/checkpoint.py from

    ckpt_to_delete = all_ckpts.pop(0)

to

    ckpt_to_delete = Path(all_ckpts.pop(0)).name

huangbinz, Jun 04 '21
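
The suggested one-line fix can be exercised in isolation. This sketch assumes all_ckpts holds the recorded checkpoint entries, oldest first, and that every checkpoint file sits directly inside model_dir; prune_oldest_checkpoint and the second checkpoint name (voxelnet-4974.tckpt) are hypothetical. Taking Path(...).name strips any directory prefix, so the join no longer repeats model_dir, whether the stored entry was relative or absolute.

```python
import os
import tempfile
from pathlib import Path


def prune_oldest_checkpoint(model_dir, all_ckpts):
    """Hypothetical helper: delete the oldest checkpoint using the `.name` fix.

    all_ckpts is a list of recorded entries, oldest first; an entry may
    already carry a directory prefix such as "predicts/voxelnet-2487.tckpt".
    """
    # The fix: keep only the file name so model_dir is not prepended twice
    ckpt_to_delete = Path(all_ckpts.pop(0)).name
    os.remove(str(Path(model_dir) / ckpt_to_delete))
    return ckpt_to_delete


# Demo in a throwaway directory standing in for ./predicts
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "predicts"
    model_dir.mkdir()
    all_ckpts = ["predicts/voxelnet-2487.tckpt", "predicts/voxelnet-4974.tckpt"]
    for entry in all_ckpts:
        (model_dir / Path(entry).name).touch()

    deleted = prune_oldest_checkpoint(model_dir, all_ckpts)
    remaining_files = sorted(p.name for p in model_dir.iterdir())

print(deleted)          # voxelnet-2487.tckpt
print(remaining_files)  # ['voxelnet-4974.tckpt']
```

Because .name discards every directory component, this fix only holds as long as checkpoints are never saved into subdirectories of model_dir.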