second.pytorch
Error in save model during training
I am trying to train a model with the command
python pytorch/train.py train --config_path=./configs/car_test.config --model_dir=./predicts
but I encountered the following error message:
File "pytorch/train.py", line 396, in train
net.get_global_step())
File "/mine/KITTI/second.pytorch.mine/torchplus/train/checkpoint.py", line 173, in save_models
save(model_dir, model, name, global_step, max_to_keep, keep_latest)
File "/mine/KITTI/second.pytorch.mine/torchplus/train/checkpoint.py", line 107, in save
os.remove(str(Path(model_dir) / ckpt_to_delete))
FileNotFoundError: [Errno 2] No such file or directory: 'predicts/predicts/voxelnet-2487.tckpt'
The path looks incorrect, which causes the file removal to fail.
Does the directory ./predicts already exist? Make sure "/path/to/model_dir" doesn't exist if you want to train a new model. A new directory will be created if model_dir doesn't exist; otherwise the checkpoints in it will be read.
The directory ./predicts is a new directory created by train.py. But the path predicts/predicts/voxelnet-2487.tckpt is incorrect. The correct path should be predicts/voxelnet-2487.tckpt.
Try using an absolute path for the model dir. I will attempt to fix the relative path problem later.
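For anyone wondering why an absolute path would sidestep the error, here is a minimal sketch of how pathlib joins paths. It assumes the recorded checkpoint entry already contains the model_dir prefix, which is what the doubled path in the traceback suggests. The helper name join and the /data/predicts directory are made up for illustration.

```python
from pathlib import PurePosixPath


def join(model_dir, ckpt_entry):
    # Mirrors the expression in the traceback:
    #   str(Path(model_dir) / ckpt_to_delete)
    # PurePosixPath is used so the result is the same on every OS.
    return str(PurePosixPath(model_dir) / ckpt_entry)


# Relative model_dir: the prefix stored in the entry is appended again.
relative = join("./predicts", "predicts/voxelnet-2487.tckpt")

# Absolute model_dir (hypothetical location): joining with an absolute
# right-hand side discards the left-hand side, so nothing is doubled.
absolute = join("/data/predicts", "/data/predicts/voxelnet-2487.tckpt")

print(relative)
print(absolute)
```

With a relative model_dir the directory name is repeated, while with an absolute one the join keeps only the absolute entry.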
Hello, did you solve the path problem? I ran into the same problem and have no idea how to solve it. The path turned out to be "model16/model16/voxelnet-2220.tckpt", but "model16/voxelnet-2220.tckpt" is the correct path.
Hi, I ran into the same problem. Have you solved it? How should I deal with it? Please let me know, thanks.
I ran into the same problem and solved it. Recording the fix here to help others.
modify line 100 of second.pytorch/torchplus/train/checkpoint.py
from ckpt_to_delete = all_ckpts.pop(0)
to ckpt_to_delete = Path(all_ckpts.pop(0)).name
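The change above can be tried in isolation with a small sketch. This is not the code from checkpoint.py: remove_oldest_checkpoint is a hypothetical stand-in for the deletion step, and the second checkpoint name is invented. It assumes the entries in all_ckpts are stored with the directory prefix.

```python
import os
import tempfile
from pathlib import Path


def remove_oldest_checkpoint(model_dir, all_ckpts):
    """Hypothetical stand-in for the deletion step in save()."""
    # Keep only the file name, dropping any directory prefix
    # that was stored with the entry.
    ckpt_to_delete = Path(all_ckpts.pop(0)).name
    target = Path(model_dir) / ckpt_to_delete
    os.remove(str(target))
    return target


with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "predicts"
    model_dir.mkdir()
    (model_dir / "voxelnet-2487.tckpt").touch()
    (model_dir / "voxelnet-4974.tckpt").touch()

    # Entries recorded with the model_dir prefix (assumption).
    all_ckpts = [
        "predicts/voxelnet-2487.tckpt",
        "predicts/voxelnet-4974.tckpt",
    ]
    removed = remove_oldest_checkpoint(model_dir, all_ckpts)
    remaining = sorted(p.name for p in model_dir.iterdir())

print(removed.name)
print(remaining)
```

Because only the file name is joined to model_dir, the removal works whether model_dir is relative or absolute.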