Total training steps: 1562 Time per step: 0.4430051675679284 Latest time per step: 0.44242509608079467 Latest per-step loss: 3.1989
Traceback (most recent call last):
  File "/data1/code/github_code/chatbot/execute.py", line 165, in <module>
    train()
  File "/data1/code/github_code/chatbot/execute.py", line 114, in train
    torch.save({'modelA_state_dict': encoder.state_dict(),
  File "/home/tt/anaconda3/envs/py38_yolov5_5x/lib/python3.8/site-packages/torch/serialization.py", line 422, in save
    with _open_zipfile_writer(f) as opened_zipfile:
  File "/home/tt/anaconda3/envs/py38_yolov5_5x/lib/python3.8/site-packages/torch/serialization.py", line 309, in _open_zipfile_writer
    return container(name_or_buffer)
  File "/home/tt/anaconda3/envs/py38_yolov5_5x/lib/python3.8/site-packages/torch/serialization.py", line 287, in __init__
    super(_open_zipfile_writer_file, self).__init__(torch._C.PyTorchFileWriter(str(name)))
RuntimeError: [enforce fail at inline_container.cc:354] . invalid file name: model_data/.pt
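The problem is the path built in execute.py by checkpoint_prefix = os.path.join(checkpoint_dir, "model_data.pt"). I fixed it by creating a model_data directory at the same level as config and saving the model into that directory.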
Alternatively, you can change checkpoint_prefix = os.path.join(checkpoint_dir, "model_data.pt") to checkpoint_prefix = checkpoint_dir + ".pt"
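
For anyone hitting the same error, here is a minimal sketch of one way to avoid it: create the checkpoint directory before saving, and build the file path with os.path.join. The helper name build_checkpoint_path and its default file name are hypothetical and not part of execute.py. The sketch assumes the error comes from the target directory being missing or from a trailing slash in checkpoint_dir, which matches the fix described above but has not been checked against the repository.

```python
import os


def build_checkpoint_path(checkpoint_dir, filename="model_data.pt"):
    """Return a checkpoint file path, creating the directory if it is missing.

    Hypothetical helper for illustration; execute.py does not define it.
    """
    # Drop any trailing separator so "model_data/" and "model_data"
    # are treated the same way.
    checkpoint_dir = os.path.normpath(checkpoint_dir)
    # Make sure the directory exists before anything is written into it.
    os.makedirs(checkpoint_dir, exist_ok=True)
    return os.path.join(checkpoint_dir, filename)


# Usage (assumes torch, encoder, etc. are defined as in execute.py):
# checkpoint_prefix = build_checkpoint_path(checkpoint_dir)
# torch.save({'modelA_state_dict': encoder.state_dict()}, checkpoint_prefix)
```

With checkpoint_dir set to "model_data/", this should give a path ending in model_data/model_data.pt instead of model_data/.pt.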