starcoder
TypeError: expected str, bytes or os.PathLike object, not NoneType
I tried to fine-tune using the commands provided in the README and encountered the aforementioned error. For specific details, please refer to my wandb log.
Correct me if I am wrong, but your training seems to have gone well. The problem should come after trainer.train(). Do you have a checkpoints folder with checkpoint-1000 in it after your training?
Yes, I think so
(base) root@vgpu-test-codellm-idle-20230614-serial-0:/mnt/nfs/zhangshaoang.p/starcoder# tree checkpoints
checkpoints
└── checkpoint-1000
    ├── adapter_config.json
    ├── adapter_model.bin
    ├── optimizer.pt
    ├── pytorch_model.bin
    ├── README.md
    ├── rng_state.pth
    ├── scheduler.pt
    ├── trainer_state.json
    └── training_args.bin
So what happened?
The issue is caused by the callback that loads the best checkpoint at the end of training. The callback is registered even though load_best_model_at_end is set to False, so state.best_model_checkpoint stays None. I'll look into this.
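To illustrate the failure described above, here is a minimal sketch using only the standard library. It assumes the callback builds the adapter path by joining the best checkpoint folder with adapter_model.bin (the file name is taken from the tree output in this thread; the exact line in the training script is not quoted here). The helper best_adapter_path is hypothetical and only shows one way to guard the join.

```python
import os
from typing import Optional


def best_adapter_path(best_model_checkpoint: Optional[str]) -> Optional[str]:
    """Return the adapter weights path, or None if no best checkpoint was recorded.

    Hypothetical helper: the name and the "adapter_model.bin" join are assumptions.
    """
    if best_model_checkpoint is None:
        return None
    return os.path.join(best_model_checkpoint, "adapter_model.bin")


# An unguarded join with None raises the TypeError from the issue title.
error_message = None
try:
    os.path.join(None, "adapter_model.bin")
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# The guarded helper returns None instead of raising.
print(best_adapter_path(None))
print(best_adapter_path("checkpoints/checkpoint-1000"))
```

A callback could skip loading when the helper returns None, rather than crashing after training has already finished.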
This may help; put it in SavePeftModelCallback:
    if state.best_model_checkpoint is None:
        print(f"Setting best_model_checkpoint to {checkpoint_folder}")
        state.best_model_checkpoint = checkpoint_folder
    elif state.best_model_checkpoint.endswith(checkpoint_folder):
        print(f"Updating best_model_checkpoint to {checkpoint_folder}")
        state.best_model_checkpoint = checkpoint_folder
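To see what the suggested snippet does across several saves, here is a small standalone sketch. It replaces the real trainer state with a SimpleNamespace stand-in, and record_checkpoint and the checkpoint-2000 path are hypothetical names used only for illustration; the branching mirrors the snippet above.

```python
from types import SimpleNamespace


def record_checkpoint(state, checkpoint_folder: str) -> None:
    """Apply the suggested logic to a state object on each save."""
    if state.best_model_checkpoint is None:
        # First save: nothing recorded yet, so remember this folder.
        print(f"Setting best_model_checkpoint to {checkpoint_folder}")
        state.best_model_checkpoint = checkpoint_folder
    elif state.best_model_checkpoint.endswith(checkpoint_folder):
        # Same folder saved again: the assignment leaves the value unchanged in practice.
        print(f"Updating best_model_checkpoint to {checkpoint_folder}")
        state.best_model_checkpoint = checkpoint_folder


# Stand-in for the trainer state (assumption: only this attribute matters here).
state = SimpleNamespace(best_model_checkpoint=None)
record_checkpoint(state, "checkpoints/checkpoint-1000")
record_checkpoint(state, "checkpoints/checkpoint-2000")
print(state.best_model_checkpoint)
```

Read this way, the snippet keeps the first saved checkpoint rather than the best-scoring one, so it works around the None value but does not select a checkpoint by metric.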
I got a similar error - TypeError: expected str, bytes or os.PathLike object, not NoneType. It printed the following on the console:
Starting main loop
Training...
{'loss': 0.6581, 'learning_rate': 0.0001, 'epoch': 0.5}
{'eval_loss': 1.4904661178588867, 'eval_runtime': 7.503, 'eval_samples_per_second': 0.8, 'eval_steps_per_second': 0.8, 'epoch': 0.5}
{'loss': 0.068, 'learning_rate': 0.0, 'epoch': 1.0}
{'eval_loss': 1.9775662422180176, 'eval_runtime': 7.4805, 'eval_samples_per_second': 0.802, 'eval_steps_per_second': 0.802, 'epoch': 1.0}
{'train_runtime': 28299.0398, 'train_samples_per_second': 0.113, 'train_steps_per_second': 0.007, 'train_loss': 0.3630624198913574, 'epoch': 1.0}
Loading best peft model from None (score: None).
However, the script stopped after running into this error.
My checkpoints folder is also empty.
I think this might be the reason:
"This may help; put it in SavePeftModelCallback:
    if state.best_model_checkpoint is None:
        print(f"Setting best_model_checkpoint to {checkpoint_folder}")
        state.best_model_checkpoint = checkpoint_folder
    elif state.best_model_checkpoint.endswith(checkpoint_folder):
        print(f"Updating best_model_checkpoint to {checkpoint_folder}")
        state.best_model_checkpoint = checkpoint_folder
"
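One possible explanation for the empty checkpoints folder reported above is that the run ended before the first save. The sketch below estimates the total step count from the logged train_runtime and train_steps_per_second. The save interval of 1000 is an assumption inferred from the checkpoint-1000 folder seen earlier in this thread; the second user's actual setting is not shown, and the logged rate is rounded, so the result is only approximate.

```python
# Values copied from the final log line in this thread.
train_runtime = 28299.0398        # seconds
train_steps_per_second = 0.007    # rounded in the log, so the estimate is rough

# Approximate number of optimizer steps completed during the run.
approx_total_steps = round(train_runtime * train_steps_per_second)

# Assumption: checkpoints are written every 1000 steps (not confirmed for this run).
save_steps = 1000

checkpoint_expected = approx_total_steps >= save_steps
print(f"approx. steps: {approx_total_steps}, checkpoint expected: {checkpoint_expected}")
```

If the estimate is well below the save interval, no checkpoint folder is written during training, and the patch quoted above would never run because no save event occurs.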