unsloth
Missing `_unsloth_temporary_saved_buffers`
I'm getting the following error. I think it's probably because I'm running two training runs on the same machine, which might try to create/delete the temporary folder around the same time, so the one that lags slightly behind can't find it anymore. I haven't validated that's what's happening, but hope that detail is helpful!
Traceback (most recent call last):
File "/home/jessy/.miniconda3/envs/unsloth/lib/python3.10/pdb.py", line 1723, in main
pdb._runscript(mainpyfile)
File "/home/jessy/.miniconda3/envs/unsloth/lib/python3.10/pdb.py", line 1583, in _runscript
self.run(statement)
File "/home/jessy/.miniconda3/envs/unsloth/lib/python3.10/bdb.py", line 598, in run
exec(cmd, globals, locals)
File "<string>", line 1, in <module>
File "/home/jessy/projects/llm-objectives/sandbox/multistep_exp_faster.py", line 459, in <module>
multistep_exp(args)
File "/home/jessy/projects/llm-objectives/sandbox/multistep_exp_faster.py", line 238, in multistep_exp
model = train_model(args, model, tokenizer, next_dataset, logdir, model_save_path)
File "/home/jessy/projects/llm-objectives/sandbox/multistep_exp_faster.py", line 426, in train_model
model.save_pretrained_merged(model_save_path, tokenizer, save_method = "merged_16bit")
File "/home/jessy/.miniconda3/envs/unsloth/lib/python3.10/site-packages/unsloth/save.py", line 980, in unsloth_save_pretrained_merged
unsloth_save_model(**arguments)
File "/home/jessy/.miniconda3/envs/unsloth/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/jessy/.miniconda3/envs/unsloth/lib/python3.10/site-packages/unsloth/save.py", line 632, in unsloth_save_model
shutil.rmtree(temporary_location)
File "/home/jessy/.miniconda3/envs/unsloth/lib/python3.10/shutil.py", line 715, in rmtree
onerror(os.lstat, path, sys.exc_info())
File "/home/jessy/.miniconda3/envs/unsloth/lib/python3.10/shutil.py", line 713, in rmtree
orig_st = os.lstat(path)
FileNotFoundError: [Errno 2] No such file or directory: '_unsloth_temporary_saved_buffers'
@jlin816 Oh thanks for that!! Hmm I might leave the folder as is then! I'll add a check to not randomly delete the folder :)
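For anyone hitting this before a fix lands, here is a rough sketch of what such a check could look like. It is not the actual unsloth change; `remove_tree_if_present` is a hypothetical helper name, and it only assumes the failure is the `FileNotFoundError` raised by `shutil.rmtree` in the traceback above.

```python
import shutil


def remove_tree_if_present(path):
    """Recursively delete `path`; return False if it was already gone.

    Hypothetical helper, not part of unsloth.
    """
    try:
        shutil.rmtree(path)
    except FileNotFoundError:
        # Another process removed the directory (or part of it) first,
        # so there is nothing left for this process to clean up.
        return False
    return True
```

Catching the exception rather than testing `os.path.exists` first avoids a second race between the existence check and the delete.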
Thanks! Does this potentially cause any issues with running two jobs on the same machine (e.g. mixing up checkpoint data somehow)?
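Whether the two jobs can actually mix up data isn't established in this thread. One generic way to rule it out would be to give each run its own scratch directory instead of the shared relative path `_unsloth_temporary_saved_buffers`. The sketch below uses only the standard library; `make_private_buffer_dir` is a hypothetical name, and whether the save call lets you point it at a custom temporary location is an assumption to verify against your installed unsloth version.

```python
import os
import tempfile


def make_private_buffer_dir(base_dir="."):
    """Create a uniquely named scratch directory for this process.

    Hypothetical helper: the prefix mirrors the folder name in the
    traceback, and the PID plus mkdtemp's random suffix keep concurrent
    runs from sharing a directory.
    """
    prefix = f"_unsloth_temporary_saved_buffers_{os.getpid()}_"
    return tempfile.mkdtemp(prefix=prefix, dir=base_dir)
```

Launching each job from a different working directory would have a similar effect, since the path in the error message is relative.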