Error loading converted litgpt checkpoints in `pytorch_model.bin` format using huggingface `AutoModelForCausalLM`
Hi, we're using the litgpt framework to train models and then would like to export them to huggingface format for continued tuning and evaluation.
The steps we're using after completing training are:

1. `scripts/convert_pretrained_checkpoint.py` to "finalize" the model
2. `scripts/convert_lit_checkpoint.py` to conform it to the huggingface saved-model format
3. Load using `transformers.AutoModelForCausalLM.from_pretrained("/path/to/converted/checkpoint/dir")`
The actual load in step 3 throws an error because transformers internally calls `torch.load(checkpoint_file, weights_only=True)` when it sees that no safetensors-format checkpoint is available: transformers/modeling_utils.py#L529-L535
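For reference, the failure can be reproduced outside of transformers with just that fallback call (the path below is a placeholder):

```python
# Mirrors the transformers fallback when no model.safetensors file is present
# in the checkpoint directory (hypothetical path).
import torch

state_dict = torch.load(
    "/path/to/converted/checkpoint/dir/pytorch_model.bin",
    weights_only=True,  # raises UnpicklingError if the pickle contains non-tensor objects
)
```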
This can be bypassed by setting `weights_only=False`, but that isn't the desired solution. Rather, it would be great if there were a way to export a trained litgpt model directly to the `model.safetensors` format instead of the `pytorch_model.bin` file format.
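As a stopgap on our end, something like the following should work for re-saving the converted checkpoint in safetensors format (a rough sketch, not existing litgpt functionality; it assumes the state dict holds only tensors):

```python
import torch
from safetensors.torch import save_file

# Load the converted checkpoint locally (trusted file, so weights_only=False is acceptable here)
state_dict = torch.load("pytorch_model.bin", weights_only=False)

# safetensors rejects tensors that share storage, so make independent contiguous copies
state_dict = {name: tensor.contiguous().clone() for name, tensor in state_dict.items()}

save_file(state_dict, "model.safetensors")
```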
What do you think?
I couldn't find any mention of this hiccup within litgpt, or elsewhere for that matter -- the only "safetensors"-related code here is on the `scripts/download.py` side, for bringing hf safetensors-format models into litgpt.
I think exporting to `.safetensors` would be nice in the future. In the meantime, to address your issue, you could load it via `state_dict`s -- I wanted to try something similar recently and shared the approach at the very bottom of this tutorial: https://github.com/Lightning-AI/litgpt/blob/main/tutorials/convert_lit_models.md#a-finetuning-and-conversion-tutorial
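The gist of that approach is roughly the following (repo id and path are placeholders; see the tutorial for the exact steps):

```python
import torch
from transformers import AutoModelForCausalLM

# Load the converted litgpt weights from disk...
state_dict = torch.load("out/converted/model.pth")

# ...and hand them to from_pretrained in place of the weights it would normally load
model = AutoModelForCausalLM.from_pretrained(
    "org/base-model",  # placeholder: the base architecture your checkpoint was trained from
    state_dict=state_dict,
)
```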
Hi! `weights_only=True` shouldn't have anything to do with safetensors. Can you share the precise error that you get? There should only be weights and primitives in the state dict.
Thanks for the interim solution @rasbt, I'll try that out!
@carmocca So this is the stack trace resulting from the `torch.load` operation within the transformers loading logic linked above:
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "[omitted]/python3.11/site-packages/torch/serialization.py", line 1013, in load
    raise pickle.UnpicklingError(UNSAFE_MESSAGE + str(e)) from None
_pickle.UnpicklingError: Weights only load failed. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution.Do it only if you get the file from a trusted source. WeightsUnpickler error: Unsupported operand 149
```
(It's not a "safetensors issue", I was just noting that their control flow falls back to this loading variant if a model.safetensors
file can't be found at the provided path.")
So we need to find out what's causing the "Unsupported operand 149" to know if it's litgpt saving something that we shouldn't. Would it be possible for you to share this checkpoint? You can omit the tensor data if that's proprietary or private.
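If sharing the full checkpoint is a problem, a quick local check along these lines would also tell us whether anything other than tensors and primitives ended up in the state dict (a sketch; `weights_only=False` is fine here since it's your own file):

```python
import torch

state_dict = torch.load("pytorch_model.bin", weights_only=False)

# Print any entries that are not plain tensors; these are the likely cause of
# the "Unsupported operand" failure under weights_only=True
for name, value in state_dict.items():
    if not isinstance(value, torch.Tensor):
        print(name, type(value))
```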
@rasbt can you please also advise on #1095? It is essentially a similar problem, but your approach would not work because I have a different config, e.g. `n_layer`, `n_head`, `n_embd`.
Got the same problem. After converting a finetuned model (qlora) from litgpt to hf format, loading the hf-format model gives the error:

```
Weights only load failed. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution.Do it only if you get the file from a trusted source. WeightsUnpickler error: Unsupported operand 149
```
But this only happens on transformers versions higher than 4.36.0. With version 4.34.1, the converted hf-format model loads normally.
My finetuned model: Codellama-7b-hf-instruct. If needed, I can share the finetuned checkpoint.