Error loading converted litgpt checkpoints in `pytorch_model.bin` format using huggingface `AutoModelForCausalLM`
Hi, we're using the litgpt framework to train models and then would like to export them to huggingface format for continued tuning and evaluation.
The steps we're using after completing training are:

1. `scripts/convert_pretrained_checkpoint.py` to "finalize" the model
2. `scripts/convert_lit_checkpoint.py` to conform it to the huggingface saved-model format
3. Load using `transformers.AutoModelForCausalLM.from_pretrained("/path/to/converted/checkpoint/dir")`
The actual load in step 3 throws an error because transformers internally calls `torch.load(checkpoint_file, weights_only=True)` when it sees that no safetensors-format checkpoint is available: transformers/modeling_utils.py#L529-L535
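For reference, the failure can be reproduced outside of transformers with just that fallback call (the path below is a placeholder):

```python
# Mirrors the transformers fallback when no model.safetensors file is present
# in the checkpoint directory (hypothetical path).
import torch

state_dict = torch.load(
    "/path/to/converted/checkpoint/dir/pytorch_model.bin",
    weights_only=True,  # raises UnpicklingError if the pickle contains non-tensor objects
)
```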
This can be bypassed by setting `weights_only=False`, but that isn't the desired solution. Rather, it would be great if there were a way to export a trained litgpt model directly to the `model.safetensors` format instead of the `pytorch_model.bin` file format.
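As a stopgap on our end, something like the following should work for re-saving the converted checkpoint in safetensors format (a rough sketch, not existing litgpt functionality; it assumes the state dict holds only tensors):

```python
import torch
from safetensors.torch import save_file

# Load the converted checkpoint locally (trusted file, so weights_only=False is acceptable here)
state_dict = torch.load("pytorch_model.bin", weights_only=False)

# safetensors rejects tensors that share storage, so make independent contiguous copies
state_dict = {name: tensor.contiguous().clone() for name, tensor in state_dict.items()}

save_file(state_dict, "model.safetensors")
```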
What do you think?
I couldn't find any mention of this hiccup within litgpt, or elsewhere for that matter -- the only "safetensors"-related code here is on the `scripts/download.py` side, for bringing hf safetensors-format models into litgpt.
I think exporting to `.safetensors` would be nice in the future. In the meantime, to address your issue, you could load it via `state_dict`s -- I wanted to try something similar recently and shared the approach at the very bottom of this tutorial: https://github.com/Lightning-AI/litgpt/blob/main/tutorials/convert_lit_models.md#a-finetuning-and-conversion-tutorial
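The gist of that approach is roughly the following (repo id and path are placeholders; see the tutorial for the exact steps):

```python
import torch
from transformers import AutoModelForCausalLM

# Load the converted litgpt weights from disk...
state_dict = torch.load("out/converted/model.pth")

# ...and hand them to from_pretrained in place of the weights it would normally load
model = AutoModelForCausalLM.from_pretrained(
    "org/base-model",  # placeholder: the base architecture your checkpoint was trained from
    state_dict=state_dict,
)
```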
Hi! `weights_only=True` shouldn't have anything to do with safetensors. Can you share the precise error that you get? There should only be weights and primitives in the state dict.
Thanks for the interim solution @rasbt, I'll try that out!
@carmocca So this is the stack trace resulting from the `torch.load` operation within the transformers loading logic linked above:
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "[omitted]/python3.11/site-packages/torch/serialization.py", line 1013, in load
    raise pickle.UnpicklingError(UNSAFE_MESSAGE + str(e)) from None
_pickle.UnpicklingError: Weights only load failed. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution.Do it only if you get the file from a trusted source. WeightsUnpickler error: Unsupported operand 149
```
(It's not a "safetensors issue", I was just noting that their control flow falls back to this loading variant if a model.safetensors
file can't be found at the provided path.")
So we need to find out what's causing the "Unsupported operand 149" to know if it's litgpt saving something that we shouldn't. Would it be possible for you to share this checkpoint? You can omit the tensor data if that's proprietary or private.
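If sharing the full checkpoint is a problem, a quick local check along these lines would also tell us whether anything other than tensors and primitives ended up in the state dict (a sketch; `weights_only=False` is fine here since it's your own file):

```python
import torch

state_dict = torch.load("pytorch_model.bin", weights_only=False)

# Print any entries that are not plain tensors; these are the likely cause of
# the "Unsupported operand" failure under weights_only=True
for name, value in state_dict.items():
    if not isinstance(value, torch.Tensor):
        print(name, type(value))
```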
@rasbt can you please also advise on #1095? It is essentially a similar problem, but your approach would not work because I have a different config, e.g. `n_layer`, `n_head`, `n_embd`.
Got the same problem. After converting a finetuned model (qlora) from litgpt to hf format, loading the hf-format model gives the error:

```
Weights only load failed. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution.Do it only if you get the file from a trusted source. WeightsUnpickler error: Unsupported operand 149
```
But this only happens on transformers versions higher than 4.36.0. With version 4.34.1, the converted hf-format model loads normally.
My finetuned model: Codellama-7b-hf-instruct. If needed, I can share the finetuned checkpoint.