
Issues Converting lit_model.pth to Huggingface Format Using convert_from_litgpt

Open Kidand opened this issue 1 year ago • 7 comments

Hello,

I have been using litgpt to pretrain a model, which produces a lit_model.pth file. This model functions correctly when loaded with LLM.load() for inference.

However, when I attempt to convert this model to the Huggingface format using the convert_from_litgpt script provided by litgpt, it outputs a model.pth file. This file doesn't match the format Huggingface expects, and when I try to load it with Huggingface's tools, I receive the following error:

import torch
from transformers import AutoModel

# model_id is left undefined here; the converted weights are supplied via state_dict
state_dict = torch.load("model.pth")
model = AutoModel.from_pretrained(
    model_id, local_files_only=True, state_dict=state_dict
)
OSError: Error no file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory

I am unsure how to resolve this issue. Is there a step I'm missing in the conversion process, or is there a compatibility issue with the convert_from_litgpt script?

Additionally, I noticed that the .pth file obtained after training with lit-gpt is twice the size of the original model. Could you please explain why this is happening?
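In case it helps narrow this down, the dtypes stored in the checkpoint can be inspected directly. A minimal sketch, not from any litgpt tool (the file name is the one above; the possible nesting under a "model" key is an assumption):

import torch

# float32 tensors take twice the space of bfloat16/float16 ones, which by itself
# would explain a checkpoint that is twice the size of the original weights.
state_dict = torch.load("lit_model.pth", map_location="cpu")
state_dict = state_dict.get("model", state_dict)  # some checkpoints nest weights under "model"
print({t.dtype for t in state_dict.values() if hasattr(t, "dtype")})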

Thank you for your assistance.

Kidand avatar Dec 02 '24 07:12 Kidand

Hello @Kidand

Could you verify that you did all the steps from this tutorial?

Andrei-Aksionov avatar Dec 02 '24 13:12 Andrei-Aksionov

Hello @Kidand

Could you verify that you did all the steps from this tutorial?

Yes, I followed the tutorial. I noticed that after AutoModel.from_pretrained() reads the config.json file, it looks for weight files such as .bin or .safetensors but cannot properly load a .pth file or a plain state_dict. Why can't the converted files be saved in .bin or .safetensors format?
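For what it's worth, one possible workaround is to build the model from the config, load the converted weights manually, and re-save so that transformers writes model.safetensors. This is only a sketch; the directory name converted_dir and the presence of an HF-compatible config.json in it are assumptions on my part:

import torch
from transformers import AutoConfig, AutoModelForCausalLM

# Build the model skeleton from config.json, then load the converted litgpt
# weights into it manually instead of relying on from_pretrained's file lookup.
config = AutoConfig.from_pretrained("converted_dir")
model = AutoModelForCausalLM.from_config(config)
state_dict = torch.load("converted_dir/model.pth", map_location="cpu")
model.load_state_dict(state_dict)  # key names must already match the HF architecture

# Re-save in the layout from_pretrained expects; recent transformers versions write
# model.safetensors, after which AutoModelForCausalLM.from_pretrained("converted_dir")
# should work without the state_dict argument.
model.save_pretrained("converted_dir")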

Kidand avatar Dec 03 '24 01:12 Kidand

I am having the same problem, except that I copied the config file.

state_dict = torch.load("out/model.pth")
model = AutoModel.from_pretrained(
    'out', local_files_only=True, state_dict=state_dict
)

The directory out contains model.pth, which was generated by the conversion script, and config.json, which I copied from the litgpt-trained model directory.

I also got the error:

OSError: Error no file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory xxx/out.

2533245542 avatar Dec 03 '24 04:12 2533245542

Is it possible to have a script that just converts to a Huggingface-readable format directly?

For example, by appending something like this at the end of the original conversion script:

import torch
from transformers import AutoModel

state_dict = torch.load("out/model.pth")
model = AutoModel.from_pretrained(
    'out', local_files_only=True, state_dict=state_dict
)
tokenizer.save_pretrained(path)  # tokenizer and path would come from the surrounding script
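If the loading part works, the model presumably also needs to be written back out so that later from_pretrained calls succeed without the state_dict argument; a sketch of the extra line (path as in the snippet above):

# assumption: recent transformers versions default to safetensors when saving,
# so this writes config.json and model.safetensors into the target directory
model.save_pretrained(path)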

2533245542 avatar Dec 03 '24 04:12 2533245542

Yes, I guess we can do something like that.

cc @rasbt

Andrei-Aksionov avatar Dec 03 '24 09:12 Andrei-Aksionov

Any updates on this? My environment is:

python=3.10.15, torch=2.4.1, transformers=4.46.3 for litgpt

>>> from importlib.metadata import version
>>> print(version("litgpt"))
0.5.3

My config.yaml for running litgpt pretrain --config config.yaml:

model_name: mymodel
out_dir: out/custom-model
data:
  class_path: litgpt.data.TextFiles
  init_args:
    train_data_path: mycorpus

    
tokenizer_dir: my_tokenizer

train:
  max_tokens: 600_000_0
  max_seq_length: 8192
  micro_batch_size: 1
  
model_config:

  ## llama3.2 1b
  block_size: 131072
  padded_vocab_size: 128256
  n_layer: 16
  n_embd: 2048
  n_head: 32
  n_query_groups: 8
  rotary_percentage: 1.0
  parallel_residual: false
  bias: false
  norm_class_name: "RMSNorm"
  mlp_class_name: "LLaMAMLP"
  intermediate_size: 8192
  rope_base: 500000
  rope_adjustments:
    factor: 32.0
    low_freq_factor: 1.0
    high_freq_factor: 4.0
    original_max_seq_len: 8192

  vocab_size: 115418


devices: 2
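
As an aside, loading such a custom architecture on the transformers side also needs a matching HF config.json. A hypothetical sketch of how one might write it for the hyperparameters above (the litgpt-to-HF field mapping, the choice of LlamaConfig, and the directory name converted_dir are all my assumptions and should be checked against the converted tensor shapes):

from transformers import LlamaConfig

hf_config = LlamaConfig(
    vocab_size=128256,               # litgpt padded_vocab_size (embedding matrix size)
    hidden_size=2048,                # n_embd
    intermediate_size=8192,          # intermediate_size
    num_hidden_layers=16,            # n_layer
    num_attention_heads=32,          # n_head
    num_key_value_heads=8,           # n_query_groups
    max_position_embeddings=131072,  # block_size
    rope_theta=500000,               # rope_base
    rope_scaling={                   # rope_adjustments (Llama 3 style scaling)
        "rope_type": "llama3",
        "factor": 32.0,
        "low_freq_factor": 1.0,
        "high_freq_factor": 4.0,
        "original_max_position_embeddings": 8192,
    },
)
hf_config.save_pretrained("converted_dir")  # writes converted_dir/config.json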

2533245542 avatar Dec 04 '24 23:12 2533245542

Same issue. Is it simply impossible to load a trained model converted to HF format using AutoModel?

pe-hy avatar Dec 08 '24 11:12 pe-hy