
Hugging Face from_pretrained() using merged weights KeyError: 'base_model_name_or_path'

Open • chg0901 opened this issue 11 months ago • 1 comment

Test code from https://pytorch.org/torchtune/stable/tutorials/e2e_flow.html#use-with-hugging-face-from-pretrained:


from transformers import AutoModelForCausalLM, AutoTokenizer
import transformers

print(transformers.__version__)

#TODO: update it to your chosen epoch
trained_model_path = "models/torchtune/llama3_2_3B/lora_single_device/epoch_1"
# trained_model_path = "/home/cine/Documents/tune/models/Llama-3.2-3B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=trained_model_path,
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(trained_model_path)


# Function to generate text
def generate_text(model, tokenizer, prompt, max_length=50):
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_length=max_length)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

prompt = "tell me a joke"
print("Base model output:", generate_text(model, tokenizer, prompt))

prompt = "Complete the sentence: 'Once upon a time..."
print("Base model output:", generate_text(model, tokenizer, prompt))

Error:

(base) cine@20211029-a04:~/Documents/tune$ /home/cine/miniconda3/envs/tune/bin/python /home/cine/Documents/tune/gen_from_merged_sft.py
Traceback (most recent call last):
  File "/home/cine/Documents/tune/gen_from_merged_sft.py", line 7, in <module>
    model = AutoModelForCausalLM.from_pretrained(
  File "/home/cine/miniconda3/envs/tune/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 514, in from_pretrained
    pretrained_model_name_or_path = adapter_config["base_model_name_or_path"]
KeyError: 'base_model_name_or_path'
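
From the traceback, transformers finds an adapter_config.json in the checkpoint directory, treats the folder as a PEFT adapter, and then fails because that config lacks the "base_model_name_or_path" key. A minimal sketch of one possible workaround, assuming the paths from the snippets in this issue: add the missing key so from_pretrained() can resolve the base weights (this also requires peft to be installed, since transformers will then load the base model and apply the adapter).

import json
from pathlib import Path

# Assumed paths, taken from the snippets in this issue; adjust to your setup.
trained_model_path = Path("models/torchtune/llama3_2_3B/lora_single_device/epoch_1")
base_model_path = "/home/cine/Documents/tune/models/Llama-3.2-3B-Instruct"

adapter_config_path = trained_model_path / "adapter_config.json"
config = json.loads(adapter_config_path.read_text())
if "base_model_name_or_path" not in config:
    # Point the adapter config at the base weights so the KeyError goes away.
    config["base_model_name_or_path"] = base_model_path
    adapter_config_path.write_text(json.dumps(config, indent=2))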

But I can use PEFT to load the SFT model with:

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

#TODO: update it to your chosen epoch
trained_model_path = "models/torchtune/llama3_2_3B/lora_single_device/epoch_1"

# Define the model and adapter paths
# To avoid this error, we can use the local base model
original_model_name = '/home/cine/Documents/tune/models/Llama-3.2-3B-Instruct'
model = AutoModelForCausalLM.from_pretrained(original_model_name)

# Hugging Face will look for adapter_model.safetensors and adapter_config.json
peft_model = PeftModel.from_pretrained(model, trained_model_path)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(original_model_name)

# Function to generate text
def generate_text(model, tokenizer, prompt, max_length=50):
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_length=max_length)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

prompt = "tell me a joke: '"
print("Base model output:", generate_text(peft_model, tokenizer, prompt))

chg0901 • Jan 02 '25

Hugging Face may be prioritizing reading from "adapter_config.json" instead of reading the model config. Maybe when I tested it, I tried it with full finetuning instead of LoRA.

One sanity check is to remove or move the adapter_model.safetensors and adapter_config.json files to see if it defaults to the full model. I am on PTO this week, but I can look into it next week.
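
A sketch of that sanity check, assuming the epoch directory from the issue: move the adapter files aside so from_pretrained() no longer detects a PEFT adapter.

from pathlib import Path

ckpt_dir = Path("models/torchtune/llama3_2_3B/lora_single_device/epoch_1")
for name in ("adapter_model.safetensors", "adapter_config.json"):
    src = ckpt_dir / name
    if src.exists():
        # Rename rather than delete, so the check is reversible.
        src.rename(ckpt_dir / (name + ".bak"))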

felipemello1 • Jan 06 '25

@chg0901 I'm not able to reproduce the error. For me it seems to work just fine. I might be missing something; can you please help me reproduce it?

Ankur-singh • Jan 13 '25

OK, but how? I will try my best to assist you if you could specify what I should do.


chg0901 • Jan 13 '25

Would it be possible to share a Colab notebook with all the code to reproduce the error?

Ankur-singh • Jan 13 '25

https://github.com/chg0901/hands_on_torchtune

Please check this repo

The blog is written in Chinese, but you could use a translator to read it.

Have a good day


chg0901 • Jan 13 '25

I had the same issue, and @felipemello1's answer worked for me perfectly.

zhang-liyi • Jan 31 '25