h2o-llmstudio
[CODE IMPROVEMENT] Store weights in AutoModelForCausalLM format
🔧 Proposed code refactoring
Currently, model weights are stored in LLM Studio format, which is a small wrapper around AutoModelForCausalLM. Instead, store the model weights in AutoModelForCausalLM format, together with the tokenizer and model configs, in the experiment output directory.
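For illustration, a minimal sketch of what the proposed export format could look like; the model name and output path below are placeholders, not LLM Studio internals:
from transformers import AutoModelForCausalLM, AutoTokenizer

# placeholder backbone and output path, just to illustrate the target format
model_name = "facebook/opt-125m"
output_dir = "experiment_output"

model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# save_pretrained() writes the weights plus config.json and tokenizer files,
# so the directory can later be reloaded via AutoModelForCausalLM.from_pretrained(output_dir)
model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)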
Motivation
Allows grabbing the output directory and using it directly within the Hugging Face universe. Related to #10.
Agree!
Also related: https://github.com/h2oai/h2o-llmstudio/issues/5
We just need to consider how to handle LoRA and other adapters on top. Maybe we always merge the weights back into the base model and save that, but it will make chaining experiments a bit trickier, and the inference code needs to be adjusted.
We just need to consider how to handle LoRA and other adapters on top.
Yes, that may be tricky. We could also stick to the export functionality and convert the weights a posteriori.
Maybe that's fine yeah. But we need to make the HF export self-contained.
Hi,
@psinger @maxjeblick - this is highly relevant for me, so I am curious when this could be done, or, in case it will take longer, whether there is a way to achieve this manually...?
Let's say I am training with Hugging Face's Trainer; wouldn't the following work (roughly coded)?
import os
import torch
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# save the LoRA adapter produced by the Trainer, then free the training objects
# (assumes `trainer`, `model` and `output_dir` exist from the training run)
trainer.model.save_pretrained(output_dir)
del model
del trainer

# reload the base model in fp16 and attach the saved adapter
peft_config = PeftConfig.from_pretrained(output_dir)
model = AutoModelForCausalLM.from_pretrained(
    peft_config.base_model_name_or_path,
    load_in_8bit=False,
    return_dict=True,
    device_map="auto",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
model = PeftModel.from_pretrained(
    model,
    output_dir,
    torch_dtype=torch.float16,
    device_map="auto",
)
model.eval()

# merge the adapter into the base weights and save model plus tokenizer
os.makedirs("lora", exist_ok=True)
merged_model = model.merge_and_unload()
merged_model.save_pretrained("lora")
tokenizer = AutoTokenizer.from_pretrained(peft_config.base_model_name_or_path)
tokenizer.save_pretrained("lora")
This way, the model with the merged weights would be stored alongside the tokenizer, which would enable users to use the "normal" HF inference mode - wouldn't it?
The merge_and_unload() function, however, would AFAIK require a PEFT version >= 0.3.0.
Any thoughts on that?
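As a quick sketch (not LLM Studio code), the installed PEFT version could be checked before relying on merge_and_unload():
from importlib.metadata import version
from packaging.version import Version

# merge_and_unload() requires a sufficiently recent PEFT release
assert Version(version("peft")) >= Version("0.3.0"), "peft>=0.3.0 required for merge_and_unload()"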
@JulianGerhard21
That's what we are actually already doing when pushing to HF: https://github.com/h2oai/h2o-llmstudio/blob/main/app_utils/sections/experiment.py#L1544
So whenever you push a model to HF via LLM Studio, it will directly work the way you are envisioning it. You can use it as simply as:
from transformers import AutoModelForCausalLM, AutoTokenizer

# load the merged model and tokenizer straight from the Hub
tokenizer = AutoTokenizer.from_pretrained("hf_path")
model = AutoModelForCausalLM.from_pretrained("hf_path")
model.half().cuda()

# the input needs to match the prompt format used in the LLM Studio experiment
inputs = tokenizer("How are you?<|endoftext|>", return_tensors="pt").to("cuda")
tokens = model.generate(
    **inputs,
    max_new_tokens=64,
    temperature=0.7,
    do_sample=True,
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
You just need to format the prompt as you trained it and set the inference parameters accordingly.
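As a small illustration, a helper mirroring the prompt format in the snippet above; the <|endoftext|> separator is an assumption and has to match your experiment's settings:
def build_prompt(question: str, separator: str = "<|endoftext|>") -> str:
    # the separator must match the one used during training in LLM Studio
    return f"{question}{separator}"

prompt = build_prompt("How are you?")  # pass this to the tokenizer as in the snippet above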
Will be tackled otherwise.