h2o-llmstudio
[CODE IMPROVEMENT] Store weights in AutoModelForCausalLM format
🔧 Proposed code refactoring
Currently, model weights are stored in LLM Studio format, which is a small wrapper around AutoModelForCausalLM. Instead, store the model weights in AutoModelForCausalLM format, together with the tokenizer and model configs, in the experiment output directory.
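For illustration, a minimal sketch of what the proposed export format could look like; the model name and output path below are placeholders, not LLM Studio internals:
from transformers import AutoModelForCausalLM, AutoTokenizer

# placeholder backbone and output path, just to illustrate the target format
model_name = "facebook/opt-125m"
output_dir = "experiment_output"

model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# save_pretrained() writes the weights plus config.json and tokenizer files,
# so the directory can later be reloaded via AutoModelForCausalLM.from_pretrained(output_dir)
model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)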
Motivation
Allows grabbing the output directory and using it directly within the Hugging Face universe. Related to #10.
Agree!
Also related: https://github.com/h2oai/h2o-llmstudio/issues/5
We just need to consider how to handle LoRA and other adapters on top. Maybe we always merge the weights back into the base model and save that, but it will make chaining experiments a bit trickier, and the inference code needs to be adjusted.
We just need to consider how to handle LoRA and other adapters on top.
Yes, that may be tricky. We could also stick to the export functionality and convert the weights a posteriori.
Maybe that's fine yeah. But we need to make the HF export self-contained.
Hi,
@psinger @maxjeblick - this is highly relevant for me, so I am curious when this could be done, or, in case it will take longer, whether there is a way to achieve this manually...?
Let's say I am training with Hugging Face's Trainer; wouldn't the following work (roughly coded)?
import os
import torch
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# save the LoRA adapter produced by the Trainer, then free the training objects
# (assumes `trainer`, `model` and `output_dir` exist from the training run)
trainer.model.save_pretrained(output_dir)
del model
del trainer

# reload the base model in fp16 and attach the saved adapter
peft_config = PeftConfig.from_pretrained(output_dir)
model = AutoModelForCausalLM.from_pretrained(
    peft_config.base_model_name_or_path,
    load_in_8bit=False,
    return_dict=True,
    device_map="auto",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
model = PeftModel.from_pretrained(
    model,
    output_dir,
    torch_dtype=torch.float16,
    device_map="auto",
)
model.eval()

# merge the adapter into the base weights and save model plus tokenizer
os.makedirs("lora", exist_ok=True)
merged_model = model.merge_and_unload()
merged_model.save_pretrained("lora")
tokenizer = AutoTokenizer.from_pretrained(peft_config.base_model_name_or_path)
tokenizer.save_pretrained("lora")
This way, the model with the merged weights would be stored alongside the tokenizer, which would enable users to use the "normal" HF inference mode - wouldn't it?
The merge_and_unload() function, however, would AFAIK require a PEFT version >= 0.3.0.
Any thoughts on that?
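As a quick sketch (not LLM Studio code), the installed PEFT version could be checked before relying on merge_and_unload():
from importlib.metadata import version
from packaging.version import Version

# merge_and_unload() requires a sufficiently recent PEFT release
assert Version(version("peft")) >= Version("0.3.0"), "peft>=0.3.0 required for merge_and_unload()"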
@JulianGerhard21
That's what we are actually already doing when pushing to HF: https://github.com/h2oai/h2o-llmstudio/blob/main/app_utils/sections/experiment.py#L1544
So whenever you push a model to HF via LLM Studio, it will directly work the way you are envisioning it. You can use it as simply as:
from transformers import AutoModelForCausalLM, AutoTokenizer

# load the merged model and tokenizer straight from the Hub
tokenizer = AutoTokenizer.from_pretrained("hf_path")
model = AutoModelForCausalLM.from_pretrained("hf_path")
model.half().cuda()

# the input needs to match the prompt format used in the LLM Studio experiment
inputs = tokenizer("How are you?<|endoftext|>", return_tensors="pt").to("cuda")
tokens = model.generate(
    **inputs,
    max_new_tokens=64,
    temperature=0.7,
    do_sample=True,
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
You just need to format the prompt as you trained it and set the inference parameters accordingly.
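As a small illustration, a helper mirroring the prompt format in the snippet above; the <|endoftext|> separator is an assumption and has to match your experiment's settings:
def build_prompt(question: str, separator: str = "<|endoftext|>") -> str:
    # the separator must match the one used during training in LLM Studio
    return f"{question}{separator}"

prompt = build_prompt("How are you?")  # pass this to the tokenizer as in the snippet above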
Will be tackled otherwise.