How to use a finetuned LoRA adapter in a Hugging Face-like pipeline
Hi, thanks for this amazing project. I was finetuning the LoRA model for Llama 3.2 Vision, which works fine and saves an adapter_0.pt. I then wanted to use this adapter checkpoint for inference in a Hugging Face pipeline, where I ran into some issues. I tried the two approaches below; thanks in advance!
1. Weight format conversion: there is a script to convert Meta-format weights (meta_model_0.pt) to torchtune format and then to HF, as in _convert_weights.py. I used it to convert a full (non-LoRA) model and that works for inference, but it does not work for the adapter checkpoint because the parameter names are different (see the sketch after the traceback below).
2. I then tried setting the checkpointer's `_component_` parameter to `torchtune.training.FullModelHFCheckpointer`, hoping to get a Hugging Face-compatible model directly. That fails with `KeyError: 'num_attention_heads'`:
File ".../lora_finetune_distributed.py", line 725, in save_checkpoint
self._checkpointer.save_checkpoint(
File ".../lib/python3.10/site-packages/torchtune/training/checkpointing/_checkpointer.py", line 639, in save_checkpoint
num_heads=self._config["num_attention_heads"],
KeyError: 'num_attention_heads'
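For reference on why (1) fails, here is a minimal sketch that just inspects the adapter checkpoint. It assumes the adapter is a plain state dict saved with torch.save into the checkpointer's output_dir from the config below; printing only the first ten entries is just to keep the output short.

```python
import torch

# adapter_0.pt holds only the LoRA parameters; their names do not match the
# base-model parameter names that _convert_weights.py expects, which is why the
# full-model conversion path cannot be reused as-is.
adapter_sd = torch.load("/tmp/Llama-3.2-11B-Vision-Instruct/adapter_0.pt", map_location="cpu")
for name, tensor in list(adapter_sd.items())[:10]:
    print(name, tuple(tensor.shape))
```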
What would be ideal is to be able to use the finetuned model in a way similar to the following. Note that this code does not currently work, because there is no adapter_config.json under the checkpointer/output_dir path:
model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
# Replace the original model loading code with this:
# with torch.inference_mode():
model = MllamaForConditionalGeneration.from_pretrained(
model_id,
torch_dtype=torch.bfloat16,
device_map="auto",
low_cpu_mem_usage=True,
)
from peft import PeftModel
peft_model = PeftModel.from_pretrained(model, "checkpointer/output_dir")
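PEFT looks for an adapter_config.json next to the adapter weights, which the recipe does not write today. As a hedged sketch only: if the adapter weights were converted to PEFT naming, a matching config could be generated with peft itself, mirroring the LoRA hyperparameters from the training config further down (lora_rank=8, lora_alpha=16, lora_attn_modules=['q_proj', 'v_proj']). The target_modules here are an assumed mapping onto the HF module names, not something torchtune produces.

```python
from peft import LoraConfig

peft_config = LoraConfig(
    r=8,                                  # lora_rank in the torchtune config
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "v_proj"],  # assumed HF module-name mapping
    base_model_name_or_path="meta-llama/Llama-3.2-11B-Vision-Instruct",
)
peft_config.save_pretrained("checkpointer/output_dir")  # writes adapter_config.json
```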
To reproduce what I observed, I am running the following command with a slightly modified config file:

tune run --nnodes 1 --nproc_per_node 1 lora_finetune_distributed --config ./finetune-llama32/11B_lora_debug.yaml
Main modification: switching the checkpointer from

_component_: torchtune.training.FullModelMetaCheckpointer
checkpoint_files: [consolidated.pth]

to

_component_: torchtune.training.FullModelHFCheckpointer
checkpoint_dir: /data/home/xxx/models/Llama-3.2-11B-Vision-Instruct/
checkpoint_files: [model-00001-of-00005.safetensors,
                   model-00002-of-00005.safetensors,
                   model-00003-of-00005.safetensors,
                   model-00004-of-00005.safetensors,
                   model-00005-of-00005.safetensors]
Full config file:
# Config for multi-device LoRA finetuning in lora_finetune_distributed.py
# using a Llama3.2 11B Vision Instruct model
#
# This config assumes that you've run the following command before launching:
# tune download meta-llama/Llama-3.2-11B-Vision-Instruct --output-dir /tmp/Llama-3.2-11B-Vision-Instruct
#
# To launch on 2 devices, run the following command from root:
# tune run --nproc_per_node 2 lora_finetune_distributed --config llama3_2_vision/11B_lora
#
# You can add specific overrides through the command line. For example
# to override the checkpointer directory while launching training:
# tune run --nproc_per_node 2 lora_finetune_distributed --config llama3_2_vision/11B_lora checkpointer.checkpoint_dir=<YOUR_CHECKPOINT_DIR>
#
# This config works best when the model is being fine-tuned on 2+ GPUs.
# For single device LoRA finetuning please use 11B_lora_single_device.yaml
# or 11B_qlora_single_device.yaml

# Model arguments
model:
  _component_: torchtune.models.llama3_2_vision.lora_llama3_2_vision_11b
  decoder_trainable: "frozen"
  encoder_trainable: "lora"
  fusion_trainable: "lora"
  lora_attn_modules: ['q_proj', 'v_proj']
  apply_lora_to_mlp: False
  apply_lora_to_output: False
  lora_rank: 8
  lora_alpha: 16
  lora_dropout: 0.0
  image_size: 560 # Make sure this matches the image_size in tokenizer

# Transform
tokenizer:
  _component_: torchtune.models.llama3_2_vision.llama3_2_vision_transform
  path: /data/home/xxx/models/Llama-3.2-11B-Vision-Instruct/original/tokenizer.model
  image_size: 560

# Checkpointer
checkpointer:
  # _component_: torchtune.training.FullModelMetaCheckpointer
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: /data/home/xxx/models/Llama-3.2-11B-Vision-Instruct/
  checkpoint_files: [model-00001-of-00005.safetensors,
                     model-00002-of-00005.safetensors,
                     model-00003-of-00005.safetensors,
                     model-00004-of-00005.safetensors,
                     model-00005-of-00005.safetensors]
  # originally: checkpoint_files: [consolidated.pth]
  recipe_checkpoint: null
  output_dir: /tmp/Llama-3.2-11B-Vision-Instruct/
  model_type: LLAMA3_VISION
resume_from_checkpoint: False

# Dataset
dataset:
  _component_: torchtune.datasets.multimodal.the_cauldron_dataset
  subset: ocrvqa
seed: null
shuffle: True
collate_fn: torchtune.data.padded_collate_tiled_images_and_mask

# Fine-tuning arguments
epochs: 1
max_steps_per_epoch: null
batch_size: 2
gradient_accumulation_steps: 4
optimizer:
  _component_: torch.optim.AdamW
  fused: True
  weight_decay: 0.01
  lr: 2e-5
lr_scheduler:
  _component_: torchtune.modules.get_cosine_schedule_with_warmup
  num_warmup_steps: 100
loss:
  _component_: torchtune.modules.loss.CEWithChunkedOutputLoss
clip_grad_norm: 1.0
compile: False # set it to True for better memory and performance

# Training env
device: cuda

# Memory management
enable_activation_checkpointing: True
enable_activation_offloading: False
dtype: bf16

# Logging
output_dir: /tmp/full-llama3.2-vision-finetune
log_peak_memory_stats: False
metric_logger:
  _component_: torchtune.training.metric_logging.WandBLogger
  project: llama3.2_lora_project
log_every_n_steps: 1
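After training completes, a quick listing of the checkpointer's output_dir (a sketch using the path from this config) shows the gap: torchtune writes adapter_0.pt in its own format, but not the adapter_config.json / adapter_model.* files that PeftModel.from_pretrained looks for. The two expected filenames below are PEFT's convention, not files torchtune currently emits.

```python
import os

output_dir = "/tmp/Llama-3.2-11B-Vision-Instruct/"
print(sorted(os.listdir(output_dir)))  # expect adapter_0.pt plus the full-model files

for expected in ("adapter_config.json", "adapter_model.safetensors"):
    print(expected, "->", os.path.exists(os.path.join(output_dir, expected)))
```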
Hi @ryf1123, thanks for creating the issue. I think (2) is the way we intend for this to be done; unfortunately it looks like it doesn't currently work, so glad you pointed this out! I think the problem is that for the Llama 3.2 Vision model the config is structured a bit differently: the num_attention_heads field is still there, but it's nested under the text_config field.
The ideal case you described of loading in with `PeftModel.from_pretrained` is exactly what we want to have, and it should currently work for our text models (see the test plan in #933). But I think we need to make some changes to save the adapter weights properly for the multimodal model. I am gonna take a closer look at this, and will also assign to @pbontrager for further investigation since he's quite familiar with the multimodal key mappings.
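To make the nesting concrete, here is a quick check (just a sketch; the local path is the same checkpoint_dir used in the config above) of the HF config.json for this model:

```python
import json

with open("/data/home/xxx/models/Llama-3.2-11B-Vision-Instruct/config.json") as f:
    cfg = json.load(f)

print("num_attention_heads" in cfg)               # False: not at the top level for the vision model
print(cfg["text_config"]["num_attention_heads"])  # the field lives under text_config
```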
hey @ryf1123, thanks for raising this issue! We put up this PR updating the configs to use the HF checkpointer. It still saves the adapter in torchtune format, not PEFT format. We still need to work on that, since this model isn't text-only; we have to see how the vision part works.
Regarding your error "KeyError: 'num_attention_heads'", it happens here: https://github.com/pytorch/torchtune/blob/33b8143d9d2b01cd5b6fe97091f2987f081146b6/torchtune/training/checkpointing/_checkpointer.py#L639
The correct way should be: `num_heads=self._config["text_config"]["num_attention_heads"]`
Check what we do here: https://github.com/pytorch/torchtune/blob/33b8143d9d2b01cd5b6fe97091f2987f081146b6/torchtune/training/checkpointing/_checkpointer.py#L478
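A hedged sketch of the shape of that fix (not the actual patch): read the head counts from the nested text_config when it exists, and fall back to the top-level config for text-only models.

```python
def _get_text_config(hf_config: dict) -> dict:
    """Return the nested text config for multimodal HF checkpoints (e.g. Mllama),
    or the config itself for text-only models."""
    return hf_config.get("text_config", hf_config)

# In save_checkpoint, num_heads=self._config["num_attention_heads"] would then
# become: num_heads=_get_text_config(self._config)["num_attention_heads"]
```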
If you wanna give it a stab and see if you can make it PEFT-compatible, we would love the PR :)
update: @pbontrager is working on this :)