Resuming a Checkpoint with LoRA
What piece of documentation is affected?
LoRA usage
What part(s) of the article would you like to see updated?
When resuming LoRA training, I get a warning that the model is being "double LoRA'd". I work around this by commenting out the LoRA adapter settings when resuming, which is cumbersome. Is this expected, or is there another setting I should use?
We should update the documentation to reflect proper usage.
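For reference, the workaround is roughly this (a rough sketch of the relevant lines only, not my full config):

# When resuming, I comment out the adapter settings so PEFT is not applied a second time:
# adapter: lora
# lora_r: 64
# lora_alpha: 64
# lora_target_modules: ...
auto_resume_from_checkpoints: true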
Additional Information
No response
Acknowledgements
- [x] My issue title is concise, descriptive, and in title casing.
- [x] I have searched the existing issues to make sure this feature has not been requested yet.
- [x] I have provided enough information for the maintainers to understand and evaluate this request.
Do you have the stack trace?
Hey @NanoCode012, no stack trace, but this is the log I get:
[err, 0:56, 60s, g3117] Loading safetensors checkpoint shards: 25% Completed | 1/4 [00:07<00:21, 7.33s/it]
[err, 0:56, 60s, g3117] /gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/peft/mapping_func.py:73: UserWarning: You are trying to modify a model with PEFT for a second time. If you want to reload the model with a different config, make sure to call `.unload()` before.
[err, 0:56, 60s, g3117] warnings.warn(
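If I understand the warning correctly, it comes from peft's get_peft_model being called on a model that already contains LoRA layers, e.g. a pattern like this (a hypothetical sketch of what triggers the warning, not axolotl's actual code path):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, PeftModel, get_peft_model

# Load the base model and define a LoRA config matching my axolotl settings
base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
lora_cfg = LoraConfig(
    r=64,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
)

# Resuming: the checkpoint's adapter is loaded onto the base model...
model = PeftModel.from_pretrained(base, "checkpoint-100", is_trainable=True)

# ...and if get_peft_model() is then applied again to the already-wrapped model,
# peft warns: "You are trying to modify a model with PEFT for a second time."
model = get_peft_model(model, lora_cfg)

So it looks like the adapter is being applied a second time on resume.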
My config looks like:
base_model: meta-llama/Meta-Llama-3-8B-Instruct
# Automatically upload checkpoint and final model to HF
# hub_model_id: username/custom_model_name
load_in_8bit: false
load_in_4bit: false
strict: false
# torch_compile: true
# torch_compile_backend: inductor
vllm:
  host: 0.0.0.0
  port: 8002
  tensor_parallel_size: 1
  gpu_memory_utilization: 0.8
  dtype: auto
  # max_model_len: # you may find it useful to set the vLLM model context length if you know this beforehand
rl: grpo
trl:
  use_vllm: true
  vllm_server_host: localhost
  vllm_server_port: 8002
  vllm_server_timeout: 300
  beta: 0.005
  epsilon: 0.2
  epsilon_high: 0.28
  max_completion_length: 512
  use_vllm: true
  reward_funcs:
    - saparov_graph.correctness_reward_func
    - saparov_graph.int_reward_func
    - saparov_graph.strict_format_reward_func
    - saparov_graph.soft_format_reward_func
    - saparov_graph.xmlcount_reward_func
  vllm_gpu_memory_utilization: 0.8
  num_generations: 16
# deepspeed: /gscratch/clmbr/revr/LRMGraph/scripts/axolotl/experiments/initial_rl/__deepspeed_configs/zero1.json
chat_template: llama3
datasets:
  - path: skrishna/gsm8k_only_answer
    type: saparov_graph.axo_gsm8k_transform
dataset_prepared_path: /gscratch/clmbr/revr/LRMGraph/workspace/data/last_run_prepared
skip_prepare_dataset: true
val_set_size: 0.0
output_dir: /gscratch/clmbr/revr/LRMGraph/workspace/data/axolotl-artifacts/initial-rl-8b-single-gpu-3
dataloader_prefetch_factor: 32
dataloader_num_workers: 2
dataloader_pin_memory: true
gc_steps: 1
sequence_len: 800
sample_packing: false
eval_sample_packing: false
pad_to_sequence_len: false
# lora_modules_to_save:
#   - embed_tokens
#   - lm_head
wandb_project: gsm8k-grpo-proj
wandb_entity:
wandb_name: rev2021-university-of-washington
gradient_accumulation_steps: 8
micro_batch_size: 16 # should match num_generations / num_gpus
num_epochs: 1
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 1.0e-5
max_grad_norm: 0.1
weight_decay: 0.01
bf16: true
fp16: false
tf32: true
adapter: lora
lora_r: 64 # whatever rank you chose
lora_alpha: 64
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj
gradient_checkpointing: true
# gradient_checkpointing_kwargs:
#   use_reentrant: true
flash_attention: true
logging_steps: 1
warmup_steps: 0
warmup_ratio: .03
evals_per_epoch: 1
# saves_per_epoch: 4
save_steps: 100
# auto_resume_from_checkpoints: true
auto_resume_from_checkpoints: true
special_tokens:
  pad_token: <|end_of_text|>
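In case it's relevant, what I'd expect (but haven't confirmed) is that resuming should work with the adapter section left in place, e.g. something like this rather than commenting it out; or maybe lora_model_dir is the intended knob here? (A guess based on the documented options; the checkpoint path is illustrative.)

adapter: lora
lora_r: 64
lora_alpha: 64
# let axolotl pick up the latest checkpoint in output_dir automatically...
auto_resume_from_checkpoints: true
# ...or name a specific checkpoint explicitly:
# resume_from_checkpoint: ./workspace/data/axolotl-artifacts/initial-rl-8b-single-gpu-3/checkpoint-100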