Resuming a Checkpoint with LoRA
What piece of documentation is affected?
LoRA usage
What part(s) of the article would you like to see updated?
When resuming LoRA training, I get a warning that the model is being "double LoRA'd". I work around this by commenting out the LoRA adapter settings when resuming, which is cumbersome. Is this expected, or is there another setting I should use?
We should update the documentation to reflect proper usage.
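For reference, the workaround is roughly this (a rough sketch of the relevant lines only, not my full config):

# When resuming, I comment out the adapter settings so PEFT is not applied a second time:
# adapter: lora
# lora_r: 64
# lora_alpha: 64
# lora_target_modules: ...
auto_resume_from_checkpoints: true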
Additional Information
No response
Acknowledgements
- [x] My issue title is concise, descriptive, and in title casing.
- [x] I have searched the existing issues to make sure this feature has not been requested yet.
- [x] I have provided enough information for the maintainers to understand and evaluate this request.
Do you have the stack trace?
Hey @NanoCode012, no stack trace, but this is the log I get:
[err, 0:56, 60s, g3117] Loading safetensors checkpoint shards: 25% Completed | 1/4 [00:07<00:21, 7.33s/it]
[err, 0:56, 60s, g3117] /gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/peft/mapping_func.py:73: UserWarning: You are trying to modify a model with PEFT for a second time. If you want to reload the model with a different config, make sure to call `.unload()` before.
[err, 0:56, 60s, g3117] warnings.warn(
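If I understand the warning correctly, it comes from peft's get_peft_model being called on a model that already contains LoRA layers, e.g. a pattern like this (a hypothetical sketch of what triggers the warning, not axolotl's actual code path):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, PeftModel, get_peft_model

# Load the base model and define a LoRA config matching my axolotl settings
base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
lora_cfg = LoraConfig(
    r=64,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
)

# Resuming: the checkpoint's adapter is loaded onto the base model...
model = PeftModel.from_pretrained(base, "checkpoint-100", is_trainable=True)

# ...and if get_peft_model() is then applied again to the already-wrapped model,
# peft warns: "You are trying to modify a model with PEFT for a second time."
model = get_peft_model(model, lora_cfg)

So it looks like the adapter is being applied a second time on resume.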
My config looks like:
base_model: meta-llama/Meta-Llama-3-8B-Instruct
# Automatically upload checkpoint and final model to HF
# hub_model_id: username/custom_model_name
load_in_8bit: false
load_in_4bit: false
strict: false
# torch_compile: true
# torch_compile_backend: inductor
vllm:
  host: 0.0.0.0
  port: 8002
  tensor_parallel_size: 1
  gpu_memory_utilization: 0.8
  dtype: auto
  # max_model_len: # you may find it useful to set the vLLM model context length if you know this beforehand
rl: grpo
trl:
  use_vllm: true
  vllm_server_host: localhost
  vllm_server_port: 8002
  vllm_server_timeout: 300
  beta: 0.005
  epsilon: 0.2
  epsilon_high: 0.28
  max_completion_length: 512
  use_vllm: true
  reward_funcs:
    - saparov_graph.correctness_reward_func
    - saparov_graph.int_reward_func
    - saparov_graph.strict_format_reward_func
    - saparov_graph.soft_format_reward_func
    - saparov_graph.xmlcount_reward_func
  vllm_gpu_memory_utilization: 0.8
  num_generations: 16
# deepspeed: /gscratch/clmbr/revr/LRMGraph/scripts/axolotl/experiments/initial_rl/__deepspeed_configs/zero1.json
chat_template: llama3
datasets:
  - path: skrishna/gsm8k_only_answer
    type: saparov_graph.axo_gsm8k_transform
dataset_prepared_path: /gscratch/clmbr/revr/LRMGraph/workspace/data/last_run_prepared
skip_prepare_dataset: true
val_set_size: 0.0
output_dir: /gscratch/clmbr/revr/LRMGraph/workspace/data/axolotl-artifacts/initial-rl-8b-single-gpu-3
dataloader_prefetch_factor: 32
dataloader_num_workers: 2
dataloader_pin_memory: true
gc_steps: 1
sequence_len: 800
sample_packing: false
eval_sample_packing: false
pad_to_sequence_len: false
# lora_modules_to_save:
#   - embed_tokens
#   - lm_head
wandb_project: gsm8k-grpo-proj
wandb_entity:
wandb_name: rev2021-university-of-washington
gradient_accumulation_steps: 8
micro_batch_size: 16 # should match num_generations / num_gpus
num_epochs: 1
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 1.0e-5
max_grad_norm: 0.1
weight_decay: 0.01
bf16: true
fp16: false
tf32: true
adapter: lora
lora_r: 64 # whatever rank you chose
lora_alpha: 64
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj
gradient_checkpointing: true
# gradient_checkpointing_kwargs:
#   use_reentrant: true
flash_attention: true
logging_steps: 1
warmup_steps: 0
warmup_ratio: .03
evals_per_epoch: 1
# saves_per_epoch: 4
save_steps: 100
# auto_resume_from_checkpoints: true
auto_resume_from_checkpoints: true
special_tokens:
  pad_token: <|end_of_text|>
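In case it's relevant, what I'd expect (but haven't confirmed) is that resuming should work with the adapter section left in place, e.g. something like this rather than commenting it out; or maybe lora_model_dir is the intended knob here? (A guess based on the documented options; the checkpoint path is illustrative.)

adapter: lora
lora_r: 64
lora_alpha: 64
# let axolotl pick up the latest checkpoint in output_dir automatically...
auto_resume_from_checkpoints: true
# ...or name a specific checkpoint explicitly:
# resume_from_checkpoint: ./workspace/data/axolotl-artifacts/initial-rl-8b-single-gpu-3/checkpoint-100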