Context length for DPO on a 7b model
I'm working on using DPO based on the DPO Zephyr Unsloth Example.ipynb notebook.
I'm loading the model like so:
max_seq_length = 32768
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "models/DeepSeek-R1-Distill-Qwen-7B", # Choose ANY! eg mistralai/Mistral-7B-Instruct-v0.2
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)
Unsloth 2025.1.8: Fast Qwen2 patching. Transformers: 4.48.2.
GPU: NVIDIA RTX A6000. Max memory: 47.529 GB. Platform: Linux.
Torch: 2.4.0+cu121. CUDA: 8.6. CUDA Toolkit: 12.1. Triton: 3.0.0
Bfloat16 = TRUE. FA [Xformers = 0.0.27.post2. FA2 = True]
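(For reference, dtype and load_in_4bit are defined earlier in the script; a minimal sketch of that preamble, assuming the defaults from the Zephyr DPO notebook:)

from unsloth import FastLanguageModel

dtype = None         # None = auto-detect the best dtype (bfloat16 on this A6000)
load_in_4bit = True  # load in 4-bit (QLoRA) to reduce VRAM; False for 16-bit LoRA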
LoRA config:
model = FastLanguageModel.get_peft_model(
    model,
    r = 64, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 64,
    lora_dropout = 0, # Currently only supports dropout = 0
    bias = "none",    # Currently only supports bias = "none"
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)
And the trainer:
dpo_trainer = DPOTrainer(
    model = model,
    ref_model = None,
    args = DPOConfig(
        per_device_train_batch_size = 1,
        gradient_accumulation_steps = 8,
        warmup_ratio = 0.1,
        num_train_epochs = 3,
        learning_rate = 5e-6,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.0,
        lr_scheduler_type = "linear",
        seed = 42,
        output_dir = "outputs",
        report_to = "none", # Use this for WandB etc
    ),
    beta = 0.1,
    train_dataset = trl_dataset,
    # eval_dataset = raw_datasets["test"],
    tokenizer = tokenizer,
    # max_length = 11000,
    max_length = 16000,
    max_prompt_length = 512,
)
With max_length = 11000 everything works fine, but trying to use 16k results in OOM. I saw in the docs that Unsloth supports very long context lengths for 7B models; do I need to set anything else besides what I'm doing, or is this due to DPO being more resource-intensive?
Also, is there a way to run this on 2 GPUs to speed up training? (NVLink is enabled, if it matters.)
Any feedback is appreciated. Thanks!
Try downgrading trl to 0.13.0 after installing unsloth:
!pip uninstall -y trl
!pip install trl==0.13.0
trl 0.14.0 was released last week and caused some issues for me with the Unsloth DPO trainer. On prior versions I could finetune an 8B model at 24k context length on a Colab A100 40 GB, but I got OOM after the latest trl release.
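After reinstalling, it's worth confirming the pin actually took effect (restart the runtime first). A quick sanity check:

from importlib.metadata import version
print(version("trl"))      # expect 0.13.0 after the downgrade
print(version("unsloth"))  # e.g. 2025.1.8, matching the banner above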
Thanks, I tried it but I still get OOM even after downgrading trl.
Sorry for the delay - sadly, DPO does in fact use more memory than normal finetuning. I'm working on reducing VRAM usage, which should definitely help.
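Rough intuition for why (a back-of-envelope sketch, assuming activation memory scales roughly linearly with the tokens processed per step):

max_length = 16000

# A plain SFT step sees one sequence per sample.
sft_tokens_per_sample = max_length

# A DPO step runs both the chosen and the rejected completion through the model
# (trl concatenates them into one batch), so it's roughly double the activations
# of an SFT step at the same max_length, plus the (no-grad) reference forward pass.
dpo_tokens_per_sample = 2 * max_length

print(dpo_tokens_per_sample / sft_tokens_per_sample)  # 2.0, before counting the reference pass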