Use With AutoModelForCausalLMWithValueHead

Open lapp0 opened this issue 1 year ago • 1 comments

Thanks for your great work developing Unsloth, it's a critical tool reducing the cost of fine-tuning to less than half!

My question: in trl, reinforcement learning requires the model to have a value head (AutoModelForCausalLMWithValueHead) for value function estimation.

Are you open to official support for this type of model?

https://github.com/huggingface/trl/blob/main/trl/models/modeling_value_head.py#L61

lapp0 avatar Mar 05 '24 20:03 lapp0

Thanks for the kind words! Hm, in theory you can replace the causal head with other heads, and it should still work. I suggest doing this after all optimizations are applied.

danielhanchen avatar Mar 06 '24 02:03 danielhanchen

The code below produces the traceback that follows it:

from unsloth import FastLanguageModel
from trl import AutoModelForCausalLMWithValueHead

def get_unsloth_model(base_model_name):
    model, _ = FastLanguageModel.from_pretrained(
        model_name=base_model_name,
        max_seq_length=2048,
        load_in_4bit=True,
    )
    return FastLanguageModel.get_peft_model(
        model,
        target_modules=[
            "q_proj", "v_proj", "k_proj", "o_proj",  # attention (self_attn)
            "gate_proj", "down_proj", "up_proj",  # FFN (mlp)
        ],
        r=16,
        lora_alpha=64,
        lora_dropout = 0,
        bias = "none",
        use_gradient_checkpointing = True,
    )

# base_model_name is the checkpoint name, defined elsewhere
model = AutoModelForCausalLMWithValueHead.from_pretrained(get_unsloth_model(base_model_name))

...

ppo_trainer.step(...)
  File "/root/ppot.py", line 205, in train
    stats = ppo_trainer.step(
  File "/usr/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/usr/local/lib/python3.10/dist-packages/trl/trainer/ppo_trainer.py", line 795, in step
    train_stats = self.train_minibatch(
  File "/usr/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/usr/local/lib/python3.10/dist-packages/trl/trainer/ppo_trainer.py", line 1068, in train_minibatch
    self.accelerator.backward(loss)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 1966, in backward
    loss.backward(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 522, in backward
    torch.autograd.backward(
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 266, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 289, in apply
    return user_fn(self, *args)
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/amp/autocast_mode.py", line 142, in decorate_bwd
    return bwd(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/unsloth/kernels/fast_lora.py", line 131, in backward
    d_downA = h.t() @ (dY @ downB.t())
RuntimeError: expected mat1 and mat2 to have the same dtype, but got: c10::BFloat16 != float

lapp0 avatar Mar 19 '24 03:03 lapp0

I needed to call trl.trainer.peft_module_casting_to_bf16(model)

lapp0 avatar Mar 19 '24 03:03 lapp0
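
The RuntimeError in the traceback comes from a matrix multiply between a bfloat16 tensor and a float32 tensor, so one way to check whether a cast such as the one above took effect is to group the model's parameters by dtype. The following is only a minimal sketch: `params_by_dtype` is a hypothetical helper name (not part of trl or unsloth), and it assumes nothing beyond `model` exposing `named_parameters()`, as any `torch.nn.Module` does.

```python
from collections import defaultdict


def params_by_dtype(model, trainable_only=True):
    """Group parameter names by dtype.

    Hypothetical diagnostic helper, not a trl or unsloth API.
    """
    groups = defaultdict(list)
    for name, param in model.named_parameters():
        # Skip frozen weights when only the trainable (e.g. LoRA) parameters matter.
        if trainable_only and not getattr(param, "requires_grad", True):
            continue
        groups[str(param.dtype)].append(name)
    return dict(groups)
```

Calling `params_by_dtype(model)` before and after the cast shows which trainable parameters sit in which dtype. If more than one dtype appears among the LoRA weights, a mixed-dtype matmul like the one in the traceback is plausible.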