
[P1] AttributeError: 'CausalLMOutputWithPast' object has no attribute 'mean'


Hi,

I am fine-tuning "meta-llama/Llama-2-7b-hf" on a dataset. Without an eval strategy, training works fine, but with an eval strategy enabled I hit the error in the title. Could anyone help me fix this issue?

This is the code I am using as reference: https://github.com/stanfordnlp/pyreft/blob/main/examples/alpaca/train.py

Below is the error trace:

[screenshot of the traceback, ending in `AttributeError: 'CausalLMOutputWithPast' object has no attribute 'mean'`]

Below are the training arguments I am passing:

```python
training_args = TrainArguments(
    # output_dir=f"./checkpoints/rank{rank}",
    output_dir="./trained_model_wt_eval",
    learning_rate=3e-5,
    num_train_epochs=2,
    # evaluation_strategy="epoch",
    eval_strategy="steps",
    # do_eval=False,
    lr_scheduler_type="linear",
    # warmup_steps=warmup_steps,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    # gradient_accumulation_steps=grad_acc,
    weight_decay=0.01,
    logging_dir=f"./logs/reft_rank{rank}",
    logging_strategy="steps",
    logging_steps=2,
    save_strategy="steps",
    save_steps=500,
    save_total_limit=3,
    load_best_model_at_end=False,
    # fp16=True,
    bf16=True,
    remove_unused_columns=False
)
```
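One detail worth noting (based on the transformers `TrainingArguments` docs, so treat it as an assumption about your version): with `eval_strategy="steps"` and no explicit `eval_steps`, evaluation defaults to running every `logging_steps` steps, so with the settings above it fires every 2 steps and the error surfaces almost immediately. A minimal sketch (the `./tmp` output dir is hypothetical):

```python
# Sketch showing that eval_steps falls back to logging_steps when
# eval_strategy="steps" and eval_steps is left unset, per the HF docs.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./tmp",
    eval_strategy="steps",
    logging_steps=2,
)
print(args.eval_steps)  # -> 2, inherited from logging_steps
```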

And here is how I build the data module for training:

```python
def make_supervised_data_module(
    tokenizer: transformers.PreTrainedTokenizer, model, layers, training_args, data_args
) -> Dict:
    train_dataset = ReftSupervisedDataset(
        "alpaca", data_args.data_path, tokenizer, data_split="train", seed=training_args.seed,
        max_n_example=training_args.max_n_train_example,
        input_field="input", instruction_field="instruction", output_field="output",
        **{"num_interventions": len(layers), "position": training_args.position,
           "share_weights": training_args.share_weights}
    )
    eval_dataset = ReftSupervisedDataset(
        "alpaca", data_args.eval_path, tokenizer, data_split="test", seed=training_args.seed,
        max_n_example=training_args.max_n_train_example,
        input_field="input", instruction_field="instruction", output_field="output",
        **{"num_interventions": len(layers), "position": training_args.position,
           "share_weights": training_args.share_weights}
    )
    print(train_dataset)
    print(eval_dataset)
    data_collator_fn = transformers.DataCollatorForSeq2Seq(
        tokenizer=tokenizer,
        model=model,
        label_pad_token_id=-100,
        padding="longest"
    )
    data_collator = ReftDataCollator(data_collator=data_collator_fn)
    return dict(train_dataset=train_dataset, eval_dataset=eval_dataset, data_collator=data_collator)


data_module = make_supervised_data_module(
    tokenizer=tokenizer, model=model, layers=layers,
    training_args=training_args, data_args=data_args
)
```
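For completeness, here is roughly how these pieces plug into the trainer, following the linked `examples/alpaca/train.py` (a sketch; `reft_model` is assumed to be the ReFT-wrapped model from `pyreft.get_reft_model(...)` as in that script, and the other names are defined above):

```python
# Sketch of the wiring, following the linked examples/alpaca/train.py.
from pyreft import ReftTrainerForCausalLM

trainer = ReftTrainerForCausalLM(
    model=reft_model,
    tokenizer=tokenizer,
    args=training_args,
    **data_module,  # train_dataset, eval_dataset, data_collator
)
trainer.train()
```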

krishnardt commented on Dec 28 '24

@frankaging or anyone, could you help me resolve this issue?

krishnardt commented on Dec 28 '24

I was able to resolve this locally (for ReftTrainerForCausalLM) by changing the output of the compute_loss function:

```python
from pyreft import ReftTrainerForCausalLM


def patched_compute_loss(self, intervenable, inputs, return_outputs=False, **kwargs):
    # run intervened forward pass
    unit_locations = None
    if "intervention_locations" in inputs:
        if inputs["intervention_locations"].dim() == 3:
            unit_locations = {"sources->base": (
                None,
                inputs["intervention_locations"].permute(1, 0, 2).tolist()
            )}
        else:
            # dummy locations for the LoRA-only baseline
            unit_locations = {"sources->base": (None, 0)}
    base_outputs, cf_outputs = intervenable(
        {
            "input_ids": inputs["input_ids"],
            "attention_mask": inputs["attention_mask"]
        },
        unit_locations=unit_locations,
        labels=inputs["labels"],
        subspaces=inputs["subspaces"].permute(1, 0, 2).tolist() if "subspaces" in inputs else None
    )
    # prefer the intervened (counterfactual) outputs; fall back to the
    # base outputs for LoRA-only training, where cf_outputs is None
    output = cf_outputs
    if cf_outputs is None:
        output = base_outputs
    # return the loss tensor (not the full output object) so the Trainer
    # can call .mean() on it during evaluation
    loss = output.loss
    return (loss, output) if return_outputs else loss


ReftTrainerForCausalLM.compute_loss = patched_compute_loss
```

Specifically, the fix is returning `(loss, output)` rather than `(output, output)`: during evaluation the Trainer expects a loss tensor it can call `.mean()` on, which fails when it receives the full `CausalLMOutputWithPast` object instead. I'm not sure why the original returns `(output, output)`, but maybe this can help for now?
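For context (my reading of the stack trace plus the transformers source, so hedge accordingly): `Trainer.prediction_step` calls `compute_loss(..., return_outputs=True)` and then reduces the first returned value with `loss.mean()`. A toy illustration of why the unpatched return value breaks there:

```python
# Toy illustration (not transformers code): a tensor supports .mean(),
# a CausalLMOutputWithPast returned in the loss slot does not.
import torch
from transformers.modeling_outputs import CausalLMOutputWithPast

loss = torch.tensor(1.23)
output = CausalLMOutputWithPast(loss=loss, logits=torch.zeros(1, 2, 10))

print(loss.mean())  # fine: tensors support .mean()
try:
    output.mean()   # what the unpatched (output, output) return leads to
except AttributeError as e:
    print(e)        # 'CausalLMOutputWithPast' object has no attribute 'mean'
```

With the patch applied before the trainer is constructed, evaluation should get a proper loss tensor and proceed past this point.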

BEW111 commented on Feb 17 '25