hf-multitask-trainer

Additional metrics are not separated for eval and train

[Open] Hrovatin opened this issue 6 months ago • 3 comments

It seems the additional metrics are reported together for eval and train:

  • On every step two values are reported; when investigating the total loss, one appears to correspond to eval and the other to train.
  • The total losses are correctly separated into eval and train, but the additional metrics are reported only once.
[screenshot]

Hrovatin · Jun 10 '25 14:06

Could you provide minimal code to reproduce this problem? When additional metrics are output, especially to TensorBoard, a tag such as train or eval distinguishes the different stages; this is guaranteed by the Hugging Face Trainer.
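
For reference, a minimal sketch of that behavior (not the library's actual code; it only assumes the documented fact that Trainer.evaluate prefixes every metric key with metric_key_prefix, which defaults to "eval"):

# Sketch of the key prefixing Trainer.evaluate performs; the helper name
# prefix_metrics is made up here and is not part of transformers.
def prefix_metrics(metrics: dict, metric_key_prefix: str = "eval") -> dict:
    return {
        k if k.startswith(f"{metric_key_prefix}_") else f"{metric_key_prefix}_{k}": v
        for k, v in metrics.items()
    }

print(prefix_metrics({"loss": 2.92, "runtime": 3.38}))
# -> {'eval_loss': 2.92, 'eval_runtime': 3.38}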

zipzou · Jun 10 '25 14:06

I am using mlflow reporting, and there the stage is not distinguished unless it is part of the metric name. The model also prints the following traces; since they do not have train/eval in the name, mlflow just merges them:

{'loss': 11.7567, 'grad_norm': 1.6472886800765991, 'learning_rate': 0.0009901234567901234, 'loss1': 2.9245794147253035, 'loss2': 0.14594885925762355, 'epoch': 0.12307692307692308}
{'eval_loss': 2.9209163188934326, 'eval_runtime': 3.3824, 'eval_samples_per_second': 37.843, 'eval_steps_per_second': 4.73, 'loss1': 2.907386928796768, 'loss2': 0.13529164166538976, 'epoch': 0.12307692307692308}
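
For now, one workaround sketch (assuming report_metrics accepts arbitrary keyword names, as in the reproduction below; compute_losses is a hypothetical stand-in): since nn.Module.training distinguishes the stages inside forward, the prefix can be built into the metric names before reporting:

# Hedged sketch only; loss1/loss2 stand in for whatever the model computes.
def forward(self, x):
    loss1, loss2 = self.compute_losses(x)  # hypothetical helper
    prefix = "train_" if self.training else "eval_"
    if hasattr(self, "report_metrics"):
        self.report_metrics(**{prefix + "loss1": loss1, prefix + "loss2": loss2})
    return ((loss1 + loss2).mean(), )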

Hrovatin · Jun 11 '25 04:06

Reproduction example:

import random

import numpy as np
import torch
from torch import nn
from torch.utils.data import Dataset
from transformers import TrainingArguments

from hf_mtask_trainer import HfMultiTaskTrainer

# The model class
class TestModel(nn.Module):
    supports_report_metrics: bool = True # IMPORTANT

    def __init__(self) -> None:
        super().__init__()
        self.scaler = nn.Parameter(torch.ones(1))

    def forward(self, x):
        test_tensor = x + self.scaler
        test_np = np.array(np.random.randn()).astype(np.float32)
        test_int = random.randint(1, 100)
        test_float = random.random()
        if hasattr(self, 'report_metrics'):  # checking whether the trainer has attached the report method is the robust practice
            self.report_metrics(
                tensor=test_tensor,
                np=test_np,
                integer=test_int,
                fp_num=test_float
            )

        loss = (
            test_tensor + torch.from_numpy(test_np) + torch.tensor(test_int) +
            torch.tensor(test_float)
        ).mean()

        outputs = (loss, )

        return outputs

# Mock dataset
class MockDataset(Dataset):

    def __len__(self):
        return 1000

    def __getitem__(self, index: int):
        return dict(x=torch.randn(10, dtype=torch.float32))

args = TrainingArguments(
    output_dir="mtask_test",  # hypothetical path; required by older transformers versions
    save_strategy="steps",
    save_total_limit=3,
    save_only_model=True,
    push_to_hub=False,
    report_to="mlflow",
    eval_strategy="steps",
    eval_steps=1,
    logging_strategy="steps",
    logging_steps=1,
    disable_tqdm=True,
    max_steps=3,
)
model = TestModel()
# A plain dict suffices here; MockDataset is a torch Dataset, not a
# datasets.Dataset, so datasets.DatasetDict is not needed.
ds = {
    "train": MockDataset(),
    "test": MockDataset(),
}

# Use HfMultiTaskTrainer rather than Trainer
trainer = HfMultiTaskTrainer(
    model,
    args,
    train_dataset=ds["train"],
    eval_dataset=ds["test"],
)

trainer.train()

Out:

{'loss': 83.6881, 'grad_norm': 1.0, 'learning_rate': 5e-05, 'tensor': 1.0800392627716064, 'np': 0.49671414494514465, 'integer': 82.0, 'fp_num': 0.11133106816568039, 'epoch': 0.008}
{'eval_runtime': 0.062, 'eval_samples_per_second': 16124.682, 'eval_steps_per_second': 2015.585, 'tensor': 1.0018425765037537, 'np': -0.058696798134595156, 'integer': 48.608, 'fp_num': 0.5200184654696903, 'epoch': 0.008}
{'loss': 22.2854, 'grad_norm': 1.0, 'learning_rate': 3.3333333333333335e-05, 'tensor': 0.8969713449478149, 'np': -0.9905363321304321, 'integer': 22.0, 'fp_num': 0.3789731189769161, 'epoch': 0.016}
{'eval_runtime': 0.066, 'eval_samples_per_second': 15162.912, 'eval_steps_per_second': 1895.364, 'tensor': 1.0090636477470398, 'np': 0.05505739440768957, 'integer': 49.592, 'fp_num': 0.4544791103267238, 'epoch': 0.016}
{'loss': 93.4921, 'grad_norm': 1.0, 'learning_rate': 1.6666666666666667e-05, 'tensor': 0.9860073328018188, 'np': 2.1221561431884766, 'integer': 90.0, 'fp_num': 0.3839784673927513, 'epoch': 0.024}
{'eval_runtime': 0.066, 'eval_samples_per_second': 15147.141, 'eval_steps_per_second': 1893.393, 'tensor': 1.007038104057312, 'np': 0.05593263887614012, 'integer': 55.648, 'fp_num': 0.49812778686417014, 'epoch': 0.024}
{'train_runtime': 0.4016, 'train_samples_per_second': 59.762, 'train_steps_per_second': 7.47, 'train_loss': 66.48854319254558, 'epoch': 0.024}
TrainOutput(global_step=3, training_loss=66.48854319254558, metrics={'train_runtime': 0.4016, 'train_samples_per_second': 59.762, 'train_steps_per_second': 7.47, 'train_loss': 66.48854319254558, 'epoch': 0.024})

Plots in mlflow:

[screenshot]

Hrovatin · Jun 11 '25 05:06