
How to log F1, precision, recall, and other custom metrics during training, in addition to training and validation loss

Open · vijayendra-g opened this issue 9 months ago · 3 comments

Hi,

Below is my code to log training and validation loss. I want to log F1, precision, recall, and other custom metrics as well. How can I do this?

I see that https://github.com/urchade/GLiNER/blob/main/gliner/evaluation/evaluator.py has these functions; has anyone figured out how to use it to log metrics at training time?

import pandas as pd
from transformers import TrainerCallback, TrainerState, TrainerControl
# GLiNER ships its own Trainer / TrainingArguments (with others_lr, others_weight_decay, etc.)
from gliner.training import Trainer, TrainingArguments

# Derive the number of epochs from the desired number of training steps
num_steps = 10
batch_size = 8
data_size = len(train_dataset)
num_batches = data_size // batch_size
num_epochs = max(10, num_steps // num_batches)
print(num_epochs)

training_args = TrainingArguments(
    output_dir="models",
    learning_rate=5e-6,
    weight_decay=0.01,
    others_lr=1e-5,
    others_weight_decay=0.01,
........
)

class CustomCallback(TrainerCallback):

    def __init__(self):
        self.data = {}

    def on_log(self, args: TrainingArguments, state: TrainerState, control: TrainerControl, logs: dict, **kwargs):
        print(logs)
        step = state.global_step
        if step not in self.data:
            self.data[step] = {'Step': step}

        if 'loss' in logs:
            self.data[step]['Training Loss'] = logs['loss']

        if 'eval_loss' in logs:
            self.data[step]['Validation Loss'] = logs['eval_loss']

    def get_dataframe(self):
        return pd.DataFrame(list(self.data.values()))

custom_callback = CustomCallback()

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    tokenizer=model.data_processor.transformer_tokenizer,
    data_collator=data_collator,
    callbacks=[custom_callback],
)

trainer.train()
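A rough, untested sketch of one possible direction (it assumes model.evaluate(data, flat_ner=..., threshold=..., batch_size=...) returns a results string plus an overall F1 score, as in the repo's evaluation examples): run GLiNER's built-in evaluation from an on_evaluate hook and record the scores per step, similar to the callback above.

class EvalMetricsCallback(TrainerCallback):
    """Rough sketch (untested): run GLiNER's own evaluation whenever the
    Trainer evaluates, and keep the scores next to the global step."""

    def __init__(self, gliner_model, val_data, threshold=0.5):
        self.gliner_model = gliner_model
        self.val_data = val_data  # assumed to be in the format model.evaluate expects
        self.threshold = threshold
        self.history = []

    def on_evaluate(self, args, state, control, metrics=None, **kwargs):
        # assumption: evaluate() returns (results string, overall F1)
        results_str, f1 = self.gliner_model.evaluate(
            self.val_data, flat_ner=True, threshold=self.threshold, batch_size=8
        )
        print(f"step {state.global_step}: {results_str}")
        self.history.append({"Step": state.global_step, "F1": f1})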

vijayendra-g · Mar 26 '25

Hi, did you find out how to do this? I have the same issue right now. I would be very glad to know, as I am currently writing my master's thesis and fine-tuning GLiNER is part of it.

ChristinaPetschnig · Apr 18 '25

Hi, I have been trying to evaluate the model during training too. I used the compute_metrics parameter of the Trainer to pass a custom function that receives an EvalPrediction object and calculates the metrics from it. Unfortunately, the metrics seem to be wrong (maybe I am transforming the attributes of EvalPrediction incorrectly), as I get a much lower accuracy during training compared to the results on the test set.

Maybe the hint about compute_metrics helps. If you figure out how to do this, I would be very interested in the solution. :)
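For anyone following along, the hook being described has roughly this shape (a sketch only; the names are illustrative, and the hard part, decoding GLiNER's raw evaluation outputs into entity spans, is left as a placeholder because that is exactly the step that is easy to get wrong):

from transformers import EvalPrediction

def compute_metrics(eval_pred: EvalPrediction) -> dict:
    # The Trainer's evaluation loop gathers the raw model outputs and labels
    # for the whole eval set into eval_pred.predictions / eval_pred.label_ids.
    # For GLiNER these are the model's raw outputs, not decoded entities, so
    # they still need to be converted to spans before entity-level P/R/F1 is meaningful.
    predictions, labels = eval_pred.predictions, eval_pred.label_ids
    precision = recall = f1 = 0.0  # placeholder: fill in once the decoding step is in place
    return {"precision": precision, "recall": recall, "f1": f1}

# Wired into the Trainer via compute_metrics=compute_metrics; the returned keys
# show up in the logs prefixed with "eval_" (eval_precision, eval_recall, eval_f1).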

iazk0 · May 13 '25

I did it like this:

import re

from somewhere import val_ds  # your validation dataset; val_ds.data is in the format model.evaluate expects

_PRF_RE = re.compile(r"P:\s*([\d.,]+)%.*?R:\s*([\d.,]+)%.*?F1:\s*([\d.,]+)%", re.S)

def compute_metrics(_eval_pred):
    # model.evaluate returns a summary string and the overall F1 score
    output, f1 = model.evaluate(
        val_ds.data,
        threshold=0.5,
        flat_ner=True,
        batch_size=1,
    )

    m = _PRF_RE.search(output)
    if m:
        p, r, f = (float(x.replace(",", ".")) / 100 for x in m.groups())
        return {"precision": p, "recall": r, "f1": f}

    # Fallback – at least log F1
    return {"f1": f1}

And then create your Trainer like:

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    processing_class=tokenizer,
    data_collator=collator,
    compute_metrics=compute_metrics,
)

If you only want to log the metrics, you could use a callback instead, but since callbacks are read-only and run after the Trainer's metrics have been saved, the values would never show up in WandB or TensorBoard.

The regex at the top is there to parse the string that model.evaluate returns back into a dictionary. Alternatively, you could use the compute_prf function (which returns a dict) from gliner.evaluation.evaluator directly, but that requires some extra preprocessing which I couldn't be bothered to do.
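For illustration, assuming the evaluation summary looks roughly like the sample string below (the exact formatting may differ between GLiNER versions), the regex recovers the three percentages:

import re

_PRF_RE = re.compile(r"P:\s*([\d.,]+)%.*?R:\s*([\d.,]+)%.*?F1:\s*([\d.,]+)%", re.S)

sample_output = "P: 61.27%\tR: 58.93%\tF1: 60.08%\n"  # assumed format, for illustration only
m = _PRF_RE.search(sample_output)
p, r, f = (float(x.replace(",", ".")) / 100 for x in m.groups())
print({"precision": p, "recall": r, "f1": f})
# e.g. {'precision': 0.6127, 'recall': 0.5893, 'f1': 0.6008}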

TimKoornstra · Jul 02 '25