How to log F1, precision, recall, and other custom metrics during training, in addition to training and validation loss
Hi
Below is my code to log training and validation loss. I want to log F1, precision, recall, and other custom metrics as well. How can I do this?
I see https://github.com/urchade/GLiNER/blob/main/gliner/evaluation/evaluator.py, which has these functions. Has anyone figured out how to use it to log metrics at training time?
# Calculate number of epochs
num_steps = 10
batch_size = 8
data_size = len(train_dataset)
num_batches = data_size // batch_size
num_epochs = max(10, num_steps // num_batches)
print(num_epochs)
training_args = TrainingArguments(
    output_dir="models",
    learning_rate=5e-6,
    weight_decay=0.01,
    others_lr=1e-5,
    others_weight_decay=0.01,
    ........
)
import pandas as pd
from transformers import TrainerCallback, TrainerControl, TrainerState

class CustomCallback(TrainerCallback):
    def __init__(self):
        # One row of logged values per global step
        self.data = {}

    def on_log(self, args: TrainingArguments, state: TrainerState, control: TrainerControl, logs: dict, **kwargs):
        print(logs)
        step = state.global_step
        if step not in self.data:
            self.data[step] = {'Step': step}
        if 'loss' in logs:
            self.data[step]['Training Loss'] = logs['loss']
        if 'eval_loss' in logs:
            self.data[step]['Validation Loss'] = logs['eval_loss']

    def get_dataframe(self):
        return pd.DataFrame(list(self.data.values()))
custom_callback = CustomCallback()
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    tokenizer=model.data_processor.transformer_tokenizer,
    data_collator=data_collator,
    callbacks=[custom_callback],
)
trainer.train()
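If a compute_metrics function is attached to the Trainer, its return values get logged with an eval_ prefix, so the same callback could in principle pick them up. An untested sketch, assuming compute_metrics returns keys named precision, recall and f1:

class MetricsCallback(CustomCallback):
    # Sketch only: relies on a compute_metrics function being passed to the Trainer,
    # whose returned keys the Trainer logs as "eval_precision", "eval_recall", "eval_f1".
    def on_log(self, args, state, control, logs: dict, **kwargs):
        super().on_log(args, state, control, logs, **kwargs)
        step = state.global_step
        for key, column in (("eval_precision", "Precision"),
                            ("eval_recall", "Recall"),
                            ("eval_f1", "F1")):
            if key in logs:
                self.data[step][column] = logs[key]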
Hi, did you find out how to do it? I have the same issue right now. I would be very grateful for any pointers, as I am currently writing my master's thesis and fine-tuning GLiNER is part of it.
Hi, I have been trying to evaluate the model during training too. I used the compute_metrics parameter of the Trainer to pass a custom function that receives an EvalPrediction object and calculates the metrics from it.
Unfortunately, the metrics seem to be wrong (maybe I am transforming the attributes of EvalPrediction the wrong way), as I get a much lower accuracy during training compared to the results on the test set.
Maybe the hint about compute_metrics helps. If you figure out how to do this, I would be very interested in the solution. :)
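For what it's worth, this is the general shape of such a function. The sketch below is not GLiNER-specific: it assumes eval_pred.predictions holds per-span scores and eval_pred.label_ids holds binary gold labels of the same shape, which may not match what the GLiNER Trainer actually puts into EvalPrediction, so the decoding step is exactly the part to double-check.

import numpy as np
from transformers import EvalPrediction

def compute_metrics(eval_pred: EvalPrediction):
    # Assumption: predictions are span scores, label_ids are 0/1 gold labels
    # of the same shape, with -100 marking padded positions. Verify this
    # against the actual GLiNER model outputs before trusting the numbers.
    scores, labels = eval_pred.predictions, eval_pred.label_ids
    preds = (scores > 0.5).astype(int)
    mask = labels != -100
    preds, labels = preds[mask], labels[mask]

    tp = int(((preds == 1) & (labels == 1)).sum())
    fp = int(((preds == 1) & (labels == 0)).sum())
    fn = int(((preds == 0) & (labels == 1)).sum())

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}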
I did it like this:
import re

from somewhere import val_ds

_PRF_RE = re.compile(r"P:\s*([\d.,]+)%.*?R:\s*([\d.,]+)%.*?F1:\s*([\d.,]+)%", re.S)

def compute_metrics(_eval_pred):
    # Ignore the EvalPrediction object and run GLiNER's own evaluation on the validation set.
    output, f1 = model.evaluate(
        val_ds.data,
        threshold=0.5,
        flat_ner=True,
        batch_size=1,
    )
    # model.evaluate returns a formatted string; parse P/R/F1 back out of it.
    m = _PRF_RE.search(output)
    if m:
        p, r, f = (float(x.replace(",", ".")) / 100 for x in m.groups())
        return {"precision": p, "recall": r, "f1": f}
    # Fallback – at least log F1
    return {"f1": f1}
And then create your Trainer like:
Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    processing_class=tokenizer,
    data_collator=collator,
    compute_metrics=compute_metrics,
)
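Note that compute_metrics only runs when the Trainer actually evaluates, so evaluation has to be scheduled in the training arguments. A minimal sketch using the standard Hugging Face TrainingArguments names (argument names differ slightly across transformers versions, and GLiNER's own TrainingArguments subclass adds extra fields like others_lr):

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="models",
    eval_strategy="steps",    # "evaluation_strategy" on older transformers versions
    eval_steps=100,           # run evaluation (and compute_metrics) every 100 steps
    logging_steps=100,
    report_to="tensorboard",  # or "wandb"
)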
If you only want to log the metrics you could use a callback, but callbacks are read-only and run after the Trainer has already saved its metrics, so anything you compute there would never show up in WandB or TensorBoard.
The regex at the top is there to parse the string that model.evaluate returns back into a dictionary. Alternatively, you could directly use the compute_prf function (which returns a dict) from gliner.evaluation.evaluator, but that requires some extra preprocessing which I couldn't be bothered to do.
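If you do want to avoid the string parsing, the numbers can also be computed directly from gold and predicted spans. The helper below is a hedged sketch, not GLiNER's compute_prf: it assumes you can already extract both span sets as (doc_id, start, end, label) tuples from your validation data and from the model's predictions, which is the extra preprocessing mentioned above.

def prf_from_spans(gold_spans, pred_spans):
    # Micro-averaged precision/recall/F1 over sets of (doc_id, start, end, label) tuples.
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)
    fp = len(pred - gold)
    fn = len(gold - pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}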