transformers
Trainer is attempting to log a torch.Tensor but MLflow only accepts float.
System Info
When I fine-tune an LLM (Mistral-7B), I get a very explicit error when the Trainer logs to MLflow:
Trainer is attempting to log a value of "0.528405487537384" of type <class 'torch.Tensor'> for key "grad_norm" as a metric. MLflow's log_metric() only accepts float and int types so we dropped this attribute.
I do not know how to tell the MLflow callback to convert a torch.Tensor to a float before logging.
Who can help?
@sanchit-gandhi @muellerz @pacman100
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
Reproduction
Unfortunately, I cannot share either the model or the dataset since they do not belong to me.
Expected behavior
Convert torch.Tensor to float before logging.
Adding
elif isinstance(v, torch.Tensor) and v.numel() == 1:
    metrics[k] = float(v)
to the on_log function of the MLflowCallback code solves it.
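To show the idea behind the suggested branch in isolation, here is a minimal sketch of the conversion as a standalone function. The name `to_loggable_metrics` is hypothetical and is not part of transformers or MLflow. To keep the sketch runnable without torch installed, it uses duck typing (any object exposing `numel()` and `item()`, as `torch.Tensor` does) rather than the `isinstance(v, torch.Tensor)` check from the snippet above. That substitution is my assumption, not what the callback does.

```python
def to_loggable_metrics(logs):
    """Return a new dict holding only values that can be logged as metrics.

    Hypothetical helper, not part of transformers or MLflow.
    - int and float values are kept unchanged.
    - Single-element tensor-like objects (anything with numel() and item(),
      which torch.Tensor provides) are converted to a Python float.
    - Everything else (strings, multi-element tensors, ...) is dropped.
    """
    metrics = {}
    for key, value in logs.items():
        if isinstance(value, (int, float)):
            metrics[key] = value
        elif (
            hasattr(value, "numel")
            and hasattr(value, "item")
            and value.numel() == 1
        ):
            # item() extracts the single element as a Python scalar
            metrics[key] = float(value.item())
    return metrics
```

With torch available, a call such as `to_loggable_metrics({"grad_norm": torch.tensor(0.5284)})` should return a dict whose `grad_norm` value is a plain Python float, which is a type `log_metric()` accepts according to the warning quoted above.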
Hi @etiennebonnafoux, thanks for reporting!
Integrations like MLflow aren't actively maintained by us but rather by the contributors who added them. We do want them to work, however! Would you like to open a PR with this fix? This way you get the GitHub contribution for your suggestion.
This issue is closed with https://github.com/huggingface/transformers/pull/29932