Open-Assistant
Bth5032/78 blackcat trainer
#78
As per discussion with @theblackcat102, I built the rankgen trainer on top of their framework (wandb). The model seems to be training now in fp32; apparently T5 has some issues with fp16. Blackcat suggested that scaling the weights might help, as in 1. That could be a good next step if we want to move forward with this model. Until then, I think this code is worth committing, since it shows how to add new models that are not `AutoModelForSequenceClassification`.
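For context, here is a rough sketch of what wiring in a ranking model by hand (rather than via `AutoModelForSequenceClassification`) can look like. The class name, the mean-pooling scheme, and the `t5-base` default are illustrative assumptions, not the exact code in this PR:

```python
import torch
from torch import nn
from transformers import T5EncoderModel

class RankingModel(nn.Module):
    """Illustrative ranking head on a T5 encoder (not the PR's actual RankGen code)."""

    def __init__(self, model_name: str = "t5-base"):
        super().__init__()
        self.encoder = T5EncoderModel.from_pretrained(model_name)
        # project the pooled encoder state down to a single ranking score
        self.score_head = nn.Linear(self.encoder.config.d_model, 1)

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor = None):
        hidden = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        pooled = hidden.mean(dim=1)  # mean-pool token states into one vector
        return self.score_head(pooled)  # one scalar score per sequence
```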
@bth5032 your wandb loss looks good, but I wonder why the accuracy validation is missing?
Not trying to nitpick, but just curious: why not make the default value of tokenizer_name the same as model_name in the argument_parser? Then we can remove this if/else in the trainer here (something like the sketch below this comment).
Overall the trainer looks fine
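A minimal sketch of that suggestion, with hypothetical flag names (the PR's actual parser may differ):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--model_name", type=str, default="t5-base")
parser.add_argument("--tokenizer_name", type=str, default=None)
args = parser.parse_args()

# Resolve the fallback once at parse time, so the trainer itself
# never needs an if/else to pick the tokenizer.
if args.tokenizer_name is None:
    args.tokenizer_name = args.model_name
```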
@bth5032 thank you! Could you run `pre-commit run --all-files` to make the linters happy?
Thanks!
> Not trying to nitpick, but just curious: why not make the default value of tokenizer_name the same as model_name in the argument_parser? Then we can remove this if/else in the trainer here.
Ah, I normally use hydra/omegaconf for this and didn't realize you had that utility function. I cleaned that logic up.
> @bth5032 your wandb loss looks good, but I wonder why the accuracy validation is missing?
I've been trying to figure that out myself. The `compute_metrics` function never runs for me... I found this thread and tried setting `evaluation_strategy=IntervalStrategy.STEPS` and a few other things, but ultimately I verified that the condition they listed here does evaluate to true in `prediction_step`. However, I still never see `compute_metrics` being called...
I'll keep looking into it and submit a PR if I figure it out.
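For reference, a minimal sketch of how `compute_metrics` is normally wired into the Hugging Face `Trainer` (the flag values and the accuracy metric here are placeholders, not this repo's config). One thing worth checking: the eval loop only invokes `compute_metrics` when it has gathered both predictions and labels, so a missing `eval_dataset`, or a model that never returns labels, can silently disable it:

```python
import numpy as np
from transformers import Trainer, TrainingArguments
from transformers.trainer_utils import IntervalStrategy

def compute_metrics(eval_pred):
    # eval_pred.predictions / eval_pred.label_ids come from the eval loop
    preds = np.argmax(eval_pred.predictions, axis=-1)
    return {"accuracy": float((preds == eval_pred.label_ids).mean())}

def build_trainer(model, train_ds, eval_ds):
    args = TrainingArguments(
        output_dir="out",
        evaluation_strategy=IntervalStrategy.STEPS,  # run eval every eval_steps
        eval_steps=500,
    )
    # compute_metrics only fires during evaluation; without eval_ds (and
    # labels in it) the Trainer never calls it.
    return Trainer(
        model=model,
        args=args,
        train_dataset=train_ds,
        eval_dataset=eval_ds,
        compute_metrics=compute_metrics,
    )
```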
@theblackcat102 @sanagno should be ready for merge.