Write out training and validation metrics to storage when fine-tuning

Open jdcc opened this issue 9 months ago • 0 comments

Right now fine-tuning metrics go to MLflow, but if we're fine-tuning on the k8s cluster, they aren't persisted anywhere once the job completes. They should at least be written to stdout so they end up in the job log in blob storage. This will let us verify that fine-tuning is working correctly and will help debug any user issues that come up.
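A minimal sketch of what the stdout side could look like, assuming the training loop can call a helper once per logging step. The names `format_metrics` and `log_metrics`, the `finetune_metrics` event tag, and the record layout are all hypothetical and not taken from the existing code; the idea is just one JSON object per line so the metrics are easy to grep out of the job log later.

```python
import json
import sys


def format_metrics(step, split, metrics):
    """Build one JSON line for a set of metrics (hypothetical record layout)."""
    record = {"event": "finetune_metrics", "step": step, "split": split}
    # Cast values to float so tensor or numpy scalars serialize cleanly.
    record.update({name: float(value) for name, value in metrics.items()})
    return json.dumps(record, sort_keys=True)


def log_metrics(step, split, metrics, stream=None):
    """Write the metrics line to stdout (or another stream) and flush."""
    if stream is None:
        stream = sys.stdout
    stream.write(format_metrics(step, split, metrics) + "\n")
    # Flush so the line reaches the container log even if the job dies.
    stream.flush()


if __name__ == "__main__":
    log_metrics(100, "train", {"loss": 0.42})
    log_metrics(100, "validation", {"loss": 0.57, "accuracy": 0.81})
```

This would run alongside the existing MLflow logging rather than replace it, so local runs keep working as they do today.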

Mar 18 '25 19:03 jdcc