zamba
zamba copied to clipboard
Write out training and validation metrics to storage when fine-tuning
Right now fine tuning metrics go to mlflow, but if we're fine tuning on the k8s cluster, they go nowhere after the job completes. They should at least get spit out to stdout so they end up in the log in the blob storage. This will allow us to make sure fine tuning is working correctly and will help debug any user issues that come up.