dvclive
integrations: Add DVC Live integration to Ray Tune
Ray Tune already provides integrations for other loggers, so it would be good if DVCLive had its own. Concretely, Ray's documentation on integrating MLflow with Ray Tune shows two specific functions it created so that people can run hyperparameter optimization with Tune and track the experiments with MLflow at the same time. If we want to use DVC's Experiments and Checkpoints with Ray Tune, it would be good to have a similar integration available.
Hi @daavoo @MarkoMFilip has there been any progress with this??
Hi @grizzlybearg, there has not been direct progress, but since the issue was opened we have added some features (mainly https://github.com/iterative/dvclive/releases/tag/1.1.0) that should allow implementing something similar to the integrations defined in https://docs.ray.io/en/latest/tune/examples/tune-mlflow.html.
I might try to set up a draft PR tomorrow, since I have checked the code for the MLflowLoggerCallback and it looks simple enough.
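For anyone following along, here is a rough sketch of what such a callback could look like. It is not an existing DVCLive or Ray API: `PerTrialLoggerCallback` and `RecordingLive` are hypothetical names, the hook names (`log_trial_start`, `log_trial_result`, `log_trial_end`) are assumed to mirror Ray Tune's `LoggerCallback`, and the logger methods (`log_metric`, `next_step`, `end`) are assumed to match what DVCLive's `Live` exposes. The stand-in logger is only there so the sketch runs without Ray or DVCLive installed.

```python
from typing import Any, Callable, Dict, List


class RecordingLive:
    """Stand-in logger so the sketch runs without dvclive (hypothetical)."""

    def __init__(self, directory: str) -> None:
        self.directory = directory
        self.steps: List[Dict[str, float]] = []  # one metrics dict per step
        self._current: Dict[str, float] = {}
        self.ended = False

    def log_metric(self, name: str, value: float) -> None:
        self._current[name] = value

    def next_step(self) -> None:
        self.steps.append(self._current)
        self._current = {}

    def end(self) -> None:
        self.ended = True


class PerTrialLoggerCallback:
    """Keeps one logger per trial; hook names are assumed, not imported from Ray."""

    def __init__(self, live_factory: Callable[[str], Any], root: str = "dvclive") -> None:
        self._factory = live_factory
        self._root = root
        self._lives: Dict[str, Any] = {}

    def log_trial_start(self, trial: Any) -> None:
        # Each trial gets its own output directory under the root.
        trial_id = str(trial)
        self._lives[trial_id] = self._factory(f"{self._root}/{trial_id}")

    def log_trial_result(self, iteration: int, trial: Any, result: Dict[str, Any]) -> None:
        live = self._lives[str(trial)]
        for name, value in result.items():
            # Only scalar numbers are logged; nested configs and flags are skipped.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                live.log_metric(name, value)
        live.next_step()

    def log_trial_end(self, trial: Any, failed: bool = False) -> None:
        # Close the trial's logger and forget it.
        self._lives.pop(str(trial)).end()
```

In a real integration the class would presumably subclass Ray's logger callback and the factory would construct a DVCLive `Live` pointed at the per-trial directory; both of those are assumptions here, not something this sketch verifies.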
Thanks @daavoo
@daavoo I was trying to see if there had been a PR for this, as I'd really love to be able to use DVCLive with Ray Tune. I'm not so keen on the other ML monitoring platforms out there. I couldn't find anything related here. Could it be that I'm looking in the wrong place?
If the idea has not been taken further, I'd love to help out, maybe with some guidelines.
Hi @bastienboutonnet, are you using Ray Tune alongside an existing ML framework (e.g. Keras, PyTorch Lightning)?
@daavoo We are currently using huggingface transformer trainers
Thanks! Tried to set up a quick example following https://huggingface.co/blog/ray-tune and passing:
from dvclive.huggingface import DVCLiveCallback
trainer.add_callback(DVCLiveCallback(save_dvc_exp=True))
But I think I actually need to look into it in more detail 😓 It appears that there is a bug with Ray trying to deserialize the internal DVC Repo instance used by DVCLive
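To illustrate the general kind of problem (not a diagnosis of the actual bug): a callback that holds an unpicklable handle cannot be shipped to a worker as-is. One common pattern is to create the handle lazily and drop it when the object is serialized. In this sketch `LazyLoggerCallback` is a hypothetical name, and a `threading.Lock` stands in for whatever internal object fails to serialize.

```python
import pickle
import threading


class LazyLoggerCallback:
    """Hypothetical callback that builds its unpicklable handle on first use."""

    def __init__(self, directory: str) -> None:
        self.directory = directory
        self._handle = None  # created lazily, never serialized

    def _get_handle(self):
        if self._handle is None:
            # threading.Lock stands in for an object that cannot be pickled.
            self._handle = threading.Lock()
        return self._handle

    def log(self, name: str, value: float):
        with self._get_handle():
            return (self.directory, name, value)

    def __getstate__(self):
        # Drop the handle so pickling never touches it; the copy on the
        # other side rebuilds it on first use.
        state = self.__dict__.copy()
        state["_handle"] = None
        return state


if __name__ == "__main__":
    cb = LazyLoggerCallback("dvclive/trial_0")
    cb.log("loss", 0.5)                      # handle now exists locally
    clone = pickle.loads(pickle.dumps(cb))   # still serializes cleanly
    print(clone.log("loss", 0.25))
```

Whether something like this applies to DVCLive's callback depends on where the Repo instance is created, which I have not checked.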
@bastienboutonnet @grizzlybearg @MarkoMFilip or others watching this issue, do you already use DVC and Ray Tune? Do you use them together at all, and if so, how?
Since Ray will often be running on a distributed cluster, the typical DVCLive workflow of writing metrics and plots to local files and using Git to sync them won't work. Even locally it breaks DVC's assumptions, since each trial writes to its own run folder. A couple of options would be to:
- Launch Ray from within DVC and sync back each trial's results. Sync the metrics and plots data to a central store (like cloud storage or DVC Studio), keeping track of the experiment associated with those metrics so they can be synced back to the Git/DVC repo.
- Launch DVC from within Ray inside each remote trial. Each trial clones the repo and pulls data, then runs the trial, commits the result, and pushes back to DVC and Git storage.
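As a toy illustration of the first option, the sketch below has each trial write metrics into its own run folder and then gathers them into a central location keyed by experiment name. Everything here is hypothetical (the function names, the `metrics.json` and `trials.json` file names, and the use of a local directory as the "central store"); it uses only the standard library and does not call Ray, DVC, or DVC Studio.

```python
import json
import tempfile
from pathlib import Path


def write_trial_metrics(run_dir: Path, metrics: dict) -> Path:
    """Simulate one trial writing metrics into its own run folder."""
    run_dir.mkdir(parents=True, exist_ok=True)
    path = run_dir / "metrics.json"
    path.write_text(json.dumps(metrics))
    return path


def sync_to_central(runs_root: Path, central: Path, experiment: str) -> dict:
    """Collect every trial's metrics under one experiment in the central store."""
    index = {}
    for metrics_file in sorted(runs_root.glob("*/metrics.json")):
        trial_id = metrics_file.parent.name
        index[trial_id] = json.loads(metrics_file.read_text())
    target = central / experiment
    target.mkdir(parents=True, exist_ok=True)
    # The experiment name is the key that lets results be matched back
    # to the Git/DVC repo later.
    (target / "trials.json").write_text(json.dumps(index, indent=2))
    return index


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        runs, central = Path(tmp) / "runs", Path(tmp) / "central"
        write_trial_metrics(runs / "trial_0", {"loss": 0.5})
        write_trial_metrics(runs / "trial_1", {"loss": 0.25})
        return sync_to_central(runs, central, "exp-demo")
```

The second option would move the equivalent of `sync_to_central` into each remote trial, with Git and DVC pushes taking the place of the file copy.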
Related discussions: #676, #638
cc @aguschin