integrations: Add DVC Live integration to Ray Tune

Open MarkoMFilip opened this issue 2 years ago • 8 comments

Similar to how Ray provides integrations for other loggers within Ray Tune, it would be good if DVC Live could have its own integration. Concretely, in its documentation for the MLflow integration with Ray Tune, Ray gives examples of two specific functions it created to help people run hyperparameter optimization with Tune while tracking the experiments with MLflow. If we want to use DVC's Experiments and Checkpoints with Ray Tune, it would be good to have a similar integration available.

MarkoMFilip avatar Apr 06 '22 09:04 MarkoMFilip

Hi @daavoo @MarkoMFilip has there been any progress with this??

grizzlybearg avatar Feb 09 '23 16:02 grizzlybearg

Hi @grizzlybearg , there has not been direct progress but since the issue was opened we have added some features (mainly https://github.com/iterative/dvclive/releases/tag/1.1.0) that should allow implementing something similar to the integrations defined in https://docs.ray.io/en/latest/tune/examples/tune-mlflow.html .

I might try to set up a draft PR tomorrow, since I have checked the code for the MLflowLoggerCallback and it looks simple enough.
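For anyone curious what that could look like, here is a rough sketch of the shape such a callback might take. It assumes Ray Tune's `LoggerCallback` hooks (`log_trial_start`, `log_trial_result`, `log_trial_end`), which MLflowLoggerCallback also builds on. The class name `PerTrialMetricsCallback` and the JSON layout are made up for illustration, and plain JSON files stand in for DVCLive so the snippet runs without extra setup. This is not the actual integration.

```python
import json
from pathlib import Path

try:
    # Real base class, if Ray is installed.
    from ray.tune.logger import LoggerCallback
except ImportError:
    # Fallback so the sketch still runs without Ray.
    LoggerCallback = object


class PerTrialMetricsCallback(LoggerCallback):
    """Hypothetical callback: one metrics directory per Ray Tune trial."""

    def __init__(self, root="dvclive_trials"):
        self.root = Path(root)
        self._history = {}  # trial_id -> list of per-step metric dicts

    def log_trial_start(self, trial):
        # Record the trial's hyperparameters once, when the trial starts.
        trial_dir = self.root / trial.trial_id
        trial_dir.mkdir(parents=True, exist_ok=True)
        (trial_dir / "params.json").write_text(json.dumps(trial.config))
        self._history[trial.trial_id] = []

    def log_trial_result(self, iteration, trial, result):
        # Keep only numeric values. A real integration would presumably
        # call DVCLive's Live.log_metric() and Live.next_step() here.
        metrics = {
            key: value
            for key, value in result.items()
            if isinstance(value, (int, float)) and not isinstance(value, bool)
        }
        self._history[trial.trial_id].append({"step": iteration, **metrics})

    def log_trial_end(self, trial, failed=False):
        # Flush the collected history when the trial finishes.
        history = self._history.pop(trial.trial_id, [])
        out_file = self.root / trial.trial_id / "metrics.json"
        out_file.write_text(json.dumps({"failed": failed, "history": history}))
```

With Ray installed, an instance would be passed through the run configuration's `callbacks` list, the same way the MLflow callback is. Keeping one directory per trial avoids trials overwriting each other's files.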

daavoo avatar Feb 09 '23 17:02 daavoo

Thanks @daavoo

grizzlybearg avatar Feb 15 '23 19:02 grizzlybearg

@daavoo I was trying to see if there had been a PR for this, as I'd really love to be able to use DVCLive with Ray Tune. Not so keen on the other ML monitoring platforms out there. I couldn't find anything related here. Could it be that I'm looking in the wrong place?

Maybe with some guidelines, I'd love to help out if that idea has not been further implemented.

bastienboutonnet avatar Jun 30 '23 12:06 bastienboutonnet

Hi @bastienboutonnet , are you using Ray Tune alongside an existing ML framework (e.g. Keras, PyTorch Lightning)?

daavoo avatar Jun 30 '23 13:06 daavoo

@daavoo We are currently using Hugging Face Transformers trainers

bastienboutonnet avatar Jun 30 '23 18:06 bastienboutonnet

Thanks! Tried to set up a quick example following https://huggingface.co/blog/ray-tune and passing:

from dvclive.huggingface import DVCLiveCallback

trainer.add_callback(DVCLiveCallback(save_dvc_exp=True))

But I think I actually need to look into it in more detail 😓 It appears that there is a bug with Ray trying to deserialize the internal DVC Repo instance used by DVCLive
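The exact cause is not pinned down yet, but one common way this kind of error shows up is that a callback holds a process-local object (locks, open handles) that cannot survive being serialized and shipped to a worker. The sketch below reproduces that general pattern with the standard `pickle` module only. `RepoHandle`, `EagerLogger` and `LazyLogger` are hypothetical stand-ins, not DVCLive or DVC classes, and whether this is what actually happens inside Ray is an assumption.

```python
import pickle
import threading


class RepoHandle:
    """Hypothetical stand-in for an object holding process-local state."""

    def __init__(self, path):
        self.path = path
        self._lock = threading.Lock()  # locks cannot be pickled


class EagerLogger:
    """Creates the handle up front, so the whole logger is unpicklable."""

    def __init__(self, path):
        self.repo = RepoHandle(path)


class LazyLogger:
    """Creates the handle on first use and drops it when serialized."""

    def __init__(self, path):
        self.path = path
        self._repo = None

    @property
    def repo(self):
        if self._repo is None:
            self._repo = RepoHandle(self.path)
        return self._repo

    def __getstate__(self):
        # Ship only the path; the receiving process rebuilds the handle.
        state = self.__dict__.copy()
        state["_repo"] = None
        return state


def can_pickle(obj):
    """Return True if obj survives a pickle round trip."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except (TypeError, pickle.PicklingError):
        return False
```

Here `can_pickle(EagerLogger("."))` fails because of the lock, while a `LazyLogger` round-trips even after its handle has been created. If the real problem has the same shape, deferring creation of the repo object until the callback runs inside the trial might be one way around it.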

daavoo avatar Jun 30 '23 19:06 daavoo

@bastienboutonnet @grizzlybearg @MarkoMFilip or others watching this issue, do you already use DVC and Ray Tune? Do you use them together at all, and if so, how?

Since Ray will often be running on a distributed cluster, the typical DVCLive workflow of writing metrics and plots to local files and using Git to sync them won't work. Even locally, each trial writes to its own run folder, which violates DVC's assumptions. A couple of options would be to:

  1. Launch Ray from within DVC and sync back each trial's results. Sync the metrics and plots data to a central store (like cloud storage or DVC Studio), keeping track of the experiment associated with those metrics so they can be synced back to the Git/DVC repo.
  2. Launch DVC from within Ray inside each remote trial. Each trial clones the repo and pulls data, then runs the trial, commits the result, and pushes back to DVC and Git storage.
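To make option 1 a bit more concrete, here is a minimal sketch of the data flow using only the standard library. A temporary directory stands in for the central store (cloud storage or DVC Studio in practice), and `run_trial`, `sync_to_store`, `collect` and the experiment name `exp-a` are all hypothetical names for illustration. No Ray or DVC calls are involved.

```python
import json
import shutil
import tempfile
from pathlib import Path


def run_trial(trial_id, lr, workdir):
    """Pretend trial: writes metrics into its own run folder."""
    run_dir = Path(workdir) / trial_id
    run_dir.mkdir(parents=True)
    metrics = {"lr": lr, "loss": round(1.0 / (1.0 + lr), 4)}  # toy metric
    (run_dir / "metrics.json").write_text(json.dumps(metrics))
    return run_dir


def sync_to_store(run_dir, store, exp_name):
    """Copy a trial's folder into the store, keyed by experiment name."""
    dest = Path(store) / exp_name / Path(run_dir).name
    shutil.copytree(run_dir, dest)
    return dest


def collect(store, exp_name):
    """Read every trial's metrics for one experiment back from the store."""
    results = {}
    for metrics_file in sorted((Path(store) / exp_name).glob("*/metrics.json")):
        results[metrics_file.parent.name] = json.loads(metrics_file.read_text())
    return results


store = tempfile.mkdtemp()  # stand-in for the central store
work = tempfile.mkdtemp()   # stand-in for the workers' local disks
for index, lr in enumerate([0.1, 1.0]):
    sync_to_store(run_trial(f"trial_{index}", lr, work), store, "exp-a")

results = collect(store, "exp-a")
print(results)
```

The key point is that every trial's output is stored under the experiment it belongs to, so whatever launches Ray can later pull the results back and attach them to the right experiment in the Git/DVC repo.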

Related discussions: #676, #638

cc @aguschin

dberenbaum avatar Aug 24 '23 17:08 dberenbaum