dvclive

`log_model`

Open • dberenbaum opened this issue 2 years ago • 11 comments

Use MLEM to save models. Should also enable these to be automatically tracked with DVC. This is important to be able to save models in DVCLive across frameworks in a consistent way, and to enable experimentation from within Python/DVCLive without separate steps outside of DVCLive. Should be prioritized after we have a basic DVCLive-only workflow for parameters/metrics/plots.

dberenbaum, Sep 26 '22

Should this also create a .dvc file to track the model output? In a DVC pipeline, it would be necessary to either ignore this step or find some alternate integration to make it easy to track it as a stage output.

dberenbaum, Sep 27 '22

Also, this should be integrated into callbacks so that it's easy to save models automatically. Related: #300.

dberenbaum, Sep 29 '22

a related issue

  • https://github.com/iterative/mlem/issues/2

aguschin, Sep 30 '22

Unless we expand the scope, I don't think we really need a `log_model` method; we can just add `mlem.api.save` calls in the integrations.

daavoo, Sep 30 '22

> a related issue
>
> * [`dvclive` integration? mlem#2](https://github.com/iterative/mlem/issues/2)

Good point, let's discuss there since you both already covered most of what's here and more.

Edit: So far, though, all the work seems like it belongs on the DVCLive side, right?

dberenbaum, Sep 30 '22

> all the work seems like it belongs on the DVCLive side, right?

If we're going to call `mlem.api.save` under the hood in DVCLive, then I assume yes. @daavoo, do you have a vision of how the integration should look for the user?

aguschin, Oct 10 '22

> If we're going to call `mlem.api.save` under the hood in DVCLive, then I assume yes. @daavoo, do you have a vision of how the integration should look for the user?

I was thinking of something as simple as `if mlem is not None: mlem.api.save`; if MLEM is not installed, use the "native" ML framework model saving as we currently do.

daavoo, Oct 10 '22
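A minimal sketch of the fallback described above, assuming `mlem.api.save(obj, path)` accepts the model and the target path as its first two arguments. `save_model`, `native_save`, and `mlem_saver` are hypothetical names, not DVCLive or MLEM API; `native_save` stands in for whatever framework-specific saving an integration already does.

```python
from typing import Any, Callable, Optional

try:
    # MLEM is treated as an optional dependency in this sketch.
    # Assumption: mlem.api.save(obj, path, ...) takes model and path first.
    from mlem.api import save as mlem_save
except ImportError:
    mlem_save = None


def save_model(
    model: Any,
    path: str,
    native_save: Callable[[Any, str], Any],
    mlem_saver: Optional[Callable[..., Any]] = mlem_save,
) -> str:
    """Hypothetical helper: save with MLEM if available, else natively.

    Returns the name of the backend that was used ("mlem" or "native").
    """
    if mlem_saver is not None:
        mlem_saver(model, path)
        return "mlem"
    # MLEM is not installed: fall back to the framework's own saver.
    native_save(model, path)
    return "native"
```

Injecting `mlem_saver` keeps the branch testable without MLEM installed; passing `mlem_saver=None` forces the native path.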

@daavoo, the code you mention should be executed inside DvcLiveCallback(), am I correct?

aguschin, Oct 10 '22

> @daavoo, the code you mention should be executed inside DvcLiveCallback(), am I correct?

Yes, we would need to add it to each integration callback.

daavoo, Oct 10 '22
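One way to avoid duplicating the saving logic in every integration is a shared base class that each callback inherits from. This is a hypothetical sketch: `ModelSavingCallback`, `ToyFrameworkCallback`, and the `on_epoch_end(epoch, model)` hook are illustrative and do not match any particular framework's callback signature.

```python
from typing import Any, Callable, List, Optional, Tuple


class ModelSavingCallback:
    """Hypothetical base shared by DVCLive integration callbacks.

    `save_fn` is any callable taking (model, path), e.g. a wrapper around
    `mlem.api.save` or a framework's native saver.
    """

    def __init__(self, save_fn: Callable[[Any, str], Any], model_path: Optional[str] = None) -> None:
        self.save_fn = save_fn
        self.model_path = model_path

    def save_model(self, model: Any) -> bool:
        # Saving is opt-in: without a path, nothing is written.
        if self.model_path is None:
            return False
        self.save_fn(model, self.model_path)
        return True


class ToyFrameworkCallback(ModelSavingCallback):
    """Stand-in for a framework-specific callback; the hook name is illustrative."""

    def on_epoch_end(self, epoch: int, model: Any) -> None:
        # Overwrite the same path every epoch so it always holds the latest model.
        self.save_model(model)


# Usage: record calls instead of writing files.
saved: List[Tuple[Any, str]] = []
callback = ToyFrameworkCallback(lambda m, p: saved.append((m, p)), model_path="model")
for epoch in range(3):
    callback.on_epoch_end(epoch, model=f"weights-{epoch}")
```

Each framework integration then only has to call `save_model` from its own hook, while the choice of saver lives in one place.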

MLEM is not that heavy, and we may not have an obvious "native" way to save in each framework.

What do you think about making it a dependency and always saving models with MLEM, or making it an optional dependency and skipping model saving if it's not installed? I think this would be similar to MLflow, where autologging saves an MLflow model as an artifact.

dberenbaum, Oct 10 '22

> always saving models with MLEM

This assumes MLEM is saving it in the same native format the framework would use. I just thought it was preferable because:

  1. It provides an entrypoint for MLEM and saves its users a step.
  2. It means DVCLive doesn't have to maintain the model saving logic or implement this individually into each callback.

dberenbaum, Oct 21 '22

> This assumes MLEM is saving it in the same native format the framework would use

It is; that is the reason we have integrations with ML frameworks. Still, a single framework sometimes offers multiple ways to save and load a model, and MLEM is opinionated in these situations.

aguschin, Oct 24 '22

A useful feature of a dedicated `log_model` method would be to `dvc add` whatever gets saved.

dberenbaum, Nov 14 '22
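A sketch of what such a hypothetical `log_model` could do: save the model with an injected `save_fn` (for example a wrapper around `mlem.api.save`), then run `dvc add` on the saved path, which creates a `<path>.dvc` file that tracks it. The function name and its parameters are assumptions, not DVCLive API. Inside a `dvc.yaml` pipeline, the path would more likely be declared as a stage output than `dvc add`-ed, as raised earlier in the thread.

```python
import subprocess
from typing import Any, Callable, List, Sequence


def _run_checked(cmd: Sequence[str]) -> None:
    # Raises CalledProcessError if `dvc add` fails.
    subprocess.run(list(cmd), check=True)


def log_model(
    model: Any,
    path: str,
    save_fn: Callable[[Any, str], Any],
    run: Callable[[Sequence[str]], Any] = _run_checked,
) -> List[str]:
    """Hypothetical log_model: save the model, then track it with `dvc add`.

    Returns the command that was run, mainly for inspection and testing.
    """
    # 1. Write the model to `path` (MLEM, native saver, ...).
    save_fn(model, path)
    # 2. Track whatever was written at `path` with DVC.
    cmd = ["dvc", "add", path]
    run(cmd)
    return cmd
```

Injecting `run` lets the sequence be checked without a DVC repository; the default shells out to the real `dvc` CLI.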

Closing in favor of #472

dberenbaum, Feb 27 '23