dvclive
`log_model`
Use MLEM to save models. Should also enable these to be automatically tracked with DVC. This is important to be able to save models in DVCLive across frameworks in a consistent way, and to enable experimentation from within Python/DVCLive without separate steps outside of DVCLive. Should be prioritized after we have a basic DVCLive-only workflow for parameters/metrics/plots.
Should this also create a `.dvc` file to track the model output? In a DVC pipeline, it would be necessary to either ignore this step or find some alternate integration to make it easy to track it as a stage output.
Also, this should be integrated into callbacks so that it's easy to save models automatically. Related: #300.
a related issue
- https://github.com/iterative/mlem/issues/2
Unless we expand the scope, I believe we don't really need a `log_model` method; we can just add `mlem.api.save` calls in the integrations.
> a related issue
> * [`dvclive` integration? mlem#2](https://github.com/iterative/mlem/issues/2)
Good point, let's discuss there since you both already covered most of what's here and more.
Edit: Although so far, all the work seems like it belongs on the DVCLive side, right?
> all the work seems like it belongs on the DVCLive side, right?
If we're going to call `mlem.api.save` under the hood in DVCLive, then I assume yes. @daavoo, do you have a vision for how the integration should look for the user?
I was thinking of something as simple as: if `mlem is not None`, call `mlem.api.save`; if it is not installed, use the "native" ML framework model saving as we currently do.
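A minimal sketch of that fallback, assuming MLEM is an optional import. The helper name `save_model` and the `native_save` parameter are hypothetical and only illustrate the idea; they are not part of DVCLive or MLEM. The only MLEM call used is `mlem.api.save(obj, path)`.

```python
from pathlib import Path

try:
    # Optional dependency: only used when it is installed.
    import mlem.api as mlem_api
except ImportError:
    mlem_api = None


def save_model(model, path, native_save):
    """Save `model` with MLEM if available, else with the framework's own saver.

    `native_save` is a hypothetical callable `(model, path) -> None` that stands
    in for whatever each integration currently does to save a model.
    Returns the name of the backend that was used.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)

    if mlem_api is not None:
        mlem_api.save(model, str(path))
        return "mlem"

    native_save(model, path)
    return "native"
```

Each integration callback would then pass its own framework-specific saver as `native_save`, so the fallback logic lives in one place.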
@daavoo, the code you mention should be executed inside `DvcLiveCallback()`, am I correct?
Yes, we would need to add it to each integration callback.
MLEM is not that heavy, and we may not have an obvious "native" way to save in each framework.
What do you think about making it a dependency and always saving models with MLEM, or making it an optional dependency and skipping the model save if it's not installed? I think it would be similar to MLflow, where auto logging saves an MLflow model as an artifact.
> always saving models with MLEM
This assumes MLEM is saving it in the same native format the framework would use. I just thought it was preferable because:
- It provides an entrypoint for MLEM and saves its users a step.
- It means DVCLive doesn't have to maintain the model saving logic or implement this individually into each callback.
> This assumes MLEM is saving it in the same native format the framework would use
It is; this is the reason for the integrations with ML frameworks we have. Still, sometimes there are multiple ways to save and load a model with a single framework, and MLEM is opinionated in these situations.
A useful feature of having a dedicated `log_model` method would be to `dvc add` whatever gets saved.
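As a rough sketch of what that could look like: save the model, then run `dvc add` on the saved path. The signature below, including `save_fn` and the injectable `run` parameter, is hypothetical and not an existing DVCLive API. It also assumes the `dvc` CLI is on `PATH` and that the code runs inside a DVC repository.

```python
import subprocess
from pathlib import Path


def log_model(model, path, save_fn, run=subprocess.run):
    """Save `model` to `path`, then track the result with `dvc add`.

    `save_fn` is a hypothetical callable `(model, path) -> None`, e.g. a thin
    wrapper around `mlem.api.save` or a framework's own saver.
    `run` is injectable so the DVC call can be replaced in tests.
    Returns the command that was executed.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    save_fn(model, path)

    # `dvc add <path>` writes a `<path>.dvc` file that tracks the output.
    cmd = ["dvc", "add", str(path)]
    run(cmd, check=True)
    return cmd
```

Inside a DVC pipeline the `dvc add` step would likely have to be skipped, since the model would already be tracked as a stage output.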
Closing in favor of #472