
`dvclive` integration?

Open daavoo opened this issue 4 years ago • 3 comments

Sorry if I'm missing some context regarding the scope of mlem; I have read a bit of the existing documentation in Notion and skimmed through the code in this repository. I'm commenting here instead of the dvclive repo because, afaik, mlem is not currently "public".

So, in dvclive there is an open discussion on how to (or even whether to) add support for saving models:

https://github.com/iterative/dvclive/issues/105

I was just thinking about what other ML loggers offer in that regard and came across MLFlow Models (I was actually a user of this feature at my previous company). I think that having a "unified" model metadata format could be a good argument for adding a dvclive.save_model functionality.

As far as I understand from the components description, mlem follows a similar approach to MLFlow Models. Given that dvclive plans to keep working on and extending its integrations with ML frameworks, it seems like this model serialization could be a common point of interest for both projects.

Does this make sense for those working on mlem?

daavoo · Jul 12 '21

Hi, @daavoo! Thanks for submitting this. Sorry for the long delay, I wasn't subscribed to repo issues. First of all, feel free to tag me and @mike0sv on any question regarding MLEM; right now we are the two people working on it.

Regarding the question itself: indeed, one of the fundamental functions of MLEM is to save models, whether locally or by committing them to a remote repo (check the "Save model" header here: https://www.notion.so/iterative/Creating-Model-Registry-1b4383745de349a48bd2c08fffcfda63#9b3137ec520f4902a646635b01795a4d).

Right now MLEM has a powerful mechanism to identify what kind of model is supplied to mlem.save(model, path) and to save it appropriately (e.g., is it an sklearn model, lightgbm, torch, or something else).
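To make that concrete, here is a minimal sketch of what that call could look like from user code, assuming the API is exposed as a top-level mlem.save as written above (the exact import path and signature may differ in the released version):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

import mlem  # assumed to expose mlem.save(model, path) as discussed above

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=10).fit(X, y)

# MLEM inspects the object to pick the right serializer (sklearn here);
# the same call is meant to work for lightgbm, torch, etc.
mlem.save(model, "rf-model")
```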

Is saving the only action that dvclive wants to do with ML models?

It's definitely worth discussing this in more detail. I'm not sure how to integrate this with dvclive properly and which parts we would need to implement twice in both libs.

aguschin · Jul 23 '21

Is saving the only action that dvclive wants to do with ML models?

As of today, yes. But maybe @dberenbaum can give additional insights from https://www.notion.so/iterative/Lightweight-Experiments-cec7b7c6e3d2451490491d22dfb4e63f. We are moving towards taking care of saving the model in all ML framework integrations. Currently we use the "native" ML framework mechanism for saving the model (e.g. keras.model.save).
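For illustration, this is roughly what the "native" approach looks like in a Keras-based integration (a simplified sketch; the function name and path are placeholders, not the actual DVCLive code):

```python
from tensorflow import keras


def save_model_natively(model: keras.Model, path: str = "model") -> None:
    # Framework-specific serialization: the on-disk format is Keras-only,
    # so every integration needs its own save call and its own format.
    model.save(path)
```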

It's definitely worth discussing this in more detail. I'm not sure how to integrate this with dvclive properly and which parts we would need to implement twice in both libs.

For me, DVCLive's main value is reducing the amount of changes users need to make to benefit from DVC features, without having to care about the specifics.

It seems like a good direction to discuss whether a similar synergy with MLEM could exist: use DVCLive to make MLEM features more easily accessible from a DVC pipeline.

The way I currently see it, we could add mlem.save calls as part of each DVCLive integration. I'm not sure if the fact that DVCLive would know the ML framework beforehand could be used to make assumptions in these mlem.save calls.
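A hypothetical sketch of that direction: a DVCLive-style callback that delegates serialization to MLEM instead of the framework's native save. The class and hook names here are illustrative, not the real DVCLive API:

```python
import mlem  # assumed to expose mlem.save(model, path) as above


class DvcLiveModelCallback:
    """Illustrative callback that saves the final model via MLEM."""

    def __init__(self, model_path: str = "model"):
        self.model_path = model_path

    def on_train_end(self, model):
        # DVCLive knows which framework integration is running, but MLEM
        # would still detect the model type from the object itself.
        mlem.save(model, self.model_path)
```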


As a side note, although it might be too early to discuss, it seems that the work being done in https://github.com/iterative/dvc/pull/6332 could be another potential point of synergy between DVC checkpoints and mlem.publish (although I don't know whether MLEM considers an intermediate checkpoint a model or is more focused on the "post-train" model).

daavoo · Jul 30 '21

I'm not sure if the fact that DVCLive would know the ML framework beforehand could be used to make assumptions in these mlem.save calls.

In general, there should be no need to know the ML framework beforehand, because MLEM figures it out under the hood by inspecting the obj you pass to mlem.save(obj, path).

Also, we've been busy finishing prototype and product design tasks for MLEM, so this has been on hold for us for some time. We can get back to it in the second half of September, when the model saving functionality in MLEM should be ready for the first closed release. Or, if you feel we need to start discussing this earlier for some reason, please let me know.

aguschin · Sep 02 '21