
Model and dataset versioning in Aim, and rendering saved artefacts in the UI

Open ashutoshsaboo opened this issue 2 years ago • 3 comments

🚀 Feature

Add support for tracking and versioning models and datasets across runs/experiments. Aim does an excellent job at tracking, but there is currently no way to do end-to-end model/data artefact management in Aim. The objective is to enhance reproducibility: with just a couple of lines of code, Aim (taking mlflow as inspiration) should ideally do the heavy lifting of serializing and saving models, as well as associating them with the dataset version.
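To make the request more concrete, here is a minimal sketch of what such a versioning record could look like. It uses only the Python standard library; `file_digest`, `ArtefactRecord`, and `record_artefacts` are hypothetical names made up for illustration and are not part of Aim's API. The assumption is that a content hash of the serialized model and of the dataset file is enough to tie the two versions together in one record.

```python
import hashlib
import pickle
from dataclasses import dataclass, asdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


@dataclass(frozen=True)
class ArtefactRecord:
    """Hypothetical record linking a saved model to a dataset version."""
    model_path: str
    model_sha256: str
    dataset_path: str
    dataset_sha256: str


def record_artefacts(model, model_path: Path, dataset_path: Path) -> ArtefactRecord:
    """Serialize the model with pickle and link it to the dataset by content hash."""
    model_path.write_bytes(pickle.dumps(model))
    return ArtefactRecord(
        model_path=str(model_path),
        model_sha256=file_digest(model_path),
        dataset_path=str(dataset_path),
        dataset_sha256=file_digest(dataset_path),
    )
```

Calling `asdict(record)` turns the record into a plain dictionary, so in principle it could be stored alongside a run like any other metadata. Pickle is used here only to keep the sketch short; a real implementation would likely need framework-specific serialization.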

DVC appears to do most of this already and integrates well with Git, so offloading to DVC could put a sizeable chunk of this feature's functionality in place. Since models and datasets can be large, Aim could let the user configure multiple remotes and choose among them. To optimize storage, Aim could support storing artefacts either locally or on cloud storage (such as S3, which DVC already supports).

DVC supports several types of remote storage: local file system, SSH, Amazon S3, Google Cloud Storage, HTTP, HDFS, among others. Refer to dvc remote add for more details.
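The idea of letting the user pick among several remotes could be sketched roughly as follows. This is an illustration under stated assumptions, not DVC's or Aim's actual implementation: `push_artefact` is a hypothetical helper, and the "remotes" are plain local directories standing in for backends such as S3 or SSH. Files are stored under their content hash so that identical artefacts are kept only once.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Mapping


def push_artefact(path: Path, remotes: Mapping[str, Path], remote: str) -> Path:
    """Copy a file into the chosen remote, stored under its SHA-256 digest.

    `remotes` maps a user-chosen name to a directory; in this sketch a
    directory stands in for any real storage backend.
    """
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    target = remotes[remote] / digest[:2] / digest[2:]
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():  # identical content is stored only once
        shutil.copyfile(path, target)
    return target
```

Pushing the same file twice returns the same target path and does not copy it again, which is one way large models and datasets could avoid being duplicated across runs.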

Adding some links that I had come across: a. https://mti-lab.github.io/blog/2021/03/03/dvc.html , b. https://dvc.org/doc/command-reference/remote#:~:text=DVC%20supports%20several%20types%20of,HTTP%2C%20HDFS%2C%20among%20others .

Note: I only suggested DVC here, but there could be better alternatives/libraries than DVC (which I might not be aware of) for doing this. If there are, please explore those, as long as the end result of model/dataset versioning in Aim can be achieved. :)

Once these artefacts are saved, they can be shown on each run's page for easier access.

Motivation

See above.

Pitch

See above.

Alternatives

None in Aim currently. To get an end-to-end lifecycle of model/data tracking, users have to use tools other than Aim (mlflow/DVC), which is not user friendly.

Additional context

cc:/ @SGevorg @gorarakelyan <- As per our discussion on Slack, please add more details here as required.

ashutoshsaboo • Mar 01 '22 13:03

Pinging on this. Any update on this issue? @gorarakelyan @SGevorg

ashutoshsaboo • Mar 21 '22 17:03

The model & data management/versioning in ModelDB (https://github.com/VertaAI/modeldb) worked quite well until the Enterprise edition got prioritized and the open-source solution stagnated. I would find it really nice to have a similar solution as an alternative to DVC.

Atharex • Jun 02 '22 06:06

Any update on this issue? @gorarakelyan

cccs-km • Jan 16 '24 18:01