Model and dataset versioning in Aim, with saved artefacts rendered in the UI
🚀 Feature
Add support for tracking and versioning models and datasets across runs and experiments. Aim does an excellent job at tracking, but it currently offers no end-to-end model/data artefact management. The objective is to enhance reproducibility: with just a couple of lines of code, Aim (taking mlflow as inspiration) should ideally do the heavy lifting of serializing and saving models, as well as associating each saved model with its dataset version.
It looks like DVC already does most of this and has excellent integration with Git, so offloading the work to DVC could put a sizeable chunk of this feature's functionality in place. Since models and datasets can be large, Aim could let the user configure multiple remotes to choose from. To optimize storage, Aim could support storing artefacts either locally or on cloud storage (such as S3, which DVC already supports).
DVC supports several types of remote storage: local file system, SSH, Amazon S3, Google Cloud Storage, HTTP, HDFS, among others. Refer to dvc remote add for more details.
Adding some links that I had come across: a. https://mti-lab.github.io/blog/2021/03/03/dvc.html , b. https://dvc.org/doc/command-reference/remote#:~:text=DVC%20supports%20several%20types%20of,HTTP%2C%20HDFS%2C%20among%20others .
Note: I only suggested DVC here, but there may be better alternatives or libraries that I'm not aware of. If so, please explore them, as long as the end result of model/dataset versioning in Aim is achieved. :)
Once these artefacts are saved, they can be shown on each run's page for easier access.
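To make the request a bit more concrete, here is a rough sketch of the kind of behaviour I have in mind: hash each artefact, keep it once in a content-addressed local store, and write a small record tying a run to its model and dataset versions. This is only an illustration using the Python standard library. The names `save_artifact`, `link_run`, and `demo`, the store layout, and the run id are all hypothetical; none of this is Aim's or DVC's actual API.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def save_artifact(store: Path, src: Path) -> str:
    """Copy src into a content-addressed store and return its digest.

    Hypothetical helper: identical content is only stored once.
    """
    digest = file_digest(src)
    dest = store / digest[:2] / digest[2:]
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
    return digest


def link_run(store: Path, run_id: str, model: Path, dataset: Path) -> dict:
    """Hypothetical: record which model and dataset versions a run used."""
    record = {
        "run": run_id,
        "model": save_artifact(store, model),
        "dataset": save_artifact(store, dataset),
    }
    runs_dir = store / "runs"
    runs_dir.mkdir(parents=True, exist_ok=True)
    (runs_dir / f"{run_id}.json").write_text(json.dumps(record, indent=2))
    return record


def demo() -> dict:
    """Run the sketch end to end in a throwaway directory."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        model = root / "model.bin"
        data = root / "train.csv"
        model.write_bytes(b"fake model weights")  # stand-in for a real model
        data.write_text("x,y\n1,2\n")  # stand-in for a real dataset
        return link_run(root / "store", "run-001", model, data)
```

The per-run record is what a run page could read in order to list the saved artefacts. In a real implementation the local store would presumably be swapped for whichever remote the user picks.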
Motivation
See above.
Pitch
See above.
Alternatives
None in Aim currently. For an end-to-end model/data tracking lifecycle, users have to turn to tools other than Aim, such as mlflow or DVC, which is not user friendly.
Additional context
cc:/ @SGevorg @gorarakelyan <- As per our discussion on Slack, please add more details here as required.
Pinging on this. Any update on this issue? @gorarakelyan @SGevorg
The model & data management/versioning in ModelDB (https://github.com/VertaAI/modeldb) worked quite well until the Enterprise edition got prioritized and the open-source solution stagnated. I would find it really nice to have a solution similar to that one as an alternative to DVC.
Any update on this issue? @gorarakelyan