Test DVC pipelines of "data_and_models/" with CI
Currently, our CI never tests the content of data_and_models/, so code changes in src/ could break data_and_models/ without us noticing.
It is not clear yet how the DVC pipelines could be tested.
- Manually or automatically triggered by CI/CD?
- Use real data or fake data (like unit tests)?
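One possible starting point, whichever trigger we choose, is a small script that CI calls for each pipeline directory. The sketch below is only an illustration: the helper names `build_repro_command` and `check_pipelines` are hypothetical, and it assumes each pipeline lives in a directory with a `dvc.yaml` and that `dvc` is installed on the CI runner. Note that `dvc repro --dry` only prints the commands that would be executed, so it catches a broken pipeline definition but not broken code in src/; for that, a full `dvc repro` on small or fake data would be needed (`dry_run=False`).

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, List


def build_repro_command(dry_run: bool = True) -> List[str]:
    """Build the DVC command for the check (hypothetical helper)."""
    cmd = ["dvc", "repro"]
    if dry_run:
        # --dry prints the stage commands without executing them.
        cmd.append("--dry")
    return cmd


def check_pipelines(
    pipeline_dirs: Iterable[str],
    run: Callable = subprocess.run,
    dry_run: bool = True,
) -> List[Path]:
    """Run the DVC command in each directory and return those that failed.

    `run` is injectable so the function can be exercised without DVC.
    """
    failed = []
    for directory in pipeline_dirs:
        result = run(build_repro_command(dry_run), cwd=directory)
        if result.returncode != 0:
            failed.append(Path(directory))
    return failed
```

A CI job could call `check_pipelines(...)` with the pipeline directories under data_and_models/ and fail the build if the returned list is not empty.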
Additional context
See also the part on DVC elements in https://github.com/BlueBrain/Search/pull/351#issuecomment-843037012 about what one should do to deal with DVC while working on data_and_models/.
Hello @FrancescoCasalegno,
Are we aware of Studio, a tool from the people who made DVC and CML?
I have tried Studio (https://dvc.org/doc/studio). This user interface on top of DVC + CML is very interesting.
As we will train and tune more and more models, this could be very helpful.
Indeed, it lets us manage DVC experiments and CML reports in a comprehensive and integrated way.
For example, all the experiments and plots I have done for #356 could have been compared, shared, and visualized in this tool.