Test DVC pipelines of "data_and_models/" with CI

Open FrancescoCasalegno opened this issue 4 years ago • 2 comments

Currently, our CI never tests the content of data_and_models/, so code changes in src/ could break data_and_models/ without us noticing.

It is not clear yet how the DVC pipelines could be tested.

  • Manually or automatically triggered by CI/CD?
  • Use real data or fake data (like unit tests)?
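For the automatic option, one possible starting point is to let CI work out which stages depend on the files touched by a change, and then ask DVC for a dry run of only those stages. The following is a minimal sketch, not a proposal for the final setup. It assumes the dvc.yaml content has already been parsed into a dict (for example with a YAML loader), that all paths are written relative to the repository root, and that the stage names and file paths shown are hypothetical. The helper names `affected_stages` and `dry_run_command` are made up for illustration.

```python
from pathlib import PurePosixPath


def affected_stages(pipeline: dict, changed_files: list[str]) -> list[str]:
    """Return the names of stages with a dependency touched by changed_files.

    `pipeline` is the parsed content of a dvc.yaml file, i.e. a mapping with
    a top-level "stages" key. A dependency matches a changed file if it is
    the same path or a parent directory of that file.
    Assumption: all paths are relative to the repository root.
    """
    changed = [PurePosixPath(f) for f in changed_files]
    hits = []
    for name, stage in pipeline.get("stages", {}).items():
        deps = [PurePosixPath(d) for d in stage.get("deps", [])]
        if any(dep == f or dep in f.parents for dep in deps for f in changed):
            hits.append(name)
    return hits


def dry_run_command(stages: list[str]) -> list[str]:
    """Build a `dvc repro --dry` command, which shows what would be
    reproduced without executing the stage commands."""
    return ["dvc", "repro", "--dry", *stages]


# Hypothetical pipeline, for illustration only.
example_pipeline = {
    "stages": {
        "train": {
            "cmd": "python train.py",
            "deps": ["src", "data/annotations.jsonl"],
        },
        "evaluate": {
            "cmd": "python evaluate.py",
            "deps": ["data/test.jsonl"],
        },
    }
}
```

A CI job could pass the list of files changed in a pull request to `affected_stages` and run the resulting command with `subprocess.run`. Whether the stages then run on real data or on a small fake dataset is still the open question above.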

FrancescoCasalegno avatar May 25 '21 19:05 FrancescoCasalegno

Additional context

See also the part on DVC elements in https://github.com/BlueBrain/Search/pull/351#issuecomment-843037012, which describes how to deal with DVC while working on data_and_models/.

pafonta avatar May 26 '21 07:05 pafonta

Hello @FrancescoCasalegno,

Do we know about Studio, a tool from the people who made DVC and CML?

I have tried Studio (https://dvc.org/doc/studio). This user interface on top of DVC + CML is very interesting.

As we will train and tune more and more models, this could be very helpful.

Indeed, it lets us manage DVC experiments and CML reports in a comprehensive and integrated way.

For example, all the experiments and plots I have done for #356 could have been compared, shared, and visualized in this tool.

pafonta avatar May 27 '21 09:05 pafonta