
Allowing public ETL steps to depend on private steps

Status: Open. Marigold opened this issue 6 months ago • 2 comments

We have a single case where a public dataset (data://garden/covid/latest/combined, and hence our full covid dataset) depends on a private dataset, data-private://garden/covid/latest/sequence.

data://garden/covid/latest/combined:
    - data://garden/covid/latest/testing
    - data://garden/covid/latest/cases_deaths
    - data-private://garden/covid/latest/sequence
    - data://garden/demography/2024-07-15/population

An error is raised when you run ETL without the --private flag, so running the full ETL (etl run) fails with:

ValueError: Public step data://garden/covid/latest/combined depends on private step data-private://garden/covid/latest/sequence. Use --private flag.
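To make the failure concrete, here is a minimal Python sketch of the kind of check that produces this error. It assumes the DAG is loaded as a plain dict mapping each step URI to its dependency URIs, and that a step is private when its URI starts with data-private://. The names is_private and check_private_dependencies are hypothetical; this is not the etl package's actual implementation.

```python
# Hypothetical sketch: the DAG is a dict of {step_uri: [dependency_uris]}.
# This illustrates the strict check; it is not the etl package's real code.


def is_private(step: str) -> bool:
    # Assumption: private steps are identified by the "data-private://" scheme.
    return step.startswith("data-private://")


def check_private_dependencies(dag: dict[str, list[str]], private: bool) -> None:
    """Raise if a public step directly depends on a private step and --private is off."""
    if private:
        return  # With --private, private dependencies are allowed.
    for step, deps in dag.items():
        if is_private(step):
            continue  # Only public steps are checked.
        for dep in deps:
            if is_private(dep):
                raise ValueError(
                    f"Public step {step} depends on private step {dep}. Use --private flag."
                )


dag = {
    "data://garden/covid/latest/combined": [
        "data://garden/covid/latest/testing",
        "data://garden/covid/latest/cases_deaths",
        "data-private://garden/covid/latest/sequence",
        "data://garden/demography/2024-07-15/population",
    ],
}

try:
    check_private_dependencies(dag, private=False)
except ValueError as e:
    # Prints: Public step data://garden/covid/latest/combined depends on
    # private step data-private://garden/covid/latest/sequence. Use --private flag.
    print(e)
```

Because the check raises on the first offending step, a single public step with a private dependency is enough to abort the whole run.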

This is a bit annoying, as we have to exclude the covid dataset from nightly builds. It would also be confusing for anyone else trying to build it.

Should we exclude steps that depend on private steps by default and raise a warning instead of failing?
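One way the proposed behavior could work is sketched below, under the same assumptions as a plain dict DAG with private steps identified by the data-private:// prefix. The function exclude_private_dependents and the grapher step in the example are hypothetical. The idea is to mark every step that depends on a private step, either directly or through another excluded step, drop those steps, and emit a warning for each skipped public step instead of raising.

```python
import warnings


def is_private(step: str) -> bool:
    # Assumption: private steps are identified by the "data-private://" scheme.
    return step.startswith("data-private://")


def exclude_private_dependents(dag: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return a copy of dag without private steps and without public steps that
    depend on them directly or transitively, warning about each skipped public step."""
    excluded: set[str] = set()
    changed = True
    # Fixed-point iteration: keep marking steps until no new step becomes tainted.
    while changed:
        changed = False
        for step, deps in dag.items():
            if step in excluded:
                continue
            if is_private(step) or any(is_private(d) or d in excluded for d in deps):
                excluded.add(step)
                changed = True
    for step in sorted(excluded):
        if not is_private(step):
            warnings.warn(
                f"Skipping public step {step}: it depends on a private step. "
                "Use --private flag to build it."
            )
    return {step: deps for step, deps in dag.items() if step not in excluded}


dag = {
    "data://garden/covid/latest/combined": [
        "data://garden/covid/latest/testing",
        "data://garden/covid/latest/cases_deaths",
        "data-private://garden/covid/latest/sequence",
        "data://garden/demography/2024-07-15/population",
    ],
    "data://garden/covid/latest/testing": [],
    # Hypothetical downstream step, included only to show transitive exclusion.
    "data://grapher/covid/latest/combined": ["data://garden/covid/latest/combined"],
}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    runnable = exclude_private_dependents(dag)

print(sorted(runnable))  # ['data://garden/covid/latest/testing']
```

Iterating to a fixed point means a downstream step is excluded even if it appears in the dict before the step it depends on, which covers the "and hence our full covid dataset" case. A topological traversal would do the same in a single pass.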

Marigold, Aug 26 '24 07:08