Allowing public ETL steps to depend on private steps
We have a single case where a public dataset (data://garden/covid/latest/combined, and hence our full covid dataset) depends on a private dataset, data-private://garden/covid/latest/sequence.
data://garden/covid/latest/combined:
- data://garden/covid/latest/testing
- data://garden/covid/latest/cases_deaths
- data-private://garden/covid/latest/sequence
- data://garden/demography/2024-07-15/population
An error is raised when you try to run the ETL without the --private flag, so a full ETL run (etl run) fails with:
ValueError: Public step data://garden/covid/latest/combined depends on private step data-private://garden/covid/latest/sequence. Use --private flag.
This is a bit annoying, as we have to exclude the covid dataset from the nightly builds. It would also be confusing for anyone trying to build it.
Should we exclude steps depending on private steps by default and raise a warning instead of failing?
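To make the proposal concrete, here is a minimal sketch of the "exclude and warn" behaviour. It assumes the DAG is available as a plain mapping from each step to its dependencies. The function name `exclude_private_dependents` and the grapher step in the usage example are hypothetical and are not taken from the etl codebase. Steps that depend on an excluded step are dropped as well, since they could not be built either.

```python
from __future__ import annotations

import warnings

PRIVATE_PREFIX = "data-private://"


def is_private(step: str) -> bool:
    """A step is treated as private if its URI uses the data-private scheme."""
    return step.startswith(PRIVATE_PREFIX)


def exclude_private_dependents(dag: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return a copy of `dag` without private steps or anything that
    (directly or transitively) depends on them, warning for each skipped
    public step instead of raising.

    Hypothetical helper, not part of etl.
    """
    excluded = {step for step in dag if is_private(step)}

    # Repeat until no new step gets excluded, so that exclusions
    # propagate to downstream steps regardless of iteration order.
    changed = True
    while changed:
        changed = False
        for step, deps in dag.items():
            if step in excluded:
                continue
            blocking = sorted(d for d in deps if is_private(d) or d in excluded)
            if blocking:
                warnings.warn(
                    f"Skipping {step}: depends on private or excluded step "
                    f"{blocking[0]}. Use --private flag to include it."
                )
                excluded.add(step)
                changed = True

    return {step: deps for step, deps in dag.items() if step not in excluded}


if __name__ == "__main__":
    dag = {
        "data://garden/covid/latest/combined": {
            "data://garden/covid/latest/testing",
            "data://garden/covid/latest/cases_deaths",
            "data-private://garden/covid/latest/sequence",
            "data://garden/demography/2024-07-15/population",
        },
        "data://garden/covid/latest/testing": set(),
        # Hypothetical downstream step, for illustration only.
        "data://grapher/covid/latest/combined": {
            "data://garden/covid/latest/combined",
        },
    }
    for step in exclude_private_dependents(dag):
        print(step)
```

With something like this, a plain etl run would still build everything that is fully public, and the warnings would tell you which steps were skipped and why.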