ploomber
product/upstream mismatch
If a user edits a task from the template, for instance changing the get task to have two products where it originally had one, and forgets to update the consuming task, the pipeline breaks with a long error about a missing file, when the actual problem is a missing key in the upstream reference.
For example, given this task definition:
- source: get.py
  product:
    nb: output/get.ipynb
    data: output/get.parquet
and a downstream task, features, that contains this code:
data = pd.read_parquet(str(upstream['get']))
The correct reference is upstream['get']['data'].
We'd probably need to compare the number of products each task declares with how downstream tasks reference them, and alert on mismatches.
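One possible shape for such a check, as a rough sketch only: parse the downstream source with Python's ast module and flag upstream['task'] references that do not match the declared product. The function name find_upstream_mismatches, the products mapping format, and the message wording are all hypothetical and not part of ploomber's API. It assumes Python 3.9+ (where a subscript's slice is the expression itself) and only handles literal string keys.

```python
import ast


def find_upstream_mismatches(source, products):
    """Return messages for upstream references that do not match the products.

    `products` maps a task name to its product declaration: a string for a
    single product, or a dict for several named products (hypothetical format
    mirroring the YAML above).
    """
    tree = ast.parse(source)

    # Record each node's parent so upstream['get'] can be told apart from
    # upstream['get']['data'].
    parents = {}
    for node in ast.walk(tree):
        for child in ast.iter_child_nodes(node):
            parents[id(child)] = node

    problems = []
    for node in ast.walk(tree):
        is_upstream_ref = (
            isinstance(node, ast.Subscript)
            and isinstance(node.value, ast.Name)
            and node.value.id == "upstream"
            and isinstance(node.slice, ast.Constant)
            and isinstance(node.slice.value, str)
        )
        if not is_upstream_ref:
            continue

        task = node.slice.value
        declared = products.get(task)
        parent = parents.get(id(node))
        # True when the reference is itself subscripted, e.g. [...]['data'].
        has_key = isinstance(parent, ast.Subscript) and parent.value is node

        if isinstance(declared, dict) and not has_key:
            problems.append(
                f"upstream[{task!r}] has several products {sorted(declared)}; "
                f"reference one of them, e.g. upstream[{task!r}][<key>]"
            )
        elif isinstance(declared, str) and has_key:
            problems.append(
                f"upstream[{task!r}] has a single product; "
                f"reference it as upstream[{task!r}] without a key"
            )
    return problems


# Product declaration for the get task from the example above.
products = {"get": {"nb": "output/get.ipynb", "data": "output/get.parquet"}}

bad = "data = pd.read_parquet(str(upstream['get']))"
good = "data = pd.read_parquet(str(upstream['get']['data']))"

print(find_upstream_mismatches(bad, products))   # one message about the missing key
print(find_upstream_mismatches(good, products))  # no messages
```

Running a check like this before executing the pipeline would let the error point at the missing key rather than at a missing file. A static check cannot see references built dynamically, so a runtime check on the upstream object may also be worth considering.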