kedro-viz
Document how to leverage YAML anchors & aliases to avoid copy-pasting properties in catalog
Description
This section of docs provides a guide to adding layers
to the visualization by defining them as follows:
companies:
  type: pandas.CSVDataset
  filepath: data/01_raw/companies.csv
  metadata:
    kedro-viz:
      layer: raw
It also gives the following example:
companies:
  type: pandas.CSVDataset
  filepath: data/01_raw/companies.csv
  metadata:
    kedro-viz:
      layer: raw

reviews:
  type: pandas.CSVDataset
  filepath: data/01_raw/reviews.csv
  metadata:
    kedro-viz:
      layer: raw

shuttles:
  type: pandas.ExcelDataset
  filepath: data/01_raw/shuttles.xlsx
  metadata:
    kedro-viz:
      layer: raw
...
Context
In my projects I found it very helpful to use YAML anchors to save those 3 lines per layer into a variable like this:
_raw_layer: &raw_layer
  metadata:
    kedro-viz:
      layer: 01_raw
And then reuse it like this:
companies:
  type: pandas.CSVDataset
  filepath: data/01_raw/companies.csv
  <<: *raw_layer

reviews:
  type: pandas.CSVDataset
  filepath: data/01_raw/reviews.csv
  <<: *raw_layer

shuttles:
  type: pandas.ExcelDataset
  filepath: data/01_raw/shuttles.xlsx
  <<: *raw_layer
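For anyone unfamiliar with the merge key, here is a rough sketch of what `<<: *raw_layer` resolves to for `companies`, assuming the YAML loader supports merge keys. It also shows one caveat worth knowing: the merge is shallow, so an entry that declares its own `metadata` key replaces the anchored `metadata` mapping entirely. One way around that is to anchor the inner mapping instead. The `_raw_viz` anchor name and the `preview_args` setting are only illustrative and are not taken from the docs section above.

```yaml
# What `companies` resolves to once the loader applies `<<: *raw_layer`
companies:
  type: pandas.CSVDataset
  filepath: data/01_raw/companies.csv
  metadata:
    kedro-viz:
      layer: 01_raw

# Caveat: merge keys are shallow. If an entry needs extra kedro-viz
# settings, anchoring the inner mapping keeps the layer reusable.
_raw_viz: &raw_viz          # hypothetical anchor name
  layer: 01_raw

reviews:
  type: pandas.CSVDataset
  filepath: data/01_raw/reviews.csv
  metadata:
    kedro-viz:
      <<: *raw_viz          # merges `layer: 01_raw` into this mapping
      preview_args:         # illustrative extra setting
        nrows: 5
```

Keys written explicitly in an entry take precedence over merged ones, so a single dataset can still override `layer` if it needs to.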
Possible Implementation
What I propose is to add a small note admonition suggesting that YAML anchors & aliases can be a great fit here to avoid copy-pasting those 3 lines if you have, e.g., 10 datasets defined in a layer.
By admonition I mean e.g. this:
It can mention that this feature is not Kedro-specific at all and is enabled by the YAML format itself, but I think it would be helpful since this trick is highly reusable and can simplify large catalogs quite a lot for users unfamiliar with anchors & aliases in YAML.
I do not propose to change the existing example which replicates those 3 lines 3 times. I think my suggestion better fits a note admonition.
Checklist
- [ ] Include labels so that we can categorise your feature request