[DataCatalog2.0]: Draft of `AbstractDataCatalog` and `KedroDataCatalog` (work in progress)
Description
Solves https://github.com/kedro-org/kedro/issues/3925, https://github.com/kedro-org/kedro/issues/3926, https://github.com/kedro-org/kedro/issues/3916
Development notes
This PR includes a draft of the following:
- Implement draft of `AbstractDataCatalog` and `KedroDataCatalog(AbstractDataCatalog)`
  - `AbstractDataCatalog` supports instantiation from configuration and/or datasets
  - `AbstractDataCatalog` stores the configuration provided
- Rework dataset pattern resolution logic:
  - Pattern resolution logic moved out from `_get_dataset()` to `resolve_patterns()`
  - Pattern resolution logic split into actual resolution and updating datasets/configurations
  - `_dataset_patterns` and `_default_patterns` are now obtained from the config in `__init__`
  - Added `resolved_ds_configs` property to store resolved datasets' configurations at the catalog level
  - `add()` method adds or replaces the dataset and its configuration
  - `add_feed_dict()` renamed to `add_from_dict()`
  - Introduced `_runtime_patterns` catalog field to keep the logic of processing dataset/default/runtime patterns clear
  - Removed `shallow_copy()` method, which was used to add `extra_dataset_patterns` at runtime, and replaced it with a dedicated `add_runtime_patterns()` method
- Rework dataset access logic
  - Removed `_FrozenDatasets` and access to datasets as properties
  - Added a get-dataset-by-name feature: dedicated function and access by key
  - Added the ability to iterate over the datasets
  - Modifying the datasets property is still not allowed, but `add(replace=True)` is
- Make `KedroDataCatalog` mutable:
  - We do not want to make the `datasets` property public, so as not to encourage users to configure the catalog by modifying the `datasets` dictionary
  - `_datasets` property remained protected, but a public `datasets` property was added, returning a deep copy of `_datasets`, while the setter is still not allowed; the same applies to the `_resolved_ds_configs` property
  - One can still extend and replace `_datasets` via the `catalog.add()` method
- To make `AbstractDataCatalog` compatible with the current runners' implementation, several methods (`release()`, `confirm()` and `exists()`) were kept as part of the interface, but they only have a meaningful implementation for `KedroDataCatalog`
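
To make the behaviour described in the list above easier to review, here is a minimal, self-contained sketch. It is not the code in this PR: `SketchCatalog`, `match_pattern()` and `fill_placeholders()` are hypothetical names, the method signatures are assumptions, datasets are plain Python objects, and the pattern matching is a simplified regex stand-in for the real resolution logic (no specificity ordering, no default patterns).

```python
import copy
import re
from typing import Any, Dict, Iterator, Optional


def match_pattern(pattern: str, name: str) -> Optional[Dict[str, str]]:
    """Return placeholder values if `name` matches a '{placeholder}' pattern."""
    # Escape literal text, then turn each escaped '{x}' into a named group.
    regex = re.sub(
        r"\\\{(\w+)\\\}",
        lambda m: f"(?P<{m.group(1)}>.+)",
        re.escape(pattern),
    )
    found = re.fullmatch(regex, name)
    return found.groupdict() if found else None


def fill_placeholders(config: Any, values: Dict[str, str]) -> Any:
    """Recursively substitute '{placeholder}' occurrences in a config."""
    if isinstance(config, dict):
        return {k: fill_placeholders(v, values) for k, v in config.items()}
    if isinstance(config, str):
        for key, value in values.items():
            config = config.replace("{" + key + "}", value)
    return config


class SketchCatalog:
    """Hypothetical stand-in for the catalog behaviour described above."""

    def __init__(self, datasets=None, config=None):
        self._config = dict(config or {})  # the provided config is stored
        self._datasets = dict(datasets or {})
        # Patterns are split out of the config once, at construction time.
        self._dataset_patterns = {
            k: v for k, v in self._config.items() if "{" in k
        }
        self._runtime_patterns: Dict[str, Any] = {}
        self._resolved_ds_configs = {
            k: v for k, v in self._config.items() if "{" not in k
        }

    @property
    def datasets(self) -> Dict[str, Any]:
        # Read-only view: callers get a deep copy and there is no setter.
        return copy.deepcopy(self._datasets)

    def add_runtime_patterns(self, patterns: Dict[str, Any]) -> None:
        self._runtime_patterns.update(patterns)

    def resolve_patterns(self, name: str) -> Optional[Dict[str, Any]]:
        if name in self._resolved_ds_configs:
            return self._resolved_ds_configs[name]
        # Dataset patterns are tried before runtime patterns.
        for patterns in (self._dataset_patterns, self._runtime_patterns):
            for pattern, ds_config in patterns.items():
                values = match_pattern(pattern, name)
                if values is not None:
                    resolved = fill_placeholders(ds_config, values)
                    # Updating stored configs is a separate step from matching.
                    self._resolved_ds_configs[name] = resolved
                    return resolved
        return None

    def add(self, name: str, dataset: Any, replace: bool = False) -> None:
        if name in self._datasets and not replace:
            raise ValueError(f"Dataset '{name}' is already registered")
        self._datasets[name] = dataset

    def add_from_dict(self, datasets: Dict[str, Any], replace: bool = False) -> None:
        for name, dataset in datasets.items():
            self.add(name, dataset, replace=replace)

    def __getitem__(self, name: str) -> Any:
        return self._datasets[name]  # access by key

    def __iter__(self) -> Iterator[str]:
        return iter(self._datasets)  # iterate over dataset names
```

In this sketch, mutating the dictionary returned by `datasets` leaves the catalog untouched, while `add(..., replace=True)` is the only way to overwrite an entry, which mirrors the intent stated in the list.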
Developer Certificate of Origin
We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a `Signed-off-by` line in the commit message. See our wiki for guidance.
If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.
Checklist
- [ ] Read the contributing guidelines
- [ ] Signed off each commit with a Developer Certificate of Origin (DCO)
- [ ] Opened this PR as a 'Draft Pull Request' if it is work-in-progress
- [ ] Updated the documentation to reflect the code changes
- [ ] Added a description of this change in the `RELEASE.md` file
- [ ] Added tests to cover my changes
- [ ] Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team