[DataCatalog]: Enhance `_FrozenDatasets` public API
Description
Users face challenges with understanding and effectively utilizing the `_FrozenDatasets` public API due to unclear documentation and limitations. They struggle to get a dataset by name, iterate through datasets, and get metadata. They express uncertainty about the advantages of using `_FrozenDatasets`, and find it unintuitive to work with due to its underscore prefix and limited functionality compared to the private API.
We propose:
- Enhance the `_FrozenDatasets` public API to provide more comprehensive functionality, including the ability to iterate over the datasets (https://github.com/kedro-org/kedro/issues/3916), access some metadata (type of dataset, type of file, filepath), and utilize methods like `get_by_name()` for flexible dataset retrieval.
- Increase users' awareness of the `_FrozenDatasets` API through tutorials and documentation updates. Highlight the public API's capabilities and provide guidance on how to use it effectively for dataset management and retrieval.
- Consider allowing `DataCatalog` modifications and getting rid of `_FrozenDatasets`; this is a broader question related to another issue that will be linked later.
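
To make the first proposal more concrete, here is a minimal sketch of what a read-only, iterable dataset collection could look like. It is not Kedro code: `FrozenDatasetsSketch`, `CSVDatasetStub`, and the `metadata()` method are hypothetical names invented for illustration, and only `get_by_name()` is taken from the proposal above. The sketch assumes each dataset exposes a `filepath` attribute.

```python
from __future__ import annotations

from collections.abc import Mapping
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Any, Iterator


@dataclass(frozen=True)
class CSVDatasetStub:
    """Hypothetical stand-in for a dataset; not a real Kedro class."""

    filepath: str


class FrozenDatasetsSketch(Mapping):
    """Illustrative read-only view over datasets, keyed by name."""

    def __init__(self, datasets: dict[str, Any]) -> None:
        # Copy so later changes to the caller's dict do not leak in.
        self._datasets = dict(datasets)

    def __getitem__(self, name: str) -> Any:
        return self._datasets[name]

    def __iter__(self) -> Iterator[str]:
        # Iterating yields dataset names, like a standard dictionary.
        return iter(self._datasets)

    def __len__(self) -> int:
        return len(self._datasets)

    def get_by_name(self, name: str, default: Any = None) -> Any:
        # Returns `default` instead of raising when the name is unknown.
        return self._datasets.get(name, default)

    def metadata(self, name: str) -> dict[str, Any]:
        # Hypothetical helper: dataset type, filepath, and file type.
        dataset = self._datasets[name]
        filepath = getattr(dataset, "filepath", None)
        suffix = PurePosixPath(filepath).suffix.lstrip(".") if filepath else None
        return {
            "dataset_type": type(dataset).__name__,
            "filepath": filepath,
            "file_type": suffix,
        }


datasets = FrozenDatasetsSketch({"cars": CSVDatasetStub("data/01_raw/cars.csv")})
for name, dataset in datasets.items():
    print(name, datasets.metadata(name))
```

Building on `collections.abc.Mapping` gives `keys()`, `values()`, `items()`, and `in` checks for free, while the absence of `__setitem__` keeps the collection read-only.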
Context
Some quotes from the user feedback:
- "`_FrozenDataset` class is very confusing because we don't know exactly what's protected. I think the class itself starts with an underscore, so it doesn't really feel safe to loop over a `catalog.datasets` and to run into a private class. And I even don't know how to handle it whether when I use `catalog.datasets`, I think it's just a standard dictionary."
- "`_FrozenDataset` does not have the get accessor, so one cannot get dataset by name thus prefer using private `_get_dataset()` method."
- "There's no straightforward way to iterate over frozen datasets like `for dataset in catalog.datasets`, so you have to iterate via names and use private `_get_dataset()` method."
- "Users often don't even know that catalog api exists."
- "The public API primarily offers basic functions such as searching datasets by name and performing load and save operations. This restrictiveness often necessitates the use of private APIs to access more detailed metadata not available through the public API, so one has to break it."
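
The iteration workaround described in the feedback can be pictured roughly as follows. This is a sketch against a stand-in class, not Kedro's `DataCatalog`: `CatalogStub` and its `list()` method are assumptions made for illustration, and `_get_dataset()` is named after the private method mentioned in the quotes.

```python
class CatalogStub:
    """Hypothetical stand-in for a catalog; not Kedro's DataCatalog."""

    def __init__(self, datasets):
        self._datasets = dict(datasets)

    def list(self):
        # Assumed name-listing method, used only for this illustration.
        return [*self._datasets]

    def _get_dataset(self, name):
        # Private accessor, named as in the feedback quotes.
        return self._datasets[name]


catalog = CatalogStub({"cars": object(), "planes": object()})

# Workaround: iterate over names, then reach into the private accessor.
resolved = {name: catalog._get_dataset(name) for name in catalog.list()}
```

The caller has to depend on an underscore-prefixed method to obtain each dataset object, which is the friction the quotes point at.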