
[DataCatalog]: Enhance `_FrozenDatasets` public API

Open ElenaKhaustova opened this issue 8 months ago • 0 comments

Description

Users struggle to understand and use the _FrozenDatasets public API effectively because its documentation is unclear and its functionality is limited. They find it hard to get a dataset by name, iterate over datasets, and retrieve metadata. They are unsure what advantages _FrozenDatasets offers, and find it unintuitive to work with because of its underscore prefix and its limited functionality compared to the private API.

We propose:

  1. Enhance the _FrozenDatasets public API to provide more comprehensive functionality, including the ability to iterate over the datasets (https://github.com/kedro-org/kedro/issues/3916), access some metadata (type of dataset, type of file, filepath), and use methods like get_by_name() for flexible dataset retrieval.
  2. Increase users' awareness of the _FrozenDatasets API through tutorials and documentation updates. Highlight the public API's capabilities and provide guidance on how to use it effectively for dataset management and retrieval.
  3. Consider allowing DataCatalog modifications and getting rid of _FrozenDatasets - this is a broader question related to another issue that will be linked later.
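
To make proposal 1 concrete, here is a minimal sketch of what such an interface could look like. It is not Kedro code: the class name `FrozenDatasets`, the `metadata()` method, its return keys, and the `StubDataset` stand-in are all assumptions made for illustration. Only `get_by_name()` is named in the proposal, and its signature here is a guess.

```python
from collections.abc import Iterator, Mapping
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Any


@dataclass(frozen=True)
class StubDataset:
    """Hypothetical stand-in for a dataset; not a real Kedro class."""

    filepath: str


class FrozenDatasets(Mapping[str, Any]):
    """Hypothetical read-only view over a catalog's datasets.

    Subclassing Mapping gives iteration, len(), `in`, .get(), .keys(),
    .values() and .items() for free, and provides no way to assign items.
    """

    def __init__(self, datasets: dict[str, Any]) -> None:
        # Copy so later changes to the caller's dict do not leak in.
        self._datasets = dict(datasets)

    def __getitem__(self, name: str) -> Any:
        return self._datasets[name]

    def __iter__(self) -> Iterator[str]:
        return iter(self._datasets)

    def __len__(self) -> int:
        return len(self._datasets)

    def get_by_name(self, name: str) -> Any:
        # Assumed signature; raises KeyError for unknown names.
        return self._datasets[name]

    def metadata(self, name: str) -> dict[str, Any]:
        # Assumed metadata shape: dataset type, file type, filepath.
        dataset = self._datasets[name]
        filepath = getattr(dataset, "filepath", None)
        file_type = PurePosixPath(filepath).suffix if filepath else None
        return {
            "type": type(dataset).__name__,
            "file_type": file_type,
            "filepath": filepath,
        }


datasets = FrozenDatasets(
    {
        "cars": StubDataset("data/01_raw/cars.csv"),
        "boats": StubDataset("data/01_raw/boats.parquet"),
    }
)

# Iterate over names and datasets without touching any private method.
for name, dataset in datasets.items():
    print(name, datasets.metadata(name))
```

Building on `collections.abc.Mapping` is one possible design choice: it keeps the container read-only while making it behave like the standard dictionary that users in the feedback expected `catalog.datasets` to be.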

Context

Some quotes from the user feedback:

  • "_FrozenDataset class is very confusing because we don't know exactly what's protected. I think the class itself starts with an underscore, so it doesn't really feel safe to loop over a catalog.datasets and to run into a private class. And I even don't know how to handle it whether when I use catalog.datasets, I think it's just a standard dictionary."
  • "_FrozenDataset does not have the get accessor, so one cannot get a dataset by name and thus prefers using the private _get_dataset() method."
  • "There's no straightforward way to iterate over frozen datasets like for dataset in catalog.datasets, so you have to iterate via names and use private _get_dataset() method."

Screenshot 2024-06-03 at 15 52 14
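
The iteration problem described in the feedback above can be reproduced with a toy container. The sketch below is not Kedro's implementation: `FrozenLike` is a hypothetical class that only models the reported behaviour, namely attribute access without iteration or a get accessor. In Kedro itself, the feedback says the name-based lookup goes through the private `_get_dataset()` method, and `getattr` plays that role here.

```python
class FrozenLike:
    """Toy model of an attribute-only, read-only container (not Kedro code)."""

    def __init__(self, datasets: dict[str, object]) -> None:
        # Write to __dict__ directly to bypass the __setattr__ guard below.
        self.__dict__.update(datasets)

    def __setattr__(self, key: str, value: object) -> None:
        raise AttributeError("datasets cannot be reassigned")


names = ["cars", "boats"]
frozen = FrozenLike({name: f"<dataset {name}>" for name in names})

# Attribute access works when the name is known while writing the code.
first = frozen.cars

# Direct iteration is not supported by this container.
try:
    iter(frozen)
    iterable = True
except TypeError:
    iterable = False

# Workaround: iterate over the names and look each dataset up dynamically.
by_name = {name: getattr(frozen, name) for name in names}
```

The workaround depends on having the list of names from somewhere else, which is the extra step users are asking to avoid.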

  • "Users often don't even know that catalog api exists."
  • "The public API primarily offers basic functions such as searching datasets by name and performing load and save operations. This restrictiveness often necessitates the use of private APIs to access more detailed metadata not available through the public API, so one has to break it."

Screenshot 2024-06-04 at 23 43 09

ElenaKhaustova, Jun 04 '24 22:06