[DataCatalog]: Simplify the way to access catalog
Description
Currently, there are two ways of accessing the catalog: use the DataCatalog.load_from_config() method, or instantiate a KedroSession, load the context and access the catalog from there.
Users point out that:
- accessing the catalog from a Kedro session is complex and requires an understanding of framework details, such as project creation and environment setup;
- acquiring the catalog involves writing a lot of code and navigating through parameters that are out of the context of their work;
- creating a Kedro session is too heavy for simple catalog reading tasks.
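For reference, the two access routes described above might look roughly like the sketch below. It assumes a Kedro 0.19-style API, and it assumes that the load_from_config() method mentioned above corresponds to the DataCatalog.from_config classmethod. The function names catalog_via_session and catalog_via_config, and the "base"/"local" environment names, are illustrative only and are not part of Kedro.

```python
from __future__ import annotations

from pathlib import Path


def catalog_via_session(project_path: str | Path, env: str | None = None):
    """Session route: bootstrap the project, create a session, load the context."""
    # Kedro is imported lazily so this module imports even without Kedro installed.
    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project

    project_path = Path(project_path).resolve()
    bootstrap_project(project_path)  # reads project metadata and configures settings
    with KedroSession.create(project_path=project_path, env=env) as session:
        context = session.load_context()
        return context.catalog


def catalog_via_config(conf_source: str | Path):
    """Config route: load the catalog config and build a DataCatalog, no session."""
    from kedro.config import MissingConfigException, OmegaConfigLoader
    from kedro.io import DataCatalog

    # Assumed environment layout: conf/base and conf/local.
    loader = OmegaConfigLoader(
        conf_source=str(conf_source), base_env="base", default_run_env="local"
    )
    try:
        credentials = loader["credentials"]
    except MissingConfigException:
        credentials = {}  # the project has no credentials file
    # No session is involved on this route, so project hooks are not applied.
    return DataCatalog.from_config(loader["catalog"], credentials=credentials)
```

Neither function does anything until it is called from inside a Kedro project, for example `catalog_via_session(".")`. The session route needs framework knowledge (bootstrapping, session, context) that a catalog-only user would otherwise not touch.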
We propose to explore the feasibility of developing a clear and intuitive API for accessing the catalog directly from a Kedro project, eliminating the need for a session / hiding session creation.
Context
Acquiring the DataCatalog currently involves multiple steps: users who simply want to access the catalog must first initiate a Kedro session and create a context. The pain point identified is the complexity and inconsistency of accessing the data catalog from a Kedro project. One user reported that obtaining the catalog typically means searching the Kedro documentation for the right code snippet to copy and paste, which is cumbersome and inefficient. To work around this, the user wrote a custom function, catalog_from_project(), to streamline the process. The function simplifies the task, and it also suggests that such a utility would be beneficial if included directly in Kedro, improving accessibility and user experience.
Frequent changes to the methods for acquiring a Kedro catalog across versions (such as the changes from Kedro 0.16 to 0.17) make compatibility hard to maintain. This variability forces developers to implement complex logic in plugins like Kedro-Viz to adapt to version differences.
Some users suggest having a read-only DataCatalog instance: creating a data catalog instance, at least for read-only use cases, that does not rely on creating a full-blown Kedro session.
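As an illustration of what "read-only" could mean in practice, here is a minimal sketch of a wrapper that delegates loading and rejects saving. ReadOnlyCatalog and _FakeCatalog are hypothetical names and are not part of Kedro. The sketch only shows the read-only contract, not how the underlying catalog would be built without a session.

```python
from __future__ import annotations

from typing import Any


class ReadOnlyCatalog:
    """Hypothetical wrapper exposing only the read side of a catalog-like object."""

    def __init__(self, catalog: Any) -> None:
        self._catalog = catalog

    def load(self, name: str) -> Any:
        # Delegate reads to the wrapped catalog.
        return self._catalog.load(name)

    def save(self, name: str, data: Any) -> None:
        # Writes are rejected regardless of what the wrapped catalog supports.
        raise PermissionError(f"catalog is read-only; cannot save '{name}'")


class _FakeCatalog:
    """Stand-in so the sketch runs without Kedro installed."""

    def __init__(self, data: dict[str, Any]) -> None:
        self._data = dict(data)

    def load(self, name: str) -> Any:
        return self._data[name]

    def save(self, name: str, data: Any) -> None:
        self._data[name] = data


view = ReadOnlyCatalog(_FakeCatalog({"cars": [1, 2, 3]}))
cars = view.load("cars")

try:
    view.save("cars", [])
    save_blocked = False
except PermissionError:
    save_blocked = True
```

In a real project the wrapped object would be a DataCatalog; the stand-in here only mimics its load and save methods.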
Implementation Notes
The session creation step is needed to apply hooks that can change the catalog upon loading, so it can be hard to eliminate session creation completely. We can consider encapsulating the session creation logic and providing an interface such as from kedro.framework.project.session.context import catalog and/or from kedro.framework.project import catalog, with or without session creation.
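One possible way to hide session creation behind an importable catalog object is a lazy proxy that builds the real catalog on first use. The sketch below is an assumption about how this could work, not an existing Kedro interface. LazyCatalog and the stub factory are hypothetical; a real factory would wrap the session and context steps so that hooks are still applied.

```python
from __future__ import annotations

from typing import Any, Callable

_UNSET = object()  # sentinel: the catalog has not been built yet


class LazyCatalog:
    """Hypothetical proxy that builds the real catalog on first attribute access."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._catalog: Any = _UNSET

    def _resolve(self) -> Any:
        if self._catalog is _UNSET:
            # In Kedro this is where session creation would be hidden.
            self._catalog = self._factory()
        return self._catalog

    def __getattr__(self, name: str) -> Any:
        # Called only for attributes the proxy itself does not define.
        return getattr(self._resolve(), name)


class _StubCatalog:
    """Stand-in so the sketch runs without Kedro installed."""

    def load(self, name: str) -> str:
        return f"data for {name}"


created: list[str] = []


def _stub_factory() -> _StubCatalog:
    created.append("catalog")  # records how many times the catalog is built
    return _StubCatalog()


catalog = LazyCatalog(_stub_factory)  # nothing is built at this point
first = catalog.load("cars")  # first access triggers the factory
second = catalog.load("boats")  # later accesses reuse the same catalog
```

With this shape, importing catalog stays cheap, and the cost of session creation is paid only by code that actually uses the catalog.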