kedro
Allow for specifying extra node dependencies
Description
I've always felt like Kedro misses the ability to specify additional dependencies among nodes, which are not dataset related.
Context
For instance, consider the problem of populating a knowledge graph through Kedro. There are two main nodes:
- Write nodes
- Write edges
However, the edges cannot be written before the nodes have been pushed. There is hence no "dataset" dependency between the two nodes, but rather an execution dependency.
Possible Implementation
Adding this to Kedro would involve 1) an addition to the node system and 2) an update to the topological execution mechanism. With respect to the nodes, dependencies could be specified as follows:
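from kedro.pipeline import Pipeline, node, pipeline

# write_nodes and write_edges are the project's own node functions.


def create_pipeline(**kwargs) -> Pipeline:
    """Create embeddings pipeline."""
    return pipeline(
        [
            node(
                func=write_nodes,
                inputs=["int.nodes"],
                outputs="prm.nodes",
                name="write_nodes",
            ),
            node(
                func=write_edges,
                inputs=["int.edges"],
                outputs="prm.edges",
                name="write_edges",
                # Proposed argument; not part of Kedro's current node() API.
                dependencies=["write_nodes"],
            ),
        ]
    )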
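For the second part, the execution order could be derived by merging the dataset-based edges with the explicitly declared ones before sorting. Below is a minimal sketch of that idea using Python's standard-library `graphlib`. It is not Kedro's actual implementation: `NodeSpec` and `execution_order` are hypothetical names, and `NodeSpec` is only a stand-in for a Kedro node carrying the proposed `dependencies` field.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class NodeSpec:
    """Hypothetical stand-in for a Kedro node; not Kedro's real Node class."""

    name: str
    inputs: list[str]
    outputs: list[str]
    # Proposed extra field: names of nodes that must run first.
    dependencies: list[str] = field(default_factory=list)


def execution_order(nodes: list[NodeSpec]) -> list[str]:
    """Return node names in an order that honours both kinds of dependency."""
    producer = {out: n.name for n in nodes for out in n.outputs}
    known = {n.name for n in nodes}
    sorter = TopologicalSorter()
    for n in nodes:
        # Dataset dependencies: nodes that produce one of this node's inputs.
        preds = {producer[i] for i in n.inputs if i in producer}
        # Explicit execution dependencies, as proposed in this issue.
        unknown = set(n.dependencies) - known
        if unknown:
            raise ValueError(f"{n.name} depends on unknown nodes: {sorted(unknown)}")
        preds |= set(n.dependencies)
        sorter.add(n.name, *preds)
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(sorter.static_order())


specs = [
    NodeSpec("write_edges", ["int.edges"], ["prm.edges"], dependencies=["write_nodes"]),
    NodeSpec("write_nodes", ["int.nodes"], ["prm.nodes"]),
]
order = execution_order(specs)
print(order)
```

Even though `write_edges` is listed first and shares no dataset with `write_nodes`, the explicit dependency places `write_nodes` ahead of it in the resulting order.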
Possible Alternatives
The current work-around is to add "artificial" dataset dependencies among the nodes. This has the drawback that the function signatures of those nodes are polluted.
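To illustrate the drawback, here is a rough sketch of what the work-around tends to look like. The function bodies and the in-memory `GRAPH` store are assumptions made for illustration only; in a real project these functions would talk to the graph database.

```python
# In-memory stand-in for the graph database (illustrative assumption).
GRAPH = {"nodes": [], "edges": []}


def write_nodes(nodes: list[str]) -> bool:
    """Push nodes, then return a dummy value that only exists for ordering."""
    GRAPH["nodes"].extend(nodes)
    return True


def write_edges(edges: list[tuple[str, str]], nodes_written: bool) -> bool:
    """Push edges; `nodes_written` is never used, it only forces ordering."""
    GRAPH["edges"].extend(edges)
    return True


# In the pipeline, write_edges would then list the output of write_nodes
# as an extra input, e.g. inputs=["int.edges", "prm.nodes"].
flag = write_nodes(["a", "b"])
write_edges([("a", "b")], flag)
```

The extra `nodes_written` parameter carries no data the function needs, which is exactly the signature pollution described above.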