Bug in get_dependency_graph for datasets with multiple parents

Open tensorfreitas opened this issue 5 months ago • 2 comments

Describe the bug

When a dataset has multiple parents, get_dependency_graph does not behave as expected.

Example: I created three datasets that depend on each other as shown in this figure:

Image

When I retrieve the graph with get_dependency_graph, I would expect to see the same structure as in the web interface. Instead, I only see dependencies down to layer 2: Dependecy graph: {'8485a2145a64457b9704f9dd288d2dbc': [], 'd45f7216f74f4097b7e3d8c27c81217b': ['94dcbd8aa0a345c5b5b6fd7a601d6ae3']}

Expected behaviour

I would expect the dependencies to propagate all the way down to the layer 1 dataset.
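
One way to make the gap visible without the web interface is to look for dataset IDs that appear as a parent but have no entry of their own. The sketch below is plain Python and does not call ClearML. It assumes the returned dict maps each dataset ID to a list of its direct parent IDs, and that a complete graph would contain a key for every ID it mentions. The helper name find_missing_nodes is hypothetical and not part of the ClearML SDK.

```python
from typing import Dict, List


def find_missing_nodes(graph: Dict[str, List[str]]) -> List[str]:
    """Return IDs that are listed as a parent but have no key of their own.

    Assumption: `graph` maps dataset ID -> list of direct parent IDs.
    """
    referenced = {parent for parents in graph.values() for parent in parents}
    return sorted(referenced - set(graph))


# Graph copied from the output reported above.
reported = {
    "8485a2145a64457b9704f9dd288d2dbc": [],
    "d45f7216f74f4097b7e3d8c27c81217b": ["94dcbd8aa0a345c5b5b6fd7a601d6ae3"],
}

print(find_missing_nodes(reported))
# -> ['94dcbd8aa0a345c5b5b6fd7a601d6ae3']
```

For the reported output, the check flags 94dcbd8aa0a345c5b5b6fd7a601d6ae3: it is referenced as a parent, but its own parents are never listed, which is where the propagation appears to stop.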

Environment

  • Server type (self hosted)
  • ClearML SDK Version 2.0
  • ClearML Server Version (Only for self hosted). WebApp: 2.0.0-613 • Server: 2.0.0-613 • API: 2.31
  • Python Version 3.11
  • OS Linux

Related Discussion

Related discussion: https://clearml.slack.com/archives/CTK20V944/p1749647076795919

tensorfreitas, Jun 17 '25 14:06

Hi @tensorfreitas! I think this is indeed a bug. Does calling dataset._repair_dependency_graph() fix the problem for now?

@eugen-ajechiloae-clearml That does indeed seem to work! If I call

ds._repair_dependency_graph()
print(ds.get_dependency_graph())

It reports the correct structure. Thank you!

tensorfreitas, Jun 18 '25 15:06