
MetricsDataSet doesn't work on S3

Open • tbeijloos opened this issue

@noklam

Description

When using the MetricsDataSet with a filepath that refers to S3, kedro viz doesn't show any plots and cannot find the filepath. It seems like the MetricsDataSet doesn't support S3.

Context

When trying to use kedro viz for a MetricsDataSet stored on S3, kedro viz showed neither plots nor a filepath. When the dataset was saved to a local path, the plots showed up.

Steps to Reproduce

  1. Save a MetricsDataSet to S3.
  2. Register this dataset in the Data Catalog as a MetricsDataSet with an S3 filepath.
  3. Run kedro viz on this MetricsDataSet.

Expected Result

Kedro viz should show the dataset's filepath as well as its plots.

Actual Result

There is no filepath, and no plots are shown.

-- FileNotFoundError: [WinError 3] The system cannot find the path specified: "project_name\\data\\09_tracking\\test_all_best_model.json"
Exception in ASGI application 

Your Environment

Include as many relevant details as possible about the environment you experienced the bug in:

  • Web browser system: Microsoft Edge
  • kedro viz version 5.1.1
  • kedro version 0.18.2
  • Python version used: 3.9

Checklist

  • [x] Add a label to categorize this issue.

tbeijloos, Oct 04 '22 15:10

Notes:

  1. It only fails for tracking.MetricsDataSet.
  2. Only MetricsDataSet uses loaded_versioned_tracking_data.

This is likely the problem: it works on the local filesystem but fails on S3 because Viz has its own logic to load the data instead of using the dataset's own _load method.

https://github.com/kedro-org/kedro-viz/blob/d670180f0f5632a85bd7f8fbf78a77d65c8d36d4/package/kedro_viz/models/flowchart.py#L615-L648
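A minimal sketch (not kedro-viz's actual code) of why path handling written for the local filesystem breaks on S3 URIs: treating `s3://...` as a local path mangles the protocol, whereas splitting off the protocol first, roughly what Kedro's fsspec-based datasets do before choosing a filesystem, keeps it intact. `split_protocol`, `naive_local_path` and the bucket name are hypothetical, for illustration only.

```python
from pathlib import PurePosixPath
from typing import Tuple


def split_protocol(filepath: str) -> Tuple[str, str]:
    """Hypothetical helper: 's3://bucket/key' -> ('s3', 'bucket/key'); local paths get 'file'."""
    if "://" in filepath:
        protocol, path = filepath.split("://", 1)
        return protocol, path
    return "file", filepath


def naive_local_path(filepath: str) -> str:
    """Hypothetical helper: what treating the filepath as a local path does to it."""
    # pathlib collapses the '//' after 's3:', so the protocol is silently lost
    return str(PurePosixPath(filepath))


uri = "s3://my-bucket/data/09_tracking/metrics.json"  # hypothetical bucket
print(naive_local_path(uri))  # s3:/my-bucket/data/09_tracking/metrics.json
print(split_protocol(uri))    # ('s3', 'my-bucket/data/09_tracking/metrics.json')
```

Once the protocol is known, the remaining path can be handed to a protocol-aware filesystem instead of local `os`/`pathlib` calls.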

noklam, Oct 04 '22 15:10

Assuming this relates to the metadata side panel (and not the experiment tracking screen itself), this is a known issue: https://github.com/kedro-org/kedro-viz/issues/1000. As explained there, I'm not sure how much time we should invest fixing it since it might well be removed pretty soon.

If we do want to fix it then that issue lists some other things that could be fixed at the same time also. For context, here's how I fixed the experiment tracking screen to work with s3: https://github.com/kedro-org/kedro-viz/pull/984#issuecomment-1204553144 38d7fb0

antonymilne, Oct 09 '22 05:10

Coming back to this as I am working on something related to versioning. It would be much simpler to use higher-level functions like dataset._get_load_path. It's a bit strange to me that the two functions use different logic to traverse the directory. It's unclear to me why this needs to be re-implemented in kedro-viz; maybe I'm missing some context?

https://github.com/kedro-org/kedro-viz/blob/b2011d5b34fdcc7e02974a8c0834c2d70a7dc27f/package/kedro_viz/models/graph.py#L639-L691
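For reference, Kedro stores each version of a versioned dataset under `<filepath>/<version>/<filename>`, where the version is a fixed-width UTC timestamp such as `2022-10-04T15.10.00.000Z`, so versions sort correctly as plain strings. A sketch, assuming the path strings come from some filesystem listing (e.g. an fsspec glob) rather than local `os` calls, of picking the latest version without touching the local disk; `latest_version` and the paths are hypothetical.

```python
import posixpath
from typing import Iterable, Optional


def latest_version(paths: Iterable[str]) -> Optional[str]:
    """Hypothetical helper: newest version id among '<filepath>/<version>/<filename>' paths."""
    # The version id is the name of each file's parent directory
    versions = [posixpath.basename(posixpath.dirname(p)) for p in paths]
    # Fixed-width timestamps: lexicographic order matches chronological order
    return max(versions, default=None)


listing = [  # hypothetical listing of an S3 prefix
    "my-bucket/data/09_tracking/metrics.json/2022-10-03T09.00.00.000Z/metrics.json",
    "my-bucket/data/09_tracking/metrics.json/2022-10-04T15.10.00.000Z/metrics.json",
]
print(latest_version(listing))  # 2022-10-04T15.10.00.000Z
```

Because it works only on path strings, the same logic applies whether the listing came from the local filesystem or S3.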

noklam, Oct 20 '22 10:10

I really can't say either way. Would it be hard for you to investigate why it's done this way in Viz and if it's easy to fix the issue there then do so?

tynandebold, Oct 20 '22 15:10

We can close this issue. It's no longer valid because we have removed load_latest_tracking_dataset and load_versioned_tracking_dataset.

@tynandebold

rashidakanchwala, Mar 20 '23 14:03

Thank you!

tynandebold, Mar 20 '23 14:03