kedro-viz
MetricsDataSet doesn't work on S3
@noklam
Description
When using the MetricsDataSet with a filepath that refers to S3, kedro viz doesn't show any plots and cannot find the filepath. It seems like the MetricsDataSet doesn't support S3.
Context
When trying to use kedro viz for a MetricsDataset from S3, kedro viz didn't show any plots and neither did it show a filepath. When the dataset was saved to a local path, the plots show up.
Steps to Reproduce
- Save a MetricsDataSet to S3.
- Register this dataset in the Data Catalog as a MetricsDataSet with a filepath pointing to S3.
- Run kedro viz on this MetricsDataSet.
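For reference, the catalog entry described in these steps might look roughly like the following, written here as the Python dictionary equivalent of a `catalog.yml` entry. The dataset name, bucket, and file name are placeholders rather than values from the original project, and credentials are omitted.

```python
# Hypothetical catalog entry: the dataset name, bucket and file name are
# placeholders, not taken from the project that hit this bug.
catalog_config = {
    "model_metrics": {
        "type": "tracking.MetricsDataSet",
        "filepath": "s3://my-bucket/data/09_tracking/model_metrics.json",
    }
}

# The filepath carries an explicit protocol, unlike a local relative path.
filepath = catalog_config["model_metrics"]["filepath"]
protocol = filepath.split("://", 1)[0]
print(protocol)
```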
Expected Result
In kedro viz there should be a filepath for the dataset as well as plots showing.
Actual Result
There is no filepath and there are no plots.
-- FileNotFoundError: [WinError 3] The system cannot find the path specified: "project_name\\data\\09_tracking\\test_all_best_model.json"
Exception in ASGI application
Your Environment
- Web browser system: Microsoft Edge
- kedro viz version 5.1.1
- kedro version 0.18.2
- Python version used: 3.9
Checklist
- [x] Add a label to categorize this issue.
Notes:
- It only fails for `tracking.MetricsDataSet`
- Only `MetricsDataSet` is using `loaded_versioned_tracking_data`

This is likely the problem, since it works on the local filesystem but fails on S3: viz has its own logic to load the data instead of using the dataset's own `_load` method.
https://github.com/kedro-org/kedro-viz/blob/d670180f0f5632a85bd7f8fbf78a77d65c8d36d4/package/kedro_viz/models/flowchart.py#L615-L648
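To illustrate why filesystem-specific loading logic can break on S3, here is a minimal standard-library sketch. It is not the kedro-viz code; the bucket name and the helper `split_protocol` are made up for illustration. It shows that treating an `s3://` URI as a local path silently yields a path that does not exist locally, whereas splitting off the protocol first lets the caller dispatch to the right filesystem (kedro datasets do this through `fsspec`).

```python
from pathlib import Path, PurePosixPath
from typing import Tuple
from urllib.parse import urlsplit

S3_URI = "s3://my-bucket/data/09_tracking/metrics.json"  # placeholder URI


def split_protocol(filepath: str) -> Tuple[str, str]:
    """Return (protocol, path); paths without a scheme are reported as 'file'."""
    parsed = urlsplit(filepath)
    if parsed.scheme and parsed.netloc:
        return parsed.scheme, parsed.netloc + parsed.path
    return "file", filepath


# Naive approach: the URI is parsed as a local path, so the "//" collapses
# and the result refers to a local location that does not exist.
naive = PurePosixPath(S3_URI)
print(naive)                  # s3:/my-bucket/data/09_tracking/metrics.json
print(Path(S3_URI).exists())  # False

# Protocol-aware approach: keep the scheme so the right backend can be chosen.
protocol, path = split_protocol(S3_URI)
print(protocol, path)         # s3 my-bucket/data/09_tracking/metrics.json
```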
Assuming this relates to the metadata side panel (and not the experiment tracking screen itself), this is a known issue: https://github.com/kedro-org/kedro-viz/issues/1000. As explained there, I'm not sure how much time we should invest fixing it since it might well be removed pretty soon.
If we do want to fix it then that issue lists some other things that could be fixed at the same time also. For context, here's how I fixed the experiment tracking screen to work with s3: https://github.com/kedro-org/kedro-viz/pull/984#issuecomment-1204553144 38d7fb0
Coming back to this as I am working on something related to versioning. It would be much simpler to use higher-level functions like `dataset._get_load_path`. It's a bit strange to me that the two functions use different logic to traverse the directory. It's unclear to me why this needs to be re-implemented in `kedro-viz`; maybe I missed some context?
https://github.com/kedro-org/kedro-viz/blob/b2011d5b34fdcc7e02974a8c0834c2d70a7dc27f/package/kedro_viz/models/graph.py#L639-L691
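As a rough sketch of that idea, version resolution can be written once against injected `glob` and `exists` callables, so the same logic works for local and remote filesystems. This is a simplified illustration, not kedro's `_get_load_path` implementation; `resolve_latest_load_path` and the in-memory fake filesystem are hypothetical, and the `<filepath>/<version>/<filename>` layout is assumed from kedro's versioned dataset convention.

```python
import fnmatch
from typing import Callable, Iterable, Optional


def resolve_latest_load_path(
    filepath: str,
    glob: Callable[[str], Iterable[str]],
    exists: Callable[[str], bool],
) -> Optional[str]:
    """Return the newest existing '<filepath>/<version>/<filename>' path."""
    filename = filepath.rsplit("/", 1)[-1]
    pattern = f"{filepath}/*/{filename}"
    # Assumes timestamp-style version names, which sort lexicographically,
    # so reverse order puts the newest version first.
    for candidate in sorted(glob(pattern), reverse=True):
        if exists(candidate):
            return candidate
    return None


# In-memory stand-in for a remote filesystem (hypothetical keys).
FAKE_KEYS = {
    "my-bucket/data/09_tracking/metrics.json/2022-08-01T10.00.00.000Z/metrics.json",
    "my-bucket/data/09_tracking/metrics.json/2022-08-03T10.00.00.000Z/metrics.json",
}

latest = resolve_latest_load_path(
    "my-bucket/data/09_tracking/metrics.json",
    glob=lambda pattern: fnmatch.filter(FAKE_KEYS, pattern),
    exists=lambda path: path in FAKE_KEYS,
)
print(latest)
```

With `fsspec`, the two callables could be a filesystem object's `glob` and `exists` methods, which is what keeps the traversal independent of where the data lives.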
I really can't say either way. Would it be hard for you to investigate why it's done this way in Viz? If it turns out to be easy to fix the issue there, please go ahead and do so.
We can close this issue. It's no longer valid as we have removed `load_latest_tracking_dataset` and `load_versioned_tracking_dataset`.
@tynandebold
Thank you!