
Hierarchical View for Datasets UI to match Projects View (Feature Request)

Open AH-Merii opened this issue 1 year ago • 12 comments

I recently updated to the newest version of ClearML Server 1.6.0 to try out the new Datasets panel feature.

Expected Behaviour: Subfolders work as expected when created inside projects that contain tasks. See the screenshot below, which uses the UrbanSounds example: [screenshot]

Actual Behaviour: When I try to create a sub-dataset called raw_dataset in the Datasets panel within the same project, I get the following behaviour instead: [screenshots]

Datasets are also not separated by project, so the deepest sub-datasets are all displayed at the same level, as shown below: [screenshot]

What I expected as a user is to have the project name as the root directory in the Datasets panel, and then have the sub-datasets within the project.
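The following is a minimal Python sketch of the expected layout. It is not ClearML code. It groups dataset paths under their parent project and indents sub-datasets by depth.

It assumes that each dataset lives at a hidden path of the form `<project>/.datasets/<dataset name>`. That assumption is based on the hover text described later in this thread. The `group_by_project` and `render` helpers and the `palmer_penguins` path are hypothetical.

```python
from collections import defaultdict

# Assumed hidden path layout (inferred from the UI hover text, not from ClearML docs):
#   "<project>/.datasets/<dataset name>", where "/" in the dataset name nests sub-datasets
DATASETS_MARKER = "/.datasets/"


def group_by_project(paths):
    """Map each parent project to the dataset names stored under it."""
    tree = defaultdict(list)
    for path in paths:
        project, sep, dataset = path.partition(DATASETS_MARKER)
        if sep:  # skip anything that does not follow the assumed layout
            tree[project].append(dataset)
    # Sorting on path components keeps every parent directly above its children
    return {p: sorted(names, key=lambda n: n.split("/")) for p, names in sorted(tree.items())}


def render(tree):
    """Return the project-rooted view as indented lines."""
    lines = []
    for project, names in tree.items():
        lines.append(project)
        for name in names:
            depth = name.count("/")
            lines.append("  " * (depth + 1) + name.rsplit("/", 1)[-1])
    return lines


paths = [
    "palmer_penguins/.datasets/raw_data",  # hypothetical second project
    "ClearML examples/Urbansounds/.datasets/UrbanSounds example/raw_data",
    "ClearML examples/Urbansounds/.datasets/UrbanSounds example",
]
print("\n".join(render(group_by_project(paths))))
```

In a flat list the two `raw_data` entries look identical. In this view each one sits under its own project root.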

Is this the intended behaviour?

AH-Merii, Jul 18 '22

@AH-Merii This is indeed the intended behaviour. The rationale is to provide a data-centric view, i.e., to put all of the available data in front of the user regardless of how their experiments are organized.

Does this make sense?

ainoam, Jul 18 '22

@ainoam So the subfolder view for projects and tasks is meant to behave differently from the Datasets view? I find it harder to find datasets and tell them apart without any parent label. The only way for me to differentiate them is by hovering over them, as shown in the screenshots above.

AH-Merii, Jul 18 '22

Also, currently the search only captures the dataset names and not the parent project/dataset when using the search bar:

[screenshot]

This happens even though the dataset is part of that project, as shown in the screenshot below:

[screenshot]
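One way to picture the requested search behaviour is to match the query against the whole stored path rather than only the dataset's own name. The following is a hypothetical Python sketch, not the ClearML UI's search implementation. It reuses the assumed `<project>/.datasets/<dataset name>` path layout, and all function names are illustrative.

```python
def leaf_name(path):
    """The dataset's own name: the last path component."""
    return path.rsplit("/", 1)[-1]


def name_only_search(paths, query):
    """Behaviour as described in this thread: only the dataset's own name is matched."""
    q = query.lower()
    return [p for p in paths if q in leaf_name(p).lower()]


def path_aware_search(paths, query):
    """Proposed behaviour: the parent project and parent datasets are matched too."""
    q = query.lower()
    # Drop the hidden ".datasets" segment so it does not produce spurious matches
    return [p for p in paths if q in p.replace("/.datasets/", "/").lower()]


paths = [
    "ClearML examples/Urbansounds/.datasets/UrbanSounds example",
    "ClearML examples/Urbansounds/.datasets/UrbanSounds example/raw_data",
]
print(name_only_search(paths, "urbansounds"))   # only the parent dataset
print(path_aware_search(paths, "urbansounds"))  # the parent and its raw_data sub-dataset
```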

AH-Merii, Jul 18 '22

@AH-Merii

> Also, currently the search only captures the dataset names and not the parent project/dataset when using the search bar

This is in line with the aforementioned rationale: this view is disconnected from any project hierarchy (though that information is still provided as an additional detail).

> I find it harder to find the datasets and differentiating them without any parent label

Note that you can tag your datasets: [screenshot]

If I understand correctly, you're saying it would be helpful if the Datasets page also let you organize datasets by their project?

ainoam, Jul 18 '22

@ainoam Yes, that's exactly what I'm saying. If you look at the screenshots above, you'll notice that the Datasets view does in fact reference the project when you hover over a dataset. The path starts with ClearML Examples/Urbansounds and then continues into .datasets/UrbanSounds example/raw dataset, which means the two are still connected.

From a user perspective, I find the way datasets are connected to tasks really confusing. I can't tell the raw dataset for the UrbanSounds example apart from the raw dataset for the palmer_penguins project unless I hover over them.

Also, even assuming we don't want to connect projects to datasets, what's the point of having subfolders for datasets if we can't see those subfolders inside the parent dataset?

Let's look at the example below, in which I attempt to delete the UrbanSounds example dataset but am unable to do so:

[screenshots]

This is because there is a sub-dataset called UrbanSounds/raw_data. The UI doesn't show it unless I manually hover over each and every one of my other datasets to find the one related to UrbanSounds:

[screenshot]

Only then can I proceed and delete the dataset.
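Below is a minimal sketch of the lookup the UI currently leaves to the user. It does not call ClearML's API. Given the assumed `<project>/.datasets/<dataset name>` paths, it lists every sub-dataset under a parent and orders them so the deepest come first.

A few things here are assumptions:

- The helpers and the `UrbanSounds example v2` path are hypothetical.
- Deleting deepest-first is an assumption that generalizes the single-level case described in this thread.

```python
def sub_datasets(paths, parent):
    """All datasets nested anywhere below `parent` (children, grandchildren, ...)."""
    prefix = parent.rstrip("/") + "/"  # trailing "/" stops "X" from matching "X v2"
    return [p for p in paths if p.startswith(prefix)]


def deletion_order(paths, parent):
    """Deepest sub-datasets first, then the parent itself (hypothetical ordering)."""
    children = sub_datasets(paths, parent)
    return sorted(children, key=lambda p: p.count("/"), reverse=True) + [parent]


root = "ClearML examples/Urbansounds/.datasets/UrbanSounds example"
paths = [
    root,
    root + "/raw_data",
    root + " v2",  # hypothetical sibling that must not be treated as a child
]
print(deletion_order(paths, root))
```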

What I'm trying to show is that this whole experience is confusing for a user. I would expect the dataset folder structure to behave exactly like the project folder structure. Otherwise, what's the point of being able to create subfolders (sub-datasets), other than confusing everyone who uses them?

AH-Merii, Jul 18 '22

@AH-Merii Can you share how you ended up creating one dataset within another? With which ClearML SDK version were these created?

ainoam, Jul 18 '22

@ainoam I am using the original code provided in the UrbanSounds repo; I only modified get_data.py.

Initial code without a subfolder:

    from clearml import Dataset

    # Top-level dataset inside the "ClearML examples/Urbansounds" project
    dataset = Dataset.create(
        dataset_name='UrbanSounds example',
        dataset_project='ClearML examples/Urbansounds',
    )

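Code with a subfolder:

    from clearml import Dataset

    # The "/" in dataset_name nests raw_data under the "UrbanSounds example" dataset
    dataset = Dataset.create(
        dataset_name='UrbanSounds example/raw_data',
        dataset_project='ClearML examples/Urbansounds',
    )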

In short, it's the same way you create sub-projects, just applied to Dataset.create().

Using clearml SDK version: 1.6.2.

AH-Merii, Jul 18 '22

@ainoam, were you able to reproduce it, and if so, is this behaviour intended?

AH-Merii, Jul 20 '22

@AH-Merii This is definitely the intended behaviour. As you've noticed, datasets are implemented as projects. You can even browse this project hierarchy through the Projects pages once you enable "Show hidden projects" in the Settings/Configuration page. So when you create an "X/Y" dataset, you actually create a Y dataset within an X dataset, as you'd expect.
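As a rough illustration of that mapping, the sketch below derives two things from the `dataset_project` and `dataset_name` values passed to `Dataset.create()`:

- the dataset's hidden project path;
- its parent dataset.

The `<project>/.datasets/<name>` layout is inferred from the hover text quoted earlier in this thread and may not match ClearML's internal naming exactly. Both helper functions are hypothetical.

```python
from typing import Optional


def hidden_project_path(dataset_project: str, dataset_name: str) -> str:
    """Assumed layout: datasets live under a hidden ".datasets" sub-project."""
    return f"{dataset_project}/.datasets/{dataset_name}"


def parent_dataset(dataset_name: str) -> Optional[str]:
    """'X/Y' means dataset Y inside dataset X; a name without '/' has no parent."""
    head, sep, _ = dataset_name.rpartition("/")
    return head if sep else None


print(hidden_project_path("ClearML examples/Urbansounds", "UrbanSounds example/raw_data"))
print(parent_dataset("UrbanSounds example/raw_data"))  # UrbanSounds example
print(parent_dataset("UrbanSounds example"))           # None
```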

Your idea of providing an additional hierarchical view for datasets definitely makes sense. We'll add it to our work plan for the coming releases (I'd appreciate it if you would rename the issue to better reflect this).

ainoam, Jul 20 '22

@ainoam Thanks for the reply, I'll update it now. Could you please label this issue as a feature request? I am unable to do so myself.

AH-Merii, Jul 21 '22

> Let's look at the example below, in which I attempt to delete the UrbanSounds example dataset but am unable to do so: [screenshots] This is because there is a sub-dataset called UrbanSounds/raw_data. The UI doesn't show it unless I manually hover over each and every one of my other datasets to find the one related to UrbanSounds: [screenshot] Only then can I proceed and delete the dataset.

@ainoam What about the issue shown above? Shall I open a new issue specifically for deleting sub-datasets?

AH-Merii, Jul 21 '22

@AH-Merii Apologies for the slight lag :)

Am I understanding correctly that the dataset counters are wrong? For order's sake, a new issue would be best.

ainoam, Jul 28 '22

@AH-Merii As we look at providing a project view of datasets, what information would you find useful on the project summary card? For example: the number of datasets in the project, the total number of versions across all datasets, or the total size of the latest versions of all datasets?
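The three candidate metrics are simple aggregates. The hypothetical Python sketch below computes them, assuming each dataset version record carries:

- its dataset name;
- a creation timestamp, used to pick the latest version;
- a size in bytes.

The `DatasetVersion` record and the sample numbers are made up for illustration and do not reflect ClearML's data model.

```python
from dataclasses import dataclass


@dataclass
class DatasetVersion:
    dataset: str      # dataset name within the project
    version: str      # version label
    created: int      # creation timestamp; the highest value is the latest version
    size_bytes: int


def summary_card(versions):
    """Number of datasets, total versions, and total size of the latest versions."""
    latest = {}
    for v in versions:
        current = latest.get(v.dataset)
        if current is None or v.created > current.created:
            latest[v.dataset] = v
    return {
        "datasets": len(latest),
        "versions": len(versions),
        "latest_total_bytes": sum(v.size_bytes for v in latest.values()),
    }


sample = [
    DatasetVersion("UrbanSounds example", "1.0.0", created=1, size_bytes=100),
    DatasetVersion("UrbanSounds example", "1.0.1", created=2, size_bytes=150),
    DatasetVersion("UrbanSounds example/raw_data", "1.0.0", created=3, size_bytes=40),
]
print(summary_card(sample))  # {'datasets': 2, 'versions': 3, 'latest_total_bytes': 190}
```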

ainoam, Jan 11 '23

Hey @AH-Merii! clearml-server v1.10 is now out, supporting a project hierarchy view for ClearML resources (pipelines, datasets, reports).

pollfly, Apr 03 '23

> Hey @AH-Merii! clearml-server v1.10 is now out, supporting a project hierarchy view for ClearML resources (pipelines, datasets, reports).

Congratulations on the release, I look forward to testing it and rolling it out to the rest of the team!

Thank you for being responsive to the feedback!

AH-Merii, Apr 04 '23