clearml
Hierarchical View for Datasets UI to match Projects View (Feature Request)
I recently updated to the newest version of ClearML Server, 1.6.0, to try out the new Datasets panel feature.
Expected Behaviour:
Subfolders work as expected when creating them inside of projects containing tasks, see screenshot below using the Urbansounds example:
Actual Behaviour:
When trying to create a sub-dataset called raw_dataset for Datasets within the same project, I get the following behaviour instead:
Datasets are also not separated by project, so the deepest sub-datasets are all displayed on the same level, as shown below:
What I expected as a user is to have the project name as the root directory in the Datasets panel, and then have the sub-datasets within the project.
Is this the intended behaviour?
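For illustration, the layout described above (project name as the root, sub-datasets nested beneath it) can be sketched in plain Python. This is only a sketch of the requested view, not how the ClearML UI is implemented. The `datasets` list, `build_tree` and `render` are hypothetical names, and the `palmer_penguins` entry is made-up sample data.

```python
# Hypothetical (project, dataset name) pairs; not read from a ClearML server.
datasets = [
    ("ClearML examples/Urbansounds", "UrbanSounds example"),
    ("ClearML examples/Urbansounds", "UrbanSounds example/raw_data"),
    ("palmer_penguins", "raw"),
]


def build_tree(pairs):
    """Nest every 'project/dataset' path into a dict of dicts."""
    tree = {}
    for project, name in pairs:
        node = tree
        for part in f"{project}/{name}".split("/"):
            node = node.setdefault(part, {})
    return tree


def render(tree, depth=0):
    """Return the tree as indented lines, one node per line."""
    lines = []
    for key in sorted(tree):
        lines.append("  " * depth + key)
        lines.extend(render(tree[key], depth + 1))
    return lines


print("\n".join(render(build_tree(datasets))))
```

With this layout, two datasets that are both called raw would sit under different roots and could be told apart without hovering.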
@AH-Merii This is indeed the intended behaviour, the rationale being to provide a data-centric view, i.e. put all of the available data in front of the user regardless of how their experiments are organized.
Does this make sense?
@ainoam So the subfolders view for projects and tasks is meant to behave differently from the datasets view? I find it harder to find the datasets and differentiate them without any parent label. The only way for me to differentiate them is by hovering over them, as shown in the screenshots above.
Also, currently the search only captures the dataset names and not the parent project/dataset when using the search bar:
![image](https://user-images.githubusercontent.com/43741215/179474538-4f9c3b1c-dcfa-4a9b-beac-8b5aaeaaaf88.png)
Even though the dataset below is part of that project as shown in the screenshot below:
![image](https://user-images.githubusercontent.com/43741215/179474630-cffb76f3-97ba-4c63-9009-0f63b05b9dee.png)
@AH-Merii
> Also, currently the search only captures the dataset names and not the parent project/dataset when using the search bar

This is in line with the aforementioned rationale: this view is disconnected from any project hierarchy (though the information is provided as additional details).

> I find it harder to find the datasets and differentiate them without any parent label

Note that you can indeed tag your datasets.
If I understand correctly, you're saying it would be helpful if the Datasets page additionally let you organize the datasets by their project?
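As a rough sketch of the tagging suggestion, each level of the project path could be attached to the dataset as a tag at creation time. This assumes the `dataset_tags` argument of `Dataset.create` is available in your SDK version. `project_tags` and `create_tagged_dataset` are hypothetical helper names, not part of ClearML.

```python
def project_tags(dataset_project):
    """Turn 'A/B' into ['A', 'B'] so each project level becomes a tag."""
    return [part.strip() for part in dataset_project.split("/") if part.strip()]


def create_tagged_dataset(dataset_name, dataset_project):
    """Create a dataset tagged with the components of its project path."""
    # Imported lazily so project_tags() works without a configured ClearML setup.
    from clearml import Dataset

    return Dataset.create(
        dataset_name=dataset_name,
        dataset_project=dataset_project,
        dataset_tags=project_tags(dataset_project),  # assumed to be supported
    )


print(project_tags("ClearML examples/Urbansounds"))
```

Calling `create_tagged_dataset` needs a reachable ClearML server. Tags only label datasets and do not add any hierarchy to the view.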
@ainoam Yes, that's exactly what I am saying. The reason is that, if you look at the screenshots above, you will notice that the dataset view does in fact reference the project when you hover over a dataset: ClearML Examples/Urbansounds, then it gets into the .datasets/UrbanSounds example/raw dataset. This means that they are still connected.
I find the way the datasets are connected to the tasks really confusing from a user perspective. I wouldn't be able to differentiate the raw dataset for the UrbanSounds example from the raw dataset for the palmer_penguins project unless I hovered over them.
Also, let's assume that we do not want to connect the projects to the datasets, what's the point of having subfolders for the datasets if we can't see the subfolders in the main dataset?
Let's have a look at the example below: I am attempting to delete the UrbanSounds example dataset, but I am unable to do so.
This is because we have a sub-dataset called UrbanSounds/raw_data; however, nothing shows that unless I manually hover over each and every one of my other datasets to find the one that is related to UrbanSounds.
Then I can proceed and delete the dataset.
What I am trying to show is that this whole experience is confusing as a user. I would expect the dataset folder structure to behave exactly like the project folder structure. Because if that's not the case, then what's the point of having the ability to make subfolders (sub-datasets), other than confusing everyone using it?
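Until such a view exists, one workaround is to look for nested names in code instead of hovering. The sketch below is plain Python over a hand-written list of names. In practice the names might come from something like `Dataset.list_datasets`, which is an assumption here. `sub_datasets` is a hypothetical helper, and the `preprocessed` and `penguins/raw` entries are invented sample data.

```python
def sub_datasets(parent_name, dataset_names):
    """Return the names nested under parent_name, e.g. 'X/Y' is under 'X'."""
    prefix = parent_name.rstrip("/") + "/"
    return sorted(name for name in dataset_names if name.startswith(prefix))


# Hypothetical names, as they would be passed to Dataset.create(dataset_name=...).
names = [
    "UrbanSounds example",
    "UrbanSounds example/raw_data",
    "UrbanSounds example/preprocessed",
    "penguins/raw",
]

# Anything listed here would have to be removed before the parent can be deleted.
print(sub_datasets("UrbanSounds example", names))
```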
@AH-Merii Can you share how you ended up creating one dataset within another? With which ClearML SDK version were these created?
I am using the original code provided in the UrbanSounds repo. I only modified get_data.py:
@ainoam Initial Code without subfolder:
from clearml import Dataset

dataset = Dataset.create(
    dataset_name='UrbanSounds example',
    dataset_project='ClearML examples/Urbansounds',
)
Code with Subfolder:
dataset = Dataset.create(
    dataset_name='UrbanSounds example/raw_data',
    dataset_project='ClearML examples/Urbansounds',
)
So in short, it is the same way you create sub-projects, except applied to Dataset.create().
Using clearml SDK version: 1.6.2.
@ainoam, were you able to reproduce it, and if so, is this behaviour intended?
@AH-Merii This is definitely the intended behaviour: As you've noticed the Datasets are implemented as projects (You can even browse this project hierarchy through the Projects pages once you enable "Show hidden projects" in the Settings/Configuration page), and so when you create an "X/Y" dataset, you actually create a Y dataset within an X dataset as you'd expect.
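To make the "X/Y" behaviour concrete, here is a small sketch of how the dataset name and project seem to combine into one hidden project path. The `.datasets` segment is inferred only from the hover text quoted earlier in this thread, so treat the exact layout as an assumption and not as documented API. `hidden_dataset_project` is a hypothetical helper.

```python
def hidden_dataset_project(dataset_project, dataset_name):
    """Compose the hidden project path that appears to back a dataset.

    Assumption: datasets live under '<project>/.datasets/<dataset name>',
    as suggested by the hover text in the Datasets panel.
    """
    return f"{dataset_project.rstrip('/')}/.datasets/{dataset_name.strip('/')}"


path = hidden_dataset_project(
    "ClearML examples/Urbansounds", "UrbanSounds example/raw_data"
)
print(path)
# Each '/' in the dataset name adds one more level, just like a sub-project.
print(path.split("/"))
```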
Your idea of providing an additional hierarchical view for datasets definitely makes sense. We'll add it to our workplan for the coming releases (we'd appreciate it if you would rename the issue to better reflect this).
@ainoam, thanks for the reply, I will update it now. Can you please label this issue as a feature request, as I am unable to do so?
> Let's have a look at the example below: I am attempting to delete the UrbanSounds example dataset, but I am unable to do so.
> This is because we have a sub-dataset called UrbanSounds/raw_data; however, nothing shows that unless I manually hover over each and every one of my other datasets to find the one that is related to UrbanSounds. Then I can proceed and delete the dataset.

@ainoam What about the issue shown above? Shall I write up a new issue specifically for deleting the sub-datasets?
@AH-Merii Apologies for the slight lag :)
Am I understanding correctly that the counters for the datasets are wrong? For order's sake a new issue would be best.
@AH-Merii Looking at providing a project view of Datasets, what information would you say would be useful for the project summary card, for example Number of datasets in project? Total number of versions for all datasets? Total size of latest versions of all datasets?
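The three candidate figures for a project summary card could be computed along these lines. This is only a sketch over hand-written records, one per dataset version. The field names (`project`, `dataset`, `version`, `size_bytes`), the sample values and `project_summary` are all hypothetical and do not reflect the ClearML data model.

```python
from collections import defaultdict

# Hypothetical records: one entry per dataset version.
versions = [
    {"project": "ClearML examples/Urbansounds", "dataset": "UrbanSounds example",
     "version": (1, 0, 0), "size_bytes": 500},
    {"project": "ClearML examples/Urbansounds", "dataset": "UrbanSounds example",
     "version": (1, 0, 1), "size_bytes": 700},
    {"project": "ClearML examples/Urbansounds", "dataset": "raw_data",
     "version": (1, 0, 0), "size_bytes": 300},
]


def project_summary(records):
    """Per project: dataset count, version count, total size of latest versions."""
    latest = {}                      # (project, dataset) -> newest record
    version_counts = defaultdict(int)
    for rec in records:
        key = (rec["project"], rec["dataset"])
        version_counts[rec["project"]] += 1
        if key not in latest or rec["version"] > latest[key]["version"]:
            latest[key] = rec

    summary = {}
    for (project, _dataset), rec in latest.items():
        card = summary.setdefault(
            project,
            {"datasets": 0, "versions": version_counts[project], "latest_size_bytes": 0},
        )
        card["datasets"] += 1
        card["latest_size_bytes"] += rec["size_bytes"]
    return summary


print(project_summary(versions))
```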
Hey @AH-Merii! clearml-server v1.10 is now out supporting a project hierarchy view for ClearML resources (pipelines, datasets, reports)
> Hey @AH-Merii! clearml-server v1.10 is now out supporting a project hierarchy view for ClearML resources (pipelines, datasets, reports)

Congratulations on the release, I look forward to testing it and rolling it out to the rest of the team!
Thank you for being responsive to the feedback!