
HDDS-9626. [Recon] Disk Usage page with high number of key/bucket/volume

Open smitajoshi12 opened this issue 10 months ago • 3 comments

What changes were proposed in this pull request?

When the number of keys, volumes, or buckets is huge, the current disk usage UI doesn't make much sense. This pull request introduces enhancements to the Recon disk usage endpoint to significantly improve usability and performance when dealing with large datasets:

  • Top Entities Focus: The endpoint has been updated to efficiently sort and display only the top entities by size. This targeted approach helps users easily identify the most significant space consumers, addressing the impracticality of visualizing thousands of records in a single view.

  • Efficient Sorting with Parallel Streams: To manage and sort vast numbers of records effectively, we've implemented parallel stream processing.

Key advantages of using parallel streams include:

  • Better Utilization of Multi-core Processors: Enables concurrent sorting operations across multiple cores, drastically cutting down processing times for large datasets.

  • Optimized for Large Datasets: The parallelism overhead is more efficiently distributed over a large number of elements, making it particularly suited for our use case.
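For illustration only, here is a minimal sketch of sorting entries by size with a parallel stream and keeping the largest few. The `TopEntities` class, the `Entry` type, and the `topBySize` method are hypothetical names made up for this example; the actual backend implementation lives in the referenced backend PR and may differ.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not the actual Recon code.
final class TopEntities {

    // Hypothetical stand-in for one disk usage record (path plus size in bytes).
    static final class Entry {
        final String path;
        final long sizeBytes;

        Entry(String path, long sizeBytes) {
            this.path = path;
            this.sizeBytes = sizeBytes;
        }

        long getSizeBytes() {
            return sizeBytes;
        }
    }

    // Sorts the entries by size in descending order using a parallel stream
    // and returns at most `limit` of the largest ones.
    static List<Entry> topBySize(List<Entry> entries, int limit) {
        return entries.parallelStream()
                .sorted(Comparator.comparingLong(Entry::getSizeBytes).reversed())
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```

Whether the parallel version actually beats a sequential sort depends on the number of entries and the available cores, so it is worth measuring on realistic data.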

Backend PR for reference: https://github.com/apache/ozone/pull/6318

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-9626

How was this patch tested?

Manually.

Before this PR:

image

After this PR (tested with cluster data):

image

image

image

smitajoshi12 avatar Apr 16 '24 09:04 smitajoshi12

@dombizita @devmadhuu Can you please take a look at this patch.

swamirishi avatar Apr 22 '24 16:04 swamirishi

Thanks for working on this, @smitajoshi12. While testing this patch locally, I noticed a few discrepancies when setting the Display Limit:

  • I currently have 56 keys in my cluster, all of which are inside buckettest.

1. When I set the display limit to 5, the 5 largest objects are displayed and the remaining objects are grouped under Other Objects.

image

2. For 20, I get the correct result as well:

image

3. But when I set the limit to 30, I do not see the Other Objects slot anywhere, even though there are 56 keys in total, so the remaining 26 keys should be grouped under Other Objects.

image

@ArafatKhan2198 Corrected in the next commit.

image
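The expected Display Limit behaviour described in this exchange can be sketched as follows. This is a hypothetical illustration in Java, not the code from the patch (the UI itself is written in TypeScript); `DisplayLimit`, `Slice`, and `applyLimit` are assumed names, and the input is assumed to be already sorted by size in descending order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the Display Limit grouping, not the patch code.
final class DisplayLimit {

    // Hypothetical chart slice: a label and a size in bytes.
    static final class Slice {
        final String label;
        final long sizeBytes;

        Slice(String label, long sizeBytes) {
            this.label = label;
            this.sizeBytes = sizeBytes;
        }
    }

    // Keeps the first `limit` slices and folds everything after them into a
    // single "Other Objects" slice. The input must be sorted largest first.
    static List<Slice> applyLimit(List<Slice> sortedDesc, int limit) {
        if (sortedDesc.size() <= limit) {
            // Nothing is left over, so no "Other Objects" slice is needed.
            return new ArrayList<>(sortedDesc);
        }
        List<Slice> result = new ArrayList<>(sortedDesc.subList(0, limit));
        long remainingBytes = 0L;
        for (Slice s : sortedDesc.subList(limit, sortedDesc.size())) {
            remainingBytes += s.sizeBytes;
        }
        result.add(new Slice("Other Objects", remainingBytes));
        return result;
    }
}
```

With 56 keys and a limit of 30, this sketch returns 31 slices: the 30 largest keys plus one Other Objects slice covering the remaining 26.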

smitajoshi12 avatar Apr 29 '24 12:04 smitajoshi12

Thanks for working on this @smitajoshi12. To use the improvements in the namespace endpoint that @ArafatKhan2198 introduced in #6318, you need to change the endpoint that you call here:

https://github.com/apache/ozone/blob/1cbee607f83d70d85239c641e90b712bd0d5d187/hadoop-ozone/recon/src/main/resources/webapps/recon/ozone-recon-web/src/views/diskUsage/diskUsage.tsx#L132

The sortSubPaths parameter needs to be set to true: https://github.com/apache/ozone/blob/21fa62fdc963641117f819fb75a1abb189d1c614/hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/api/NSSummaryEndpoint.java#L115
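As a rough sketch of what such a request could look like, the helper below builds a disk usage URL with `sortSubPaths=true`. Only the `sortSubPaths` parameter name comes from this thread; the `/api/v1/namespace/du` path, the `path` and `files` parameters, and the `DiskUsageRequest` class are assumptions for illustration and should be checked against the linked sources.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Recon.
final class DiskUsageRequest {

    // Assumed endpoint path; verify against NSSummaryEndpoint before relying on it.
    static final String DU_ENDPOINT = "/api/v1/namespace/du";

    // Builds the request URL for a namespace path. The `path` and `files`
    // query parameters are assumptions; `sortSubPaths` is the flag discussed above.
    static String buildUrl(String path, boolean sortSubPaths) {
        String encodedPath = URLEncoder.encode(path, StandardCharsets.UTF_8);
        return DU_ENDPOINT
                + "?path=" + encodedPath
                + "&files=true"
                + "&sortSubPaths=" + sortSubPaths;
    }
}
```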

@dombizita @ArafatKhan2198 Addressed the above comments in the latest commit and updated the screenshots with cluster data.

smitajoshi12 avatar May 07 '24 16:05 smitajoshi12

Thanks for updating the patch, @smitajoshi12. We are now using the correct API parameters for sorting the subpaths, but there is still an issue from the UI perspective. Let's say we have three files:

file1 -> Size -> 1 KB
file2 -> Size -> 10 KB
file3 -> Size -> 1 GB

The API endpoint would return a response in descending order of size. However, the UI representation becomes skewed, as shown in the image below. Here, we have three directories with sizes 1 KB, 10 KB, and 1 GB. I believe the size of each slice of the pie chart is proportional to the file size, but this creates a poor user experience because the smaller entries become nearly invisible. We need to address this issue to improve the user interface.

image

Could you please take care of this?

@ArafatKhan2198 Can we raise a separate JIRA for this? It is a known issue and needs a lot of changes, as we also used normalization in the Heatmap. Raised a separate JIRA: https://issues.apache.org/jira/browse/HDDS-10864
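To make the skew concrete, here is one possible normalization, sketched under the assumption that slice sizes are derived from a logarithm of the byte size rather than from the raw size. This is not the approach used in the Heatmap or the one planned for HDDS-10864; `SliceNormalizer` and `logShares` are hypothetical names.

```java
// Hypothetical sketch of log-scale normalization for pie chart slices.
final class SliceNormalizer {

    // Returns each entry's share of the chart (values summing to 1) based on
    // log(1 + size) instead of the raw size, so that small entries stay visible.
    // If every size is zero, all shares are zero.
    static double[] logShares(long[] sizesBytes) {
        double[] weights = new double[sizesBytes.length];
        double total = 0.0;
        for (int i = 0; i < sizesBytes.length; i++) {
            // Negative sizes are treated as zero.
            weights[i] = Math.log1p((double) Math.max(0L, sizesBytes[i]));
            total += weights[i];
        }
        double[] shares = new double[sizesBytes.length];
        if (total == 0.0) {
            return shares;
        }
        for (int i = 0; i < weights.length; i++) {
            shares[i] = weights[i] / total;
        }
        return shares;
    }
}
```

With raw sizes, the 1 KB and 10 KB entries together take up roughly 0.001% of a chart dominated by the 1 GB entry. With the logarithmic weights above, the ordering is preserved but every entry gets a visible slice. The trade-off is that slice areas no longer reflect true proportions, so the real sizes would still need to be shown in labels or tooltips.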

smitajoshi12 avatar May 15 '24 05:05 smitajoshi12

Thanks @smitajoshi12 for working on this patch. Thanks @dombizita, @ArafatKhan2198 for reviewing the patch.

devmadhuu avatar Jun 10 '24 13:06 devmadhuu