HDDS-9626. [Recon] Disk Usage page with high number of key/bucket/volume
What changes were proposed in this pull request?
When the number of keys/volumes/buckets is huge, the current disk usage UI doesn't make much sense. This pull request enhances the Recon disk usage endpoint to significantly improve usability and performance when dealing with large datasets:

- **Top entities focus:** The endpoint has been updated to efficiently sort and display only the top entities by size. This targeted approach helps users easily identify the most significant space consumers, since visualizing thousands of records in a single view is impractical.
- **Efficient sorting with parallel streams:** To sort vast numbers of records effectively, we've implemented parallel stream processing. Key advantages of using parallel streams:
  - Better utilization of multi-core processors: sorting runs concurrently across multiple cores, drastically cutting processing times for large datasets.
  - Optimized for large datasets: the parallelism overhead is amortized over a large number of elements, making it particularly suited for our use case.

Backend PR for reference: https://github.com/apache/ozone/pull/6318
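The top-entities sorting described above can be sketched roughly as follows. This is an illustrative example only: the `Entry` record and `topN` helper are hypothetical names, not the actual Recon response classes from #6318.

```java
import java.util.*;
import java.util.stream.*;

public class TopEntitiesSort {
    // Hypothetical entry type; the real Recon response classes differ.
    record Entry(String path, long sizeBytes) {}

    /** Sort entries by size descending using a parallel stream and keep the top N. */
    static List<Entry> topN(List<Entry> entries, int limit) {
        return entries.parallelStream()
                .sorted(Comparator.comparingLong(Entry::sizeBytes).reversed())
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Entry> du = List.of(
                new Entry("/vol/buck/key1", 1_024L),
                new Entry("/vol/buck/key2", 1_073_741_824L),
                new Entry("/vol/buck/key3", 10_240L));
        // Keeps the two largest entries, in descending order of size.
        System.out.println(topN(du, 2));
    }
}
```

Because the stream stays ordered after `sorted`, `limit(n)` reliably returns the n largest elements even when the sort ran in parallel.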
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-9626
How was this patch tested?
Manually
Before this PR
After this PR
Tested with Cluster Data
@dombizita @devmadhuu Can you please take a look at this patch.
Thanks for working on this @smitajoshi12. While testing this patch locally I noticed a few discrepancies when setting the display limit. I currently have 56 keys in my cluster, all of which are present inside `buckettest`:

1. When I set the display limit to 5, the 5 largest objects are displayed and the remaining objects are clubbed into `Other Objects`.
2. For a limit of 20 I get the correct result as well.
3. But when I set the limit to 30, I do not see the `Other Objects` slot anywhere, even though there are 56 keys in total, so the remaining 26 keys should be clubbed into `Other Objects`.
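The behavior reported above suggests the "Other Objects" slot should appear whenever more entries exist than the display limit, regardless of how large the limit is. A minimal sketch of that clubbing rule, assuming a hypothetical `Entry` type and `withOtherSlot` helper (not the actual patch code):

```java
import java.util.*;

public class OtherObjects {
    record Entry(String path, long sizeBytes) {}

    /** Club everything past the display limit into one "Other Objects" entry.
     *  The slot must appear whenever more entries exist than the limit,
     *  whether 51 keys remain (limit 5) or 26 keys remain (limit 30). */
    static List<Entry> withOtherSlot(List<Entry> sortedDesc, int limit) {
        if (sortedDesc.size() <= limit) {
            return sortedDesc; // everything fits, no slot needed
        }
        List<Entry> out = new ArrayList<>(sortedDesc.subList(0, limit));
        long rest = sortedDesc.stream().skip(limit)
                .mapToLong(Entry::sizeBytes).sum();
        out.add(new Entry("Other Objects", rest));
        return out;
    }

    public static void main(String[] args) {
        // 56 equally sized keys, as in the report above.
        List<Entry> keys = new ArrayList<>();
        for (int i = 0; i < 56; i++) {
            keys.add(new Entry("key" + i, 100L));
        }
        // With limit 30 this yields 31 entries: 30 keys plus "Other Objects".
        System.out.println(withOtherSlot(keys, 30).size());
    }
}
```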
@ArafatKhan2198 Corrected in the next commit.
Thanks for working on this @smitajoshi12. To use the improvements in the namespace endpoint that @ArafatKhan2198 introduced in #6318, you need to change the endpoint that you call here:
https://github.com/apache/ozone/blob/1cbee607f83d70d85239c641e90b712bd0d5d187/hadoop-ozone/recon/src/main/resources/webapps/recon/ozone-recon-web/src/views/diskUsage/diskUsage.tsx#L132
The `sortSubPaths` parameter needs to be set to `true`: https://github.com/apache/ozone/blob/21fa62fdc963641117f819fb75a1abb189d1c614/hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/api/NSSummaryEndpoint.java#L115
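For illustration, the request the UI would need to send looks roughly like this. Only the `sortSubPaths` parameter is taken from the endpoint linked above; the `/api/v1/namespace/du` path and the `duUrl` helper are written from memory as assumptions, not copied from the code.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class DuRequest {
    /** Build the disk-usage request URL with subpath sorting enabled.
     *  The endpoint path here is illustrative, not authoritative. */
    static String duUrl(String reconHost, String path) {
        return reconHost + "/api/v1/namespace/du?path="
                + URLEncoder.encode(path, StandardCharsets.UTF_8)
                + "&sortSubPaths=true";
    }

    public static void main(String[] args) {
        // Prints the URL the UI would fetch for /vol1/bucket1.
        System.out.println(duUrl("http://localhost:9888", "/vol1/bucket1"));
    }
}
```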
@dombizita @ArafatKhan2198 Addressed the above comments in the latest commit and updated the screenshots with cluster data.
Thanks for updating the patch, @smitajoshi12. We are now using the correct API parameters for sorting the subpaths, but there is still an issue from the UI perspective. Let's say we have three files:
file1 -> Size -> 1 KB
file2 -> Size -> 10 KB
file3 -> Size -> 1 GB
The API endpoint would return a response in descending order of size. However, the UI representation becomes skewed, as shown in the image below: with entries of 1 KB, 10 KB, and 1 GB, each slice of the pie chart is proportional to the raw size, so the smaller entries become practically invisible. This creates a poor user experience, and we need to address it to improve the user interface.
Could you please take care of this?
@ArafatKhan2198 Can we raise a separate JIRA for this? It is a known issue and needs a lot of changes, since we used normalization in the Heatmap as well. Raised a separate JIRA: https://issues.apache.org/jira/browse/HDDS-10864
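One common way to tame the skew, in the spirit of the normalization mentioned for the Heatmap, is to size slices on a log scale. This is a minimal sketch under that assumption; the actual Heatmap normalization may differ.

```java
import java.util.Arrays;

public class SliceNormalization {
    /** Map raw byte sizes to pie-slice shares on a log10 scale so that a
     *  1 GB entry no longer crushes 1 KB and 10 KB entries into invisible
     *  slivers. Illustrative only, not the actual Recon code. */
    static double[] normalize(long[] sizes) {
        double[] logs = Arrays.stream(sizes)
                .mapToDouble(s -> Math.log10(Math.max(s, 2))) // avoid log of 0 or 1
                .toArray();
        double total = Arrays.stream(logs).sum();
        return Arrays.stream(logs).map(l -> l / total).toArray();
    }

    public static void main(String[] args) {
        // file1 = 1 KB, file2 = 10 KB, file3 = 1 GB, as in the example above.
        long[] sizes = {1_024L, 10_240L, 1_073_741_824L};
        System.out.println(Arrays.toString(normalize(sizes)));
        // Shares are roughly 0.19 / 0.25 / 0.56, instead of the raw
        // proportions where the 1 GB entry takes ~99.999% of the chart.
    }
}
```

The trade-off is that slice area no longer reflects true size, so the chart would need labels or tooltips showing the real byte counts.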
Thanks @smitajoshi12 for working on this patch. Thanks @dombizita, @ArafatKhan2198 for reviewing the patch.