
HDDS-10452. Improve Recon Disk Usage to fetch and display Top N records based on size.

Open ArafatKhan2198 opened this issue 11 months ago • 8 comments

What changes were proposed in this pull request?

This pull request enhances the Recon disk usage endpoint to improve usability and performance on large datasets:

  • Top Entities Focus: The endpoint now sorts sub-paths by size and returns only the top N entities. This makes the largest space consumers easy to identify and avoids rendering thousands of records in a single view.
  • Efficient Sorting with Parallel Streams: Sorting is done with parallel streams so that large numbers of records can be processed effectively.
  • Key advantages of using parallel streams include:
    1. Better Utilization of Multi-core Processors: Sorting work is spread across multiple cores, which can reduce processing time for large datasets.
    2. Optimized for Large Datasets: The fixed overhead of parallelism is amortized over a large number of elements, which suits this use case.
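
The top-N selection described above can be sketched as follows. This is a minimal illustration and not the code from this patch: `SubPath` and `TopNBySize` are hypothetical names standing in for Recon's real classes. It sorts a list by size in descending order on a parallel stream and keeps the first `n` entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for one disk usage sub-path entry (not Recon's class).
final class SubPath {
  final String path;
  final long size;

  SubPath(String path, long size) {
    this.path = path;
    this.size = size;
  }
}

// Hypothetical helper illustrating top-N selection with a parallel stream.
final class TopNBySize {
  private TopNBySize() { }

  // Returns at most n entries with the largest size, largest first.
  static List<SubPath> topN(List<SubPath> subPaths, int n) {
    if (n <= 0) {
      return new ArrayList<>();
    }
    return subPaths.parallelStream()
        // The sort runs in parallel; the result is still fully ordered.
        .sorted(Comparator.comparingLong((SubPath s) -> s.size).reversed())
        // Keep only the first n entries of the sorted stream.
        .limit(n)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<SubPath> subPaths = Arrays.asList(
        new SubPath("/volumetest/buckettest/dir1/key10kb", 10_000L),
        new SubPath("/volumetest/buckettest/dir1/key100MB", 100_000_000L),
        new SubPath("/volumetest/buckettest/dir1/key1MB", 1_000_000L),
        new SubPath("/volumetest/buckettest/dir1/key10mb", 10_000_000L));

    // Prints the two largest entries, largest first.
    for (SubPath s : topN(subPaths, 2)) {
      System.out.println(s.path + " " + s.size);
    }
  }
}
```

For small inputs a sequential stream is likely to be just as fast, since splitting and merging work across threads has a fixed cost. The parallel variant is only expected to pay off when the number of sub-paths is large.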

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-10452

How was this patch tested?

Tested manually against the API and with integration tests.

Results from manual testing:

  • Created 4 files of 100 MB, 10 MB, 1 MB and 10 KB under dir1; the response lists them in descending order of size:
{
  "status": "OK",
  "path": "/volumetest/buckettest/dir1",
  "size": 111010000,
  "sizeWithReplica": -1,
  "subPathCount": 4,
  "subPaths": [
    {
      "key": true,
      "path": "/volumetest/buckettest/dir1/key100MB",
      "size": 100000000,
      "sizeWithReplica": -1,
      "isKey": true
    },
    {
      "key": true,
      "path": "/volumetest/buckettest/dir1/key10mb",
      "size": 10000000,
      "sizeWithReplica": -1,
      "isKey": true
    },
    {
      "key": true,
      "path": "/volumetest/buckettest/dir1/key1MB",
      "size": 1000000,
      "sizeWithReplica": -1,
      "isKey": true
    },
    {
      "key": true,
      "path": "/volumetest/buckettest/dir1/key10kb",
      "size": 10000,
      "sizeWithReplica": -1,
      "isKey": true
    }
  ],
  "sizeDirectKey": 111010000
}
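
The response above can be sanity-checked with a few lines of code. This sketch assumes the sizes have already been extracted from the `subPaths` array in response order, and `ResponseSanityCheck` is a hypothetical helper, not part of Recon. It checks that the sizes are in descending order and that they add up to the reported `size` of 111010000.

```java
import java.util.Arrays;

// Hypothetical helper for checking a disk usage response by hand.
final class ResponseSanityCheck {
  private ResponseSanityCheck() { }

  // True if every size is less than or equal to the one before it.
  static boolean isDescending(long[] sizes) {
    for (int i = 1; i < sizes.length; i++) {
      if (sizes[i] > sizes[i - 1]) {
        return false;
      }
    }
    return true;
  }

  // Sum of all sub-path sizes.
  static long total(long[] sizes) {
    return Arrays.stream(sizes).sum();
  }

  public static void main(String[] args) {
    // Sizes copied from the subPaths entries above, in response order.
    long[] sizes = {100_000_000L, 10_000_000L, 1_000_000L, 10_000L};

    System.out.println("descending: " + isDescending(sizes));
    System.out.println("sum matches size: " + (total(sizes) == 111_010_000L));
  }
}
```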

ArafatKhan2198, Mar 02 '24 20:03