htop icon indicating copy to clipboard operation
htop copied to clipboard

Normalize DiskIOMeter bar by number of disks

Open Explorer09 opened this issue 1 year ago • 11 comments

The maximum utilization should be "100% * number of disks". Set the bar and graph of the meter to draw with that maximum. Also remove the 100% cap on the utilization percentage display.

Resolves #1374.

Explorer09 avatar Sep 06 '24 04:09 Explorer09

Something like this patch will provide the PCP value you seek. You might want to consider a similar change for the NetworkIOMeter, in which case there is a "hinv.ninterface" metric that can be used similarly.

pcp/Metric.h
--- /tmp/git-blob-5igpu5/Metric.h	2024-09-06 15:32:13.776523149 +1000
+++ pcp/Metric.h	2024-09-06 15:26:46.052772959 +1000
@@ -26,6 +26,7 @@ typedef enum Metric_ {
    PCP_CONTROL_THREADS,         /* proc.control.perclient.threads */
 
    PCP_HINV_NCPU,               /* hinv.ncpu */
+   PCP_HINV_NDISK,              /* hinv.ndisk */
    PCP_HINV_CPUCLOCK,           /* hinv.cpu.clock */
    PCP_UNAME_SYSNAME,           /* kernel.uname.sysname */
    PCP_UNAME_RELEASE,           /* kernel.uname.release */
pcp/Platform.c
--- /tmp/git-blob-JCaR3p/Platform.c	2024-09-06 15:32:13.956523525 +1000
+++ pcp/Platform.c	2024-09-06 15:30:56.940361942 +1000
@@ -127,6 +127,7 @@ static const char* Platform_metricNames[
    [PCP_CONTROL_THREADS] = "proc.control.perclient.threads",
 
    [PCP_HINV_NCPU] = "hinv.ncpu",
+   [PCP_HINV_NDISK] = "hinv.ndisk",
    [PCP_HINV_CPUCLOCK] = "hinv.cpu.clock",
    [PCP_UNAME_SYSNAME] = "kernel.uname.sysname",
    [PCP_UNAME_RELEASE] = "kernel.uname.release",
@@ -385,6 +386,7 @@ bool Platform_init(void) {
    Metric_enable(PCP_PID_MAX, true);
    Metric_enable(PCP_BOOTTIME, true);
    Metric_enable(PCP_HINV_NCPU, true);
+   Metric_enable(PCP_HINV_NDISK, true);
    Metric_enable(PCP_PERCPU_SYSTEM, true);
    Metric_enable(PCP_UNAME_SYSNAME, true);
    Metric_enable(PCP_UNAME_RELEASE, true);
@@ -753,6 +755,8 @@ bool Platform_getDiskIO(DiskIOData* data
       data->totalBytesWritten = value.ull;
    if (Metric_values(PCP_DISK_ACTIVE, &value, 1, PM_TYPE_U64) != NULL)
       data->totalMsTimeSpend = value.ull;
+   if (Metric_values(PCP_HINV_NDISK, &value, 1, PM_TYPE_U64) != NULL)
+      data->numDisks = value.ull;
    return true;
 }
 

natoscott avatar Sep 06 '24 05:09 natoscott

@natoscott There is no "utilization percentage" for NetworkIOMeter at the moment. So I would consider showing the number of interfaces a feature separate from this pull request.

Explorer09 avatar Sep 06 '24 05:09 Explorer09

Understood & agreed. FWIW, I think this approach is not ideal (shares the same problem of the original 100% limiting, as mentioned by the reporter of #1374 - the code was fine the way it was originally). We need to be aware here of properties of some hardware described in the iostat(1) man page ...

              %util  Percentage  of elapsed time during which I/O requests were
                     issued to the device (bandwidth utilization  for  the  de‐
                     vice).  Device  saturation occurs when this value is close
                     to 100% for devices serving requests  serially.   But  for
                     devices  serving requests in parallel, such as RAID arrays
                     and modern SSDs, this number does not reflect  their  per‐
                     formance limits.

natoscott avatar Sep 06 '24 05:09 natoscott

@natoscott What was the ideal approach anyway?

Explorer09 avatar Sep 06 '24 06:09 Explorer09

Just reverting to the original htop behaviour as the issue reporter suggested.

natoscott avatar Sep 06 '24 06:09 natoscott

@natoscott Look in my patch carefully. I separated the utilization value into two. cached_utilisation_diff would revert to original behavior of allowing more than 100% to be shown; the bar would be drawn with cached_utilisation_norm which is the normalized value.

Explorer09 avatar Sep 06 '24 06:09 Explorer09

@natoscott Look in my patch carefully. I separated the utilization value into two. cached_utilisation_diff would revert to original behavior of allowing more than 100% to be shown; the bar would be drawn with cached_utilisation_norm which is the normalized value.

Two thumbs up - good stuff, thanks. Regarding your data type question, I think I know what @BenBE is going to suggest: size_t FTW. :) However, systems with more than 4 billion disks are few and far between so I'm sure a 32 bit unsigned value would suffice. If noone has strong opinions, I suggest sticking with what you have now since that's consistent with the surrounding code.

natoscott avatar Sep 06 '24 23:09 natoscott

Regarding your data type question, I think I know what @BenBE is going to suggest: size_t FTW. :)

For PCP this would require a downcast from uint64_t and an (numDisks <= SIZE_MAX) check that I don't think it's worth it. That's why I chose a larger data type than size_t for now.

Explorer09 avatar Sep 07 '24 04:09 Explorer09

Regarding your data type question, I think I know what @BenBE is going to suggest: size_t FTW. :)

For PCP this would require a downcast from uint64_t and...

FWIW, for PCP to use 32-bit types we would change this section:

   if (Metric_values(PCP_HINV_NDISK, &value, 1, PM_TYPE_U64) != NULL)
      data->numDisks = value.ull;

to

   if (Metric_values(PCP_HINV_NDISK, &value, 1, PM_TYPE_U32) != NULL)
      data->numDisks = value.ul;

but, keep it as you have it, it's fine IMO.

natoscott avatar Sep 07 '24 05:09 natoscott

@ShaiMagal: Have you tested the PR?

fasterit avatar Oct 19 '24 14:10 fasterit

@natoscott When this PR will be merged? It's "game changer" for overall about disk activity :) Should it be merged sooner than in 3.4.0?

ShaiMagal avatar Dec 08 '24 07:12 ShaiMagal