perfmon-agent

CPU by Process not working fine (pid=${PID}:percent)

Open NirBY opened this issue 6 years ago • 3 comments

Scenario #1:
OS: Linux centoes 2.6.32-x86_64
Num of cores: 2
Config used: "label=CPU:pid=${PID}:percent"

How it looks when using "top -p ": https://www.screencast.com/t/LyBEMhAZ73o

How it looked from jmeter: https://www.screencast.com/t/nQhZgPucFG

Scenario #2:
OS: Linux centoes 2.6.32-x86_64
Num of cores: 1
Config used: "label=CPU:pid=${PID}:percent"

How it looks when using "top -p ": image

How it looked from jmeter: image

NirBY, Nov 14 '18 16:11

For now I am using an "EXEC" metric as a workaround: /bin/sh:-c:top -b -n 1 -p ${PID} | awk 'NR>7 { print $9; }'|head -n 1

NirBY, Nov 15 '18 09:11

I have the same issue. If you request CPU percent values for a specific process, e.g. by pid or by name (in my case "name=httpd#1:percent", or just "name=httpd#1" since percent is the default), the result is off by about a factor of 10.

Checking the most likely relevant source file ( https://github.com/undera/perfmon-agent/blob/master/src/kg/apc/perfmon/metrics/CPUProcMetric.java ) I see:

    public void getValue(StringBuffer res) throws SigarException {
        ProcCpu cpu = sigarProxy.getProcCpu(params.PID);
        // ...
        case PERCENT:
            val = 100 * cpu.getPercent();
            break;

Based on this, my guess is that sigarProxy.getProcCpu(...) has a bug (returning values a factor of 10 too low) and that PerfMon tried to work around it here, but introduced another bug by writing 100 instead of 10. Fixing both would be nice. Getting wrong numbers isn't really acceptable.
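For comparing against top, here is a minimal sketch of how a per-process CPU percentage is commonly derived from two cumulative CPU-time samples. It is not Sigar's or PerfMon's code: the class, the method names and the millisecond units are assumptions made for illustration. It only shows that the same measurement can legitimately be expressed relative to one core or relative to all cores, which is worth ruling out before concluding that a factor is a bug.

```java
// Hypothetical sketch, not taken from Sigar or PerfMon.
final class CpuPercentSketch {

    /** CPU usage between two samples as a percentage of ONE core. */
    static double percentOfOneCore(long cpuMsPrev, long cpuMsCur,
                                   long wallMsPrev, long wallMsCur) {
        long wallDelta = wallMsCur - wallMsPrev;
        if (wallDelta <= 0) {
            return Double.NaN; // no elapsed time, nothing meaningful to report
        }
        // 100.0 forces floating-point arithmetic before the division.
        return 100.0 * (cpuMsCur - cpuMsPrev) / wallDelta;
    }

    /** The same measurement scaled to the capacity of the whole machine. */
    static double percentOfAllCores(long cpuMsPrev, long cpuMsCur,
                                    long wallMsPrev, long wallMsCur, int cores) {
        return percentOfOneCore(cpuMsPrev, cpuMsCur, wallMsPrev, wallMsCur) / cores;
    }

    public static void main(String[] args) {
        // The process consumed 500 ms of CPU time during 1000 ms of wall time.
        System.out.println(percentOfOneCore(10_000, 10_500, 0, 1_000));     // 50.0
        System.out.println(percentOfAllCores(10_000, 10_500, 0, 1_000, 2)); // 25.0
    }
}
```

On the 2-core machine from scenario #1 the two conventions differ by a factor of 2, so a core-count difference alone would not explain a factor of 10.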

BTW: reading the rest of this switch, I'm quite sure it isn't okay either. The relevant parts:

val and prev are double, cur is long, and prev starts as -1.

    case TOTAL:
        cur = cpu.getTotal();
        val = prev > 0 ? cur - prev : 0;
        prev = cur;
        break;
    case SYSTEM:
        cur = cpu.getSys();
        val = prev > 0 ? cur - prev : 0;
        prev = cur;
        break;
    case USER:
        cur = cpu.getUser();
        val = prev > 0 ? cur - prev : 0;
        prev = cur;
        break;

  1. Question: why does the previous value need to be > 0 before deltas start? If sigar sends invalid (negative) values, report a bug and leave the invalid values out of the graph, which makes more sense. What happens here instead: at least the first value is always cut off, as is the value following any negative one, and a negative value is wrongly produced should any later incoming value be negative. That is likely not what was intended.
  2. cur is long but prev is double, so cur - prev is evaluated as double (Java widens the long operand; mixed arithmetic results in the wider type, not the narrower one). The subtraction therefore discards nothing, but the values carry no decimals to begin with, because sigar hands them over as long.
  3. If you use "total", e.g. "name=httpd#1:total", the result is off by a factor of 100 (independent of point 2), so I guess sigar made a similar mistake in getTotal().
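A small Java sketch to check the points above. It assumes, as described, that cur is a long and prev is a double. The class and method names are hypothetical and this is not PerfMon's code. It shows that Java evaluates long - double as double, that the prev > 0 guard also suppresses the delta after a legitimate sample of 0, and one possible alternative that tracks "no previous sample yet" with an explicit flag.

```java
// Hypothetical sketch, not PerfMon's implementation.
final class DeltaSketch {
    private boolean hasPrev = false; // explicit flag instead of testing prev > 0
    private long prev;

    /** Returns cur minus the previous sample, or NaN for the very first sample. */
    double next(long cur) {
        double val = hasPrev ? (double) (cur - prev) : Double.NaN;
        prev = cur;
        hasPrev = true;
        return val;
    }

    /** The guard as written in the switch quoted above, for comparison. */
    static double guarded(long cur, double prev) {
        return prev > 0 ? cur - prev : 0;
    }

    public static void main(String[] args) {
        // Binary numeric promotion: the long operand is widened to double.
        long cur = 7;
        double prev = 2.5;
        System.out.println(cur - prev);        // 4.5

        System.out.println(guarded(120, -1));  // 0.0, first sample suppressed
        System.out.println(guarded(120, 0));   // 0.0, also suppressed although 0 is a valid sample
        System.out.println(guarded(150, 120)); // 30.0

        DeltaSketch d = new DeltaSketch();
        System.out.println(d.next(0));         // NaN, no delta exists yet
        System.out.println(d.next(120));       // 120.0
    }
}
```

Returning NaN for the first sample is only one option; whether a consumer should skip that point or plot it as 0 is a design decision for the agent.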

As I don't have time to dive deep into the native part of sigar (and it looks like the bigger issue is somewhere in there), I can't say where the erroneous part on their side is.

My "2cts." idea (apart from the possible small fixes in PerfMon, which it would be nice if the PerfMon authors could do): the main author of PerfMon should probably talk with the main author of Sigar and get those simple math issues fixed. It wouldn't exactly give a good impression to show a customer a performance test result with diagrams that obviously contain wrong data (fortunately I checked this before such an incident). Those small calculation issues make PerfMon quite dangerous to use for the time being, which is sad because it's very useful overall. Hoping for a fix in the near future. :)

Best regards, Peter

bugla, Jul 31 '20 16:07

Guys, here are some points from me:

  1. SIGAR is dead as a project, as far as I know. There's nobody to go and talk to.
  2. I am occupied with other projects for now, and I have no time to invest into development of perfmon agent. That said, I am ready to help with reviewing PRs, merging and publishing new versions of perfmon agent.
  3. This is Open Source, anyone can go and talk to any library owner and ask for improvements, not only me.

undera, Aug 01 '20 06:08