
Node exporter reporting data for too many CPUs on FreeBSD

Open davehayes opened this issue 3 years ago • 2 comments

Host operating system: output of uname -a

FreeBSD 12.2-STABLE r368820 amd64

node_exporter version: output of node_exporter --version

node_exporter, version 1.0.1 (branch: release-1.0, revision: 0) build user: root build date:
go version: go1.15.6

node_exporter command line flags

--collector.textfile.directory=/some/where --collector.devstat --collector.ntp

Are you running node_exporter in Docker?

No.

What did you do that produced an error?

hw.ncpu is 16, which matches the 16 cores of this Ryzen 3950X. machdep.hyperthreading_allowed: 0 is also set.

node_cpu_seconds_total{ cpu="30", mode="idle" } ... this value is 0. According to our discussion on Matrix, that's a bug.

It turns out that kern.cp_times is likely the culprit: after the entries for the 16 real CPUs, it has a long run of trailing 0s:

# sysctl kern.cp_times
kern.cp_times: 119169 344788 336514 125390 896317803 115801 274757 654468 69366 896129272 53798 361436 309501 74719 896444186 154879 386182 369249 78511 896254839 170832 359904 397446 2341 896313117 178544 288452 480246 2527 896293871 178332 351178 408500 3398 896302256 189411 403573 425310 2473 896222895 158723 367277 508940 2426 896206286 106010 304752 473477 2622 896356800 137008 400359 367762 2198 896336334 172556 412512 416235 2464 896239897 187877 375365 424755 2333 896253331 171498 308979 409341 2573 896351273 184604 406386 432323 2395 896217932 196308 415485 487813 2333 896141701 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
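
To make the shape of that output easier to see, here is a minimal Go sketch that groups a flat kern.cp_times-style list into one slot per CPU and flags slots whose counters are all zero. It assumes five counters per CPU (user, nice, system, interrupt, idle), which matches the 16 non-zero groups above. The names groupCPTimes and allZero are hypothetical helpers written for this illustration, not node_exporter's collector code, and the input is synthetic rather than the real sysctl output.

```go
package main

import "fmt"

// cpuStates is the assumed number of counters per CPU slot in kern.cp_times
// (user, nice, system, interrupt, idle).
const cpuStates = 5

// groupCPTimes splits a flat list of counters into one row per CPU slot.
// It returns an error if the length is not a multiple of cpuStates.
func groupCPTimes(flat []uint64) ([][]uint64, error) {
	if len(flat)%cpuStates != 0 {
		return nil, fmt.Errorf("length %d is not a multiple of %d", len(flat), cpuStates)
	}
	rows := make([][]uint64, 0, len(flat)/cpuStates)
	for i := 0; i < len(flat); i += cpuStates {
		rows = append(rows, flat[i:i+cpuStates])
	}
	return rows, nil
}

// allZero reports whether every counter in a slot is zero.
func allZero(row []uint64) bool {
	for _, v := range row {
		if v != 0 {
			return false
		}
	}
	return true
}

func main() {
	// Synthetic data: two busy slots followed by two empty ones.
	flat := []uint64{
		10, 1, 5, 2, 900,
		12, 0, 7, 1, 880,
		0, 0, 0, 0, 0,
		0, 0, 0, 0, 0,
	}
	rows, err := groupCPTimes(flat)
	if err != nil {
		panic(err)
	}
	for slot, row := range rows {
		fmt.Printf("slot %d: %v empty=%t\n", slot, row, allZero(row))
	}
}
```

Applied to the output above, this kind of grouping would show 16 populated slots followed by all-zero slots, which is where the extra cpu labels appear to come from.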

Here's the dmesg information on the CPU I have:

CPU: AMD Ryzen 9 3950X 16-Core Processor             (3493.52-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x870f10  Family=0x17  Model=0x71  Stepping=0
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x7ed8320b<SSE3,PCLMULQDQ,MON,SSSE3,FMA,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
  AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
  AMD Features2=0x75c237ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT,TCE,Topology,PCXC,PNXC,DBE,PL2I,MWAITX,<b30>>
  Structured Extended Features=0x219c91a9<FSGSBASE,BMI1,AVX2,SMEP,BMI2,PQM,PQE,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA>
  Structured Extended Features2=0x400004<UMIP,RDPID>
  XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
  AMD Extended Feature Extensions ID EBX=0x10cb657<CLZERO,IRPerf,XSaveErPtr>
  SVM: (disabled in BIOS) NP,NRIP,VClean,AFlush,DAssist,NAsids=32768
  TSC: P-state invariant, performance statistics

What did you expect to see?

I expect to see one cpu label in node_cpu_seconds_total per actual CPU, with no cpu label greater than the value of hw.ncpu.

What did you see instead?

Let C be the value of hw.ncpu. I saw node_cpu_seconds_total with cpu labels from C to 2C-1, each with a value of 0.

davehayes, Apr 22 '21 21:04

Thanks for the detailed issue. It'd be nice to get some more info from FreeBSD experts on this one.

SuperQ, Apr 22 '21 22:04

I've been searching around various mailing lists. It seems there's information I didn't see before that might help in this case. This information is from the machine that had the bug:

# sysctl -d kern.smp
kern.smp: Kernel SMP
kern.smp.forward_signal_enabled: Forwarding of a signal to a process on a different CPU
kern.smp.topology: Topology override setting; 0 is default provided by hardware.
kern.smp.cores: Number of physical cores online
kern.smp.threads_per_core: Number of SMT threads online per core
kern.smp.cpus: Number of CPUs online
kern.smp.disabled: SMP has been disabled from the loader
kern.smp.active: Indicates system is running in SMP mode
kern.smp.maxcpus: Max number of CPUs that the system was compiled for.
kern.smp.maxid: Max CPU ID.
# sysctl kern.smp
kern.smp.forward_signal_enabled: 1
kern.smp.topology: 0
kern.smp.cores: 16
kern.smp.threads_per_core: 1
kern.smp.cpus: 16
kern.smp.disabled: 0
kern.smp.active: 1
kern.smp.maxcpus: 256
kern.smp.maxid: 31

I think this section of the sysctl MIB tells you all you need to know. Personally, I suggest using kern.smp.cores to limit how many kern.cp_times entries are reported.
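
As a rough illustration of that idea, here is a minimal Go sketch that truncates a flat kern.cp_times-style list to the first ncpu slots. It assumes five counters per CPU slot and that the populated slots come first, as they do in the output from this machine. limitCPTimes is a hypothetical helper, not an existing node_exporter function, and the CPU count is passed in as a plain argument rather than read from sysctl.

```go
package main

import "fmt"

// cpuStates is the assumed number of counters per CPU slot in kern.cp_times.
const cpuStates = 5

// limitCPTimes keeps only the counters for the first ncpu slots.
// If ncpu is negative or larger than the number of slots present,
// the input is returned unchanged.
func limitCPTimes(flat []uint64, ncpu int) []uint64 {
	want := ncpu * cpuStates
	if ncpu < 0 || want > len(flat) {
		return flat
	}
	return flat[:want]
}

func main() {
	// Synthetic data: two populated slots followed by two empty ones.
	flat := []uint64{
		10, 1, 5, 2, 900,
		12, 0, 7, 1, 880,
		0, 0, 0, 0, 0,
		0, 0, 0, 0, 0,
	}
	limited := limitCPTimes(flat, 2) // 2 stands in for the CPU count from sysctl
	fmt.Println(len(limited) / cpuStates) // prints 2
}
```

Which count to pass is a design choice. On this machine kern.smp.cores and kern.smp.cpus are both 16, but going by their descriptions above they would differ with SMT enabled. Simple truncation would also presumably be wrong if the online CPU IDs were not contiguous from 0, so skipping empty slots might be a safer variant.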

davehayes, Apr 22 '21 22:04