
sar -B produces incorrect vmeff% in sysstat-11.7.3-7.el8.x86_64

Open gleventhal opened this issue 3 years ago • 6 comments

sysstat-11.7.3-7.el8.x86_64: %vmeff should be pgsteal / pgscan, but it now appears to be pgsteal / pgscan * 100.

06:01:01 PM  pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
06:02:11 PM      0.00     95.00   6422.00      0.00   9790.00      0.00    141.00    274.00    194.33
06:02:21 PM      0.00      0.00   2608.00      0.00   2889.00      0.00     65.00    130.00    200.00
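For reference, the two %vmeff values above can be reproduced by hand. This is a minimal sketch, assuming %vmeff is pgsteal/s divided by the sum of pgscank/s and pgscand/s, expressed as a percentage. The function name `vmeff` is hypothetical and not part of sysstat.

```python
def vmeff(pgscank: float, pgscand: float, pgsteal: float) -> float:
    """Return stolen pages as a percentage of scanned pages.

    Assumption: scanned pages = pgscank + pgscand, as in the sar -B columns.
    Returns 0.0 when nothing was scanned, to avoid dividing by zero.
    """
    scanned = pgscank + pgscand
    if scanned == 0:
        return 0.0
    return pgsteal / scanned * 100.0


# Rates taken from the two sar -B samples shown above.
print(f"{vmeff(0.00, 141.00, 274.00):.2f}")
print(f"{vmeff(0.00, 65.00, 130.00):.2f}")
```

Both results exceed 100 because pgsteal/s is about twice pgscand/s in each sample, which is the anomaly being reported.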

gleventhal avatar Oct 19 '22 18:10 gleventhal

@sysstat I've done some analysis regarding this issue and it may affect any system with a reasonably recent kernel.

The method to compute %vmeff is correct because the corresponding source code did not change between the versions of sysstat in RHEL 7, 8 or even 9.

What has changed, however, are the contents of /proc/vmstat that are used to compute the values of the pgscan columns. All recent versions of sysstat parse the /proc/vmstat file and sum all values with the pgscan_kswapd and pgscan_direct prefixes, which then correspond to the pgscank and pgscand columns produced by sar, respectively.

See the following experiment for a list of pgscan fields offered by given kernel versions in /proc/vmstat:

  • RHEL 7 (kernel-3.10.0-1160.95.1.el7.x86_64):
$ grep pgscan /proc/vmstat | cut -d' ' -f1
pgscan_kswapd_dma
pgscan_kswapd_dma32
pgscan_kswapd_normal
pgscan_kswapd_movable
pgscan_direct_dma
pgscan_direct_dma32
pgscan_direct_normal
pgscan_direct_movable
pgscan_direct_throttle
  • RHEL 8 (kernel-4.18.0-506.el8.x86_64):
$ grep pgscan /proc/vmstat | cut -d' ' -f1
pgscan_kswapd
pgscan_direct
pgscan_direct_throttle
pgscan_anon
pgscan_file
  • RHEL 9 (kernel-5.14.0-347.el9.x86_64):
$ grep pgscan /proc/vmstat | cut -d' ' -f1
pgscan_kswapd
pgscan_direct
pgscan_direct_throttle
pgscan_anon
pgscan_file
  • Fedora 38 (kernel-6.4.6-200.fc38.x86_64):
$ grep pgscan /proc/vmstat | cut -d' ' -f1
pgscan_kswapd
pgscan_direct
pgscan_khugepaged
pgscan_direct_throttle
pgscan_anon
pgscan_file

The pgscan_anon, pgscan_file and pgscan_khugepaged fields on newer kernels are ignored by sysstat, which is why the number of stolen pages may be higher than the number of scanned pages. As a result, sar may produce incorrect %vmeff values.
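The prefix matching described above can be sketched against the Fedora 38 field list. This is only an illustration of the behavior as described in this comment, not sysstat's actual source. The names `COUNTED_PREFIXES` and `classify` are hypothetical.

```python
# pgscan field names from the Fedora 38 experiment above.
FEDORA_38_FIELDS = [
    "pgscan_kswapd",
    "pgscan_direct",
    "pgscan_khugepaged",
    "pgscan_direct_throttle",
    "pgscan_anon",
    "pgscan_file",
]

# Assumption: only these two prefixes are summed, as described in this comment.
COUNTED_PREFIXES = ("pgscan_kswapd", "pgscan_direct")


def classify(fields):
    """Split field names into those matching a counted prefix and the rest."""
    counted = [f for f in fields if f.startswith(COUNTED_PREFIXES)]
    ignored = [f for f in fields if not f.startswith(COUNTED_PREFIXES)]
    return counted, ignored


counted, ignored = classify(FEDORA_38_FIELDS)
print("counted:", counted)
print("ignored:", ignored)
```

Note that a plain prefix match on pgscan_direct also picks up pgscan_direct_throttle.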

edit: typo

lzaoral avatar Aug 07 '23 10:08 lzaoral

@lzaoral Thanks for your analysis. The solution is then probably to sum all fields from /proc/vmstat starting with pgscan_. If new fields using this prefix are added in the future, they will be taken into account.
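A rough sketch of that idea, written in Python for brevity rather than in sysstat's own C: it sums every field whose name starts with pgscan_. The function name `sum_pgscan` is hypothetical and the counter values in `SAMPLE` are made up for illustration. Whether the per-type fields overlap with the per-context ones is not addressed in this thread, so the sketch makes no attempt to deduplicate.

```python
def sum_pgscan(vmstat_text: str) -> int:
    """Sum the values of all '<name> <value>' lines whose name starts with 'pgscan_'."""
    total = 0
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith("pgscan_"):
            total += int(parts[1])
    return total


# Illustrative input only: the field names come from the Fedora 38 list in
# this thread, the numbers are invented.
SAMPLE = """\
pgsteal_kswapd 50
pgscan_kswapd 100
pgscan_direct 20
pgscan_khugepaged 5
pgscan_direct_throttle 0
pgscan_anon 40
pgscan_file 85
"""

print(sum_pgscan(SAMPLE))
```

On a Linux host, the same function could be fed the contents of /proc/vmstat, for example `sum_pgscan(open("/proc/vmstat").read())`.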

sysstat avatar Aug 17 '23 08:08 sysstat

I'm also wondering whether %vmeff should still be displayed by sar at all. It is more a kernel metric than a system one and, as such, should probably be discarded...?

sysstat avatar Aug 17 '23 08:08 sysstat

I find %vmeff a handy proxy to see if there is memory pressure.

petervanhooft avatar Aug 30 '23 06:08 petervanhooft

I also find it a useful metric when the math makes sense. I don't see a valuable distinction between kernel and system metrics; that's a fungible thing from where I stand.

gleventhal avatar Mar 28 '24 14:03 gleventhal

Can't we just test for the behavior and conditionally do the correct thing to get the expected value? I know it's a bit of a shim, but I'd much prefer an if statement or two over losing a stat that I value for troubleshooting VM issues.
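One way such a test could look, as a hedged sketch only: detect which pgscan field layout the running kernel exposes and branch on it. The function name `pgscan_layout` and the layout labels are hypothetical, and what the "correct thing" is in each branch is left open here, as it is in this thread.

```python
def pgscan_layout(field_names):
    """Guess which /proc/vmstat pgscan layout is present from the field names.

    Labels (hypothetical):
      "per-type"  - pgscan_anon / pgscan_file exist (RHEL 8, RHEL 9, Fedora 38 lists)
      "per-zone"  - pgscan_kswapd_<zone> fields exist (RHEL 7 list)
      "aggregate" - neither of the above
    """
    names = set(field_names)
    if names & {"pgscan_anon", "pgscan_file"}:
        return "per-type"
    if any(n.startswith("pgscan_kswapd_") for n in names):
        return "per-zone"
    return "aggregate"


# Field names copied from the RHEL 7 and RHEL 8 lists earlier in this thread.
RHEL7 = ["pgscan_kswapd_dma", "pgscan_kswapd_normal", "pgscan_direct_dma",
         "pgscan_direct_normal", "pgscan_direct_throttle"]
RHEL8 = ["pgscan_kswapd", "pgscan_direct", "pgscan_direct_throttle",
         "pgscan_anon", "pgscan_file"]

print(pgscan_layout(RHEL7))
print(pgscan_layout(RHEL8))
```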

gleventhal avatar Mar 28 '24 14:03 gleventhal