perl-proc-processtable icon indicating copy to clipboard operation
perl-proc-processtable copied to clipboard

Incorrect calculation on Mac of process memory percentage causes buffer overflow and illegal instruction error

Open jkahrman opened this issue 1 year ago • 2 comments

On Mac, Proc-ProcessTable is overflowing the "pctmem" variable in os/darwin.c by miscalculating a negative percentage of memory use.

A lightly edited co-worker's analysis of the problem and proposed fix:

The MacOS crash report shows an overflow condition:

Exception Type:        EXC_BREAKPOINT (SIGTRAP)
Exception Codes:       0x0000000000000001, 0x0000000181117e60

Termination Reason:    Namespace SIGNAL, Code 0x5
Terminating Process:   exc handler [52903]

Application Specific Information:
detected buffer overflow

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   libsystem_c.dylib                 0x0000000181117e60 __chk_fail_overflow + 24
1   libsystem_c.dylib                 0x00000001810aee00 __sprintf_chk + 112
2   ProcessTable.bundle               0x0000000102d8e0d8 OS_get_table + 1296
3   ProcessTable.bundle               0x0000000102d8f374 XS_Proc__ProcessTable_table + 392
4   perl                              0x0000000102614990 Perl_pp_entersub + 2352
5   perl                              0x000000010260a4f4 Perl_runops_standard + 32
6   perl                              0x00000001025785c8 S_run_body + 116
7   perl                              0x000000010257848c perl_run + 448
8   perl                              0x000000010254e6dc main + 180
9   dyld                              0x0000000180eb3f28 start + 2236

The crash address is performing the:

sprintf (pctmem, "%.1f", ((float) ki->tasks_info.resident_size)
                  * 100 / mempages);

calculation. Its the only place in the code that passes the string literal, "%.1f":

00000000000020b4        1e211800        fdiv    s0, s0, s1
00000000000020b8        1e22c000        fcvt    d0, s0
00000000000020bc        fd0003e0        str     d0, [sp]
00000000000020c0        d102f3a0        sub     x0, x29, #0xbc
00000000000020c4        52800001        mov     w1, #0x0
00000000000020c8        528000c2        mov     w2, #0x6
00000000000020cc        7000dfa3        adr     x3, #0x1bf7 ; literal pool for: "%.1f"
00000000000020d0        d503201f        nop
00000000000020d4        940005f2        bl      0x389c ; symbol stub for: ___sprintf_chk
00000000000020d8        a9402688        ldp     x8, x9, [x20]
00000000000020dc        d34afd08        lsr     x8, x8, #10
00000000000020e0        d34afd29        lsr     x9, x9, #10

After using bits of the .c code to put together a standalone test program, a loop of increasing (fake) resident_size numbers produce:

% ./test
ERRONEOUS:mempages:-644104192 3650863104 D99BC000
repro:mempages:-644104192 D99BC000
RS:643778560 * 100 (64377856000) / -644104192 ->-99.9
RS:643779584 * 100 (64377958400) / -644104192 ->-99.9
RS:643780608 * 100 (64378060800) / -644104192 ->-99.9
RS:643781632 * 100 (64378163200) / -644104192 ->-99.9
RS:643782656 * 100 (64378265600) / -644104192 ->-100.0
Trace/BPT trap

The problem seems rooted in its use of the HW_PHYSMEM to obtain mempages:

        oldlen = sizeof(mempages);
        mib[0] = CTL_HW;
        mib[1] = HW_PHYSMEM;
        sysctl(mib, 2, &mempages, &oldlen, NULL, 0);

mempages is defined as an 'int' (i.e. a signed 32bit number).

The machine I'm using has ~34 gig of memory so int isn't going to hold an accurate number. The amount of memory used by the process is not a problem. It uses the mac's "task" interface to obtain a number which is 64-bits by definition.

The problem is the number used to calculate the fraction of the process's memory usage...against overall user-space memory available.

See OS_initialize here: https://github.com/jwbargsten/perl-proc-processtable/blob/94674d9a78c3543f08b794c65b7401496c61f8d1/os/darwin.c#L70

In particular: mib[1] = HW_PHYSMEM;

First, the value returned is in bytes...not pages, not K. It is returned is a 4-byte signed integer which can hold no positive number above 2GB (0x7FFFFFFF).

BSD addressed this years ago by adding a query value of HW_PHYSMEM64 to return values above the 2GB limit -- I searched the MacOS 13.3 XNU source code and can't find any reference to this.

Also note the code names/defines the variable used to receive the number in this case as: int mempages = 0;

This is confusing/misleading because it is not in pages and there is no treatment in the code indicating otherwise.

Next, this is just as broken on Intel macs as ARM macs. The difference is that (based on observation), the Intel mac returns the number 0x8000000, or -2147483648 (a signed integer) for both a 16GB and 64GB hosts where the ARM macs return different smaller negative numbers on different machines.

For instance: On maca64 with 32GB RAM: 0xD99BC000/-644104192
On maca64:with 16GB RAM: 0xDF8B4000/-544522240 On maca64:with 8GB RAM: 0xDF8E0000/-544342016

I could find no documented explanation for these numbers.

The problem with the calculation is that it derives from dividing a positive number by a negative number. This sets up for the possibility that if a process is using more memory (as a positive number), than the same value (as a negative number), it results in a negative percentage greater than or equal to -100.0 percent. The NULL byte terminating the string, overflows the six-character buffer defined in the code: char pctmem[6]; causing the exception we see.

Thus, I propose this "workaround" fix:

1: change the definition of "mempages" to: uint64_t mempages = 0;

2: change the sysctl number requested to be: mib[1] = HW_MEMSIZE;

The problem using HW_MEMSIZE is that the number represents all memory (both kernel and user) whereas the original HW_PHYSMEM attempted to represent user memory only (not kernel). I couldn't find a similar easy way to obtain just the user memory number. I believe "top" and "ps" do it...so I believe it is possible. However, given the totally erroneous calculation as it is, it's likely no one is using the code to obtain accurate process memory percentages.

jkahrman avatar Dec 25 '23 02:12 jkahrman

Just returned from vacation. Awesome work! I'll also have a look by myself, see if we can make it as backwards compatible as possible. Regardless of this I'll already prep a branch and PR upcoming week.

jwbargsten avatar Jan 06 '24 12:01 jwbargsten