perl5
SV Arenas have duplicate sized pool slots
Description
SV body arena roots are redundant: several SV types have bodies of exactly the same size. SVt_PVNV and SVt_PVHV are identical. SVt_INVLIST, SVt_PVAV, SVt_PVOBJ are identical. SVt_PVMG and SVt_PVGV are identical. SVt_PVCV and SVt_PVFM are identical.
SVt_PVFM is marked "NOARENA" yet would fit exactly into SVt_PVCV's pool.
Why aren't these free memory pools sorted and deduped by size? It would make some room for struct MG and struct GP to be pool allocated instead of malloc()ed, and remove the hidden 2-pointer-sized header cost of each malloc allocation block.
The first number on the left is the size of the body struct in bytes, on a 64-bit CPU.
body_details <0, 0, 0, 40h, 0> SVt_NULL
body_details <0, 8, 20h, 41h, 0> SVt_IV
body_details <0, 8, 28h, 2, 0> SVt_NV
body_details <10h, 10h, 10h, 0C3h, 0DD0h> SVt_PV
body_details <28h, 21h, 10h, 0E4h, 0C30h> SVt_INVLIST
body_details <18h, 18h, 10h, 0C5h, 0D60h> SVt_PVIV
body_details <20h, 20h, 10h, 86h, 0CE0h> SVt_PVNV
body_details <30h, 30h, 0, 87h, 0FF0h> SVt_PVMG
body_details <0E0h, 0E0h, 0, 0E8h, 0FC0h> SVt_REGEXP
body_details <30h, 30h, 0, 0A9h, 0FF0h> SVt_PVGV
body_details <50h, 50h, 0, 0AAh, 0FF0h> SVt_PVLV
body_details <28h, 28h, 0, 0EBh, 0FF0h> SVt_PVAV
body_details <20h, 20h, 0, 0ECh, 0FE0h> SVt_PVHV
body_details <68h, 68h, 0, 0EDh, 0FD8h> SVt_PVCV
body_details <68h, 68h, 0, 6Eh, 820h> SVt_PVFM
body_details <88h, 88h, 0, 0EFh, 0CC0h> SVt_PVIO
body_details <28h, 28h, 0, 0F0h, 0FF0h> SVt_PVOBJ
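The duplication in the dump above can be counted mechanically. The following is a minimal sketch, not perl core code: the `body_entry` struct, the `bodies` table, and both helper functions are hypothetical names, and the sizes are copied by hand from the first column of the dump (types whose first column is 0 are left out). SVt_PVFM is included even though it is NOARENA today, because the point is that it would fit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table: body sizes in bytes, copied from the first column
 * of the debugger dump (64-bit MSVC build). Not a perl core structure. */
struct body_entry { const char *name; unsigned size; };

static const struct body_entry bodies[] = {
    {"SVt_PV",     0x10}, {"SVt_INVLIST", 0x28}, {"SVt_PVIV", 0x18},
    {"SVt_PVNV",   0x20}, {"SVt_PVMG",    0x30}, {"SVt_REGEXP", 0xE0},
    {"SVt_PVGV",   0x30}, {"SVt_PVLV",    0x50}, {"SVt_PVAV", 0x28},
    {"SVt_PVHV",   0x20}, {"SVt_PVCV",    0x68}, {"SVt_PVFM", 0x68},
    {"SVt_PVIO",   0x88}, {"SVt_PVOBJ",   0x28},
};
#define N_BODIES (sizeof bodies / sizeof bodies[0])

/* How many SV types in the table share this body size. */
static size_t count_types_with_size(unsigned size)
{
    size_t n = 0;
    assert(size != 0);              /* zero-size bodies are not in the table */
    for (size_t i = 0; i < N_BODIES; i++)
        if (bodies[i].size == size)
            n++;
    return n;
}

/* How many pools a size-keyed root array would need for these types. */
static size_t count_distinct_sizes(void)
{
    size_t distinct = 0;
    for (size_t i = 0; i < N_BODIES; i++) {
        int seen = 0;
        for (size_t j = 0; j < i; j++)
            if (bodies[j].size == bodies[i].size) { seen = 1; break; }
        if (!seen)
            distinct++;
    }
    return distinct;
}
```

With the sizes from this dump, the 14 listed types collapse to 9 distinct body sizes, and 0x28 alone is shared by three types.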
Steps to Reproduce
In a C debugger, look at the array PL_body_roots. Look at the body_details struct in sv_inline.h.
Expected behavior
A smaller PL_body_roots array. More memory returned to the OS after heavy subs, or lower peak memory usage, since arena pools have fewer empty slots towards their ends. More of the common perl core fixed-length structs, or really ALL core fixed-length structs, come from pool allocators, not malloc. Remember each pool chunk is a unit of 0x1000 or 4096 bytes, minus a fixed 10-100 bytes.
Perl configuration
C:\sources\perl5>perl -V
Summary of my perl5 (revision 5 version 41 subversion 5) configuration:
Derived from: 344512f62ca15ae427a1e05bab2887337bd534ef
Platform:
osname=MSWin32
osvers=6.1.7601
archname=MSWin32-x64-multi-thread
uname=''
config_args='undef'
hint=recommended
useposix=true
d_sigaction=undef
useithreads=define
usemultiplicity=define
use64bitint=define
use64bitall=undef
uselongdouble=undef
usemymalloc=n
default_inc_excludes_dot=define
Compiler:
cc='cl'
ccflags ='-nologo -GF -W3 -MD -DWIN32 -D_CONSOLE -DNO_STRICT -DWIN64 -D_CRT_
SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE -D_WINSOCK_DEPRECATED_NO_WARNING
S -DPERL_TEXTMODE_SCRIPTS -DMULTIPLICITY -DPERL_IMPLICIT_SYS -DUSE_PERLIO'
optimize='-O1 -Zi -GL -fp:precise'
cppflags='-DWIN32'
ccversion='19.36.32535'
gccversion=''
gccosandvers=''
intsize=4
longsize=4
ptrsize=8
doublesize=8
byteorder=12345678
doublekind=3
d_longlong=undef
longlongsize=8
d_longdbl=define
longdblsize=8
longdblkind=0
ivtype='__int64'
ivsize=8
nvtype='double'
nvsize=8
Off_t='__int64'
lseeksize=8
alignbytes=8
prototype=define
Linker and Libraries:
ld='link'
ldflags ='-nologo -nodefaultlib -debug -opt:ref,icf -ltcg -libpath:"c:\perl\
lib\CORE" -machine:AMD64 -subsystem:console,"5.02"'
libpth="C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSV
C\14.36.32532\\lib\x64"
libs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.li
b advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws2_32.l
ib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib comctl32.lib msvcrt.lib
vcruntime.lib ucrt.lib
perllibs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg3
2.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws2_
32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib comctl32.lib msvcrt
.lib vcruntime.lib ucrt.lib
libc=ucrt.lib
so=dll
useshrplib=true
libperl=perl541.lib
gnulibc_version=''
Dynamic Linking:
dlsrc=dl_win32.xs
dlext=dll
d_dlsymun=undef
ccdlflags=' '
cccdlflags=' '
lddlflags='-dll -nologo -nodefaultlib -debug -opt:ref,icf -ltcg -libpath:"c:
\perl\lib\CORE" -machine:AMD64 -subsystem:console,"5.02"'
Characteristics of this binary (from libperl):
Compile-time options:
HAS_LONG_DOUBLE
HAS_TIMES
HAVE_INTERP_INTERN
MULTIPLICITY
PERLIO_LAYERS
PERL_COPY_ON_WRITE
PERL_DONT_CREATE_GVSV
PERL_HASH_FUNC_SIPHASH13
PERL_HASH_USE_SBOX32
PERL_IMPLICIT_SYS
PERL_MALLOC_WRAP
PERL_OP_PARENT
PERL_PRESERVE_IVUV
PERL_USE_SAFE_PUTENV
USE_64_BIT_INT
USE_ITHREADS
USE_LARGE_FILES
USE_LOCALE
USE_LOCALE_COLLATE
USE_LOCALE_CTYPE
USE_LOCALE_NUMERIC
USE_LOCALE_TIME
USE_PERLIO
USE_PERL_ATOF
USE_THREAD_SAFE_LOCALE
Locally applied patches:
uncommitted-changes
Built under MSWin32
Compiled at Oct 14 2024 04:40:08
@INC:
C:/sources/perl5/lib
C:\sources\perl5>
Why aren't these free memory pools sorted and deduped by size?
It's been in the middle of my ideas list but has never made it to the top. I do think it's worth doing this, and possibly looking to see if increasing the pool size from 4k to e.g. 8k would give less wastage.
I'm also curious to see if all bodies were allocated from arenas, whether compilers would pick up on that and automatically optimise away the existing "return to arena or Safefree" branches.
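The 4k versus 8k question comes down to integer division, which a few lines can make concrete. This is a rough sketch under stated assumptions: `slots_per_arena` and `tail_waste` are hypothetical helpers, the 4080-byte budget is the largest arena size in the dump (0FF0h), and the 8176-byte budget for an "8k" arena simply assumes the same 16 bytes are held back.

```c
#include <assert.h>
#include <stddef.h>

/* How many bodies of body_size bytes fit into `budget` usable bytes. */
static size_t slots_per_arena(size_t budget, size_t body_size)
{
    assert(body_size > 0);
    return budget / body_size;
}

/* Bytes left unused at the tail of one arena. */
static size_t tail_waste(size_t budget, size_t body_size)
{
    return budget - slots_per_arena(budget, body_size) * body_size;
}
```

For a 68h (104-byte) body, `tail_waste(4080, 104)` is 24, which agrees with the 0FD8h arena size in the dump, while `tail_waste(8176, 104)` is 64. For a 20h body both budgets leave 16 bytes. So whether a larger arena wastes less depends on the body size, and on the assumed reserve, which is why it needs measuring per size.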
Why aren't these free memory pools sorted and deduped by size?
It's been in the middle of my ideas list but has never made it to the top. I do think it's worth doing this, and possibly looking to see if increasing the pool size from 4k to e.g. 8k would give less wastage.
One reason for the generate_uudmap.exe cleanup branch was that I wanted a test like this,
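char *p2;
char *p = malloc(1);

/* Grow the block one byte at a time. If realloc() had to move it, */
/* report the size at which that happened via write_define().      */
p2 = realloc(p, 2);
if (p2 != p)
    write_define(2);
p = p2;
malloc(2);              /* filler block, never freed */

p2 = realloc(p, 3);
if (p2 != p)
    write_define(3);
p = p2;
malloc(3);              /* filler block, never freed */

p2 = realloc(p, 4);
if (p2 != p)
    write_define(4);
p = p2;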
and learn the actual boundaries of the OS/libc/vendor malloc, vs P5's current amateur guesses derived from the generic 4096-byte x86 page. Where malloc() keeps its bookkeeping is implementation specific. Is it the traditional 2 pointers right before your ptr? Does the malloc steal 1-7 bytes below a power of 2 at the end of your alloc? Or was the OS designer bold and daring, so there IS NO HEADER and malloc uses a red-black tree?
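The probing idea above can be written as a loop in standard C. This is only a sketch of the approach, not code from the branch: `probe_realloc_moves` is a hypothetical name, the filler blocks are freed at the end instead of leaked, and a reported move is just a hint about where the allocator's size boundaries sit, since the results are entirely allocator specific.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Grow one block a byte at a time up to max_size. Each size at which
 * realloc() returned a different address is stored in moves[] (up to cap
 * entries). Returns the number of entries stored, or -1 if out of memory. */
static int probe_realloc_moves(size_t max_size, size_t *moves, size_t cap)
{
    size_t n = 0;
    void **fillers = calloc(max_size + 1, sizeof *fillers);
    char *p = malloc(1);

    if (fillers == NULL || p == NULL) {
        free(fillers);
        free(p);
        return -1;
    }
    for (size_t size = 2; size <= max_size; size++) {
        /* Save the address as an integer before realloc() invalidates p. */
        uintptr_t before = (uintptr_t)p;
        char *q = realloc(p, size);
        if (q == NULL)
            break;                       /* p is still valid, freed below */
        if ((uintptr_t)q != before && n < cap)
            moves[n++] = size;
        p = q;
        /* Filler allocation of the same size, as in the snippet above.
         * A failed filler just stays NULL. */
        fillers[size] = malloc(size);
    }
    free(p);
    for (size_t i = 0; i <= max_size; i++)
        free(fillers[i]);                /* free(NULL) is a no-op */
    free(fillers);
    return (int)n;
}
```

Calling it with a 256-byte limit and a 64-entry `moves` array gives a list of sizes that can be compared across libc, debug, and wrapper builds.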
P5 core also has 2 or 3 different malloc-on-malloc layers right now: one is Win32 specific and threads specific, another is -DDEBUGGING specific, and the 3rd is when P5 Configure decides the OS malloc is garbage and totally replaces it. While there is some attempt at doing all the math to correctly subtract P5 malloc wrapper headers vs build options vs our #define GOODSIZE 4096, the offsets and constants were picked decades ago, and there is no CI code to test whether all those guesses and constants are correct. It's very rare that a core dev will use a C debugger and step into the OS malloc code, or use OS VM analysis tools.
A single +1 or -1 mistake in our math for GOODSIZE can permanently waste 1-15 or 1-31 bytes, over and over.
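The 1-15 and 1-31 byte figures fall out of simple rounding if one assumes an allocator that rounds every request up to a multiple of 16 or 32 bytes. The helper below models only that assumption; `rounding_waste` is a hypothetical name and no real malloc is claimed to behave exactly this way.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes lost when a hypothetical allocator rounds `request` up to the
 * next multiple of `granule`. granule must be a power of two. */
static size_t rounding_waste(size_t request, size_t granule)
{
    assert(granule != 0 && (granule & (granule - 1)) == 0);
    size_t rounded = (request + granule - 1) & ~(granule - 1);
    return rounded - request;
}
```

Under this model a 4080-byte request with a 16-byte granule wastes nothing, but asking for 4081 wastes 15 bytes, and with a 32-byte granule a request of 4065 wastes 31. Being off by one byte pays the maximum penalty on every block.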