regex: memory bug when recursively matching a Regexp reference though an embeded code block with a subroutine
The bug occurs when recursively matching a Regexp reference though an embeded code block using a subroutine call both for the initial call and for the recursive call inside the embeded code block.
The people dealing with regex engine bugs will certainly laugh or curse at the insanity of the bug, this is not lost on me.
The malloc() bug and Segmentation fault resulting from running the script:
$ ./test.pl lines.txt
line 1
Segmentation fault (core dumped)
$ ./test.pl lines.txt
line 1
line 3
malloc(): unaligned tcache chunk detected
Aborted (core dumped)
The data file:
$ cat lines.txt
line 1
line 2
line 3
line 4
line 5
The golfed test script containing various test cases in order to better understand the context in which the bg occurs. Only 2 of those tests produce a Segmentation fault or a malloc() bug
#!/usr/bin/perl
sub match { $_[0] =~ $_[1] }
while (<>) {
print;
my $line = $_;
my $tried;
my $re;
# $re = qr{ (?{ match($line, $re) if !$tried++ }) }x; # ok
# $re = qr{ \G (?{ match($line, $re) if !$tried++ }) }x; # ok
# $re = qr{ (?{ match($line, $re) if !$tried++ }) (*FAIL) }x; # Segmentation fault
$re = qr{ \G (?{ match($line, $re) if !$tried++ }) (*FAIL) }x; # malloc(): unaligned tcache chunk detected
match($line, $re);
# all ok
# $re = qr{ (?{ $line =~ $re if !$tried++ }) }x;
# $re = qr{ \G (?{ $line =~ $re if !$tried++ }) }x;
# $re = qr{ (?{ $line =~ $re if !$tried++ }) (*FAIL) }x;
# $re = qr{ \G (?{ $line =~ $re if !$tried++ }) (*FAIL) }x;
# $line =~ $re;
# all ok
# $re = qr{ (?{ match($line, $re) if !$tried++ }) }x;
# $re = qr{ \G (?{ match($line, $re) if !$tried++ }) }x;
# $re = qr{ (?{ match($line, $re) if !$tried++ }) (*FAIL) }x;
# $re = qr{ \G (?{ match($line, $re) if !$tried++ }) (*FAIL) }x;
# $line =~ $re;
# all ok
# $re = qr{ (?{ $line =~ $re if !$tried++ }) }x;
# $re = qr{ \G (?{ $line =~ $re if !$tried++ }) }x;
# $re = qr{ (?{ $line =~ $re if !$tried++ }) (*FAIL) }x;
# $re = qr{ \G (?{ $line =~ $re if !$tried++ }) (*FAIL) }x;
# match($line, $re);
}
$ perl -V
Summary of my perl5 (revision 5 version 38 subversion 1) configuration:
Platform:
osname=linux
osvers=5.12.15-arch1-1
archname=x86_64-linux-thread-multi
uname='archlinux'
config_args='-des -Dusethreads -Duseshrplib -Doptimize=-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -g -ffile-prefix-map=/build/perl/src=/usr/src/debug/perl -flto=auto -Dprefix=/usr -Dvendorprefix=/usr -Dprivlib=/usr/share/perl5/core_perl -Darchlib=/usr/lib/perl5/5.38/core_perl -Dsitelib=/usr/share/perl5/site_perl -Dsitearch=/usr/lib/perl5/5.38/site_perl -Dvendorlib=/usr/share/perl5/vendor_perl -Dvendorarch=/usr/lib/perl5/5.38/vendor_perl -Dscriptdir=/usr/bin/core_perl -Dsitescript=/usr/bin/site_perl -Dvendorscript=/usr/bin/vendor_perl -Dinc_version_list=none -Dman1ext=1perl -Dman3ext=3perl -Dlddlflags=-shared -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -flto=auto -Dldflags=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -flto=auto'
hint=recommended
useposix=true
d_sigaction=define
useithreads=define
usemultiplicity=define
use64bitint=define
use64bitall=define
uselongdouble=undef
usemymalloc=n
default_inc_excludes_dot=define
Compiler:
cc='cc'
ccflags ='-D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
optimize='-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -g -ffile-prefix-map=/build/perl/src=/usr/src/debug/perl -flto=auto'
cppflags='-D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
ccversion=''
gccversion='13.2.1 20230801'
gccosandvers=''
intsize=4
longsize=8
ptrsize=8
doublesize=8
byteorder=12345678
doublekind=3
d_longlong=define
longlongsize=8
d_longdbl=define
longdblsize=16
longdblkind=3
ivtype='long'
ivsize=8
nvtype='double'
nvsize=8
Off_t='off_t'
lseeksize=8
alignbytes=8
prototype=define
Linker and Libraries:
ld='cc'
ldflags ='-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -flto=auto -fstack-protector-strong -L/usr/local/lib'
libpth=/usr/local/lib /usr/lib
libs=-lpthread -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
perllibs=-lpthread -ldl -lm -lcrypt -lutil -lc
libc=/lib/../lib/libc.so.6
so=so
useshrplib=true
libperl=libperl.so
gnulibc_version='2.38'
Dynamic Linking:
dlsrc=dl_dlopen.xs
dlext=so
d_dlsymun=undef
ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.38/core_perl/CORE'
cccdlflags='-fPIC'
lddlflags='-shared -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -flto=auto -L/usr/local/lib -fstack-protector-strong'
Characteristics of this binary (from libperl):
Compile-time options:
HAS_LONG_DOUBLE
HAS_STRTOLD
HAS_TIMES
MULTIPLICITY
PERLIO_LAYERS
PERL_COPY_ON_WRITE
PERL_DONT_CREATE_GVSV
PERL_HASH_FUNC_SIPHASH13
PERL_HASH_USE_SBOX32
PERL_MALLOC_WRAP
PERL_OP_PARENT
PERL_PRESERVE_IVUV
PERL_USE_SAFE_PUTENV
USE_64_BIT_ALL
USE_64_BIT_INT
USE_ITHREADS
USE_LARGE_FILES
USE_LOCALE
USE_LOCALE_COLLATE
USE_LOCALE_CTYPE
USE_LOCALE_NUMERIC
USE_LOCALE_TIME
USE_PERLIO
USE_PERL_ATOF
USE_REENTRANT_API
USE_THREAD_SAFE_LOCALE
Built under linux
Compiled at Nov 27 2023 20:20:27
@INC:
/usr/lib/perl5/5.38/site_perl
/usr/share/perl5/site_perl
/usr/lib/perl5/5.38/vendor_perl
/usr/share/perl5/vendor_perl
/usr/lib/perl5/5.38/core_perl
/usr/share/perl5/core_perl
I forgot to mention that the bug occurs specifically in the presense of (*FAIL) (or (?!)) after the embeded code block.
The backtraces in GDB using a debug build.
case of:
$re = qr{ (?{ match($line, $re) if !$tried++ }) (*FAIL) }x; # Segmentation fault
match($line, $re);
line 1
Program received signal SIGSEGV, Segmentation fault.
S_regtry (reginfo=0x7fffffffdb30, startposp=0x7fffffffda48) at regexec.c:4386
4386 RXp_OFFSp(prog)[0].start = *startposp - reginfo->strbeg;
@(gdb) bt
#0 S_regtry (reginfo=0x7fffffffdb30, startposp=0x7fffffffda48) at regexec.c:4386
#1 0x000055555573f4b7 in Perl_regexec_flags (rx=0x5555559c35f8, stringarg=0x5555559d90a0 "line 1\n",
strend=0x5555559d90a7 "", strbeg=0x5555559d90a0 "line 1\n", minend=0, sv=0x5555559cc8b8, data=0x0, flags=97)
at regexec.c:4280
#2 0x0000555555687202 in Perl_pp_match () at pp_hot.c:3771
#3 0x0000555555759e59 in Perl_runops_standard () at run.c:41
#4 0x00005555555c964a in S_run_body (oldscope=1) at perl.c:2801
#5 0x00005555555c92cb in perl_run (my_perl=0x5555559a62a0) at perl.c:2716
#6 0x000055555559b131 in main (argc=2, argv=0x7fffffffdf98, env=0x7fffffffdfb0) at perlmain.c:127
@(gdb)
case of:
$re = qr{ \G (?{ match($line, $re) if !$tried++ }) (*FAIL) }x; # malloc(): unaligned tcache chunk detected
match($line, $re);
line 1
line 2
malloc(): unaligned tcache chunk detected
Program received signal SIGABRT, Aborted.
0x00007ffff7d2783c in ?? () from /usr/lib/libc.so.6
@(gdb) bt
#0 0x00007ffff7d2783c in ?? () from /usr/lib/libc.so.6
#1 0x00007ffff7cd7668 in raise () from /usr/lib/libc.so.6
#2 0x00007ffff7cbf4b8 in abort () from /usr/lib/libc.so.6
#3 0x00007ffff7cc0390 in ?? () from /usr/lib/libc.so.6
#4 0x00007ffff7d317b7 in ?? () from /usr/lib/libc.so.6
#5 0x00007ffff7d35f0c in malloc () from /usr/lib/libc.so.6
#6 0x00005555557be690 in Perl_safesysmalloc (size=24) at util.c:159
#7 0x0000555555712183 in Perl_reg_temp_copy (dsv=0x5555559cc440, ssv=0x5555559db228) at regcomp.c:13257
#8 0x0000555555686a13 in Perl_pp_qr () at pp_hot.c:3572
#9 0x0000555555759e59 in Perl_runops_standard () at run.c:41
#10 0x00005555555c964a in S_run_body (oldscope=1) at perl.c:2801
#11 0x00005555555c92cb in perl_run (my_perl=0x5555559a62a0) at perl.c:2716
#12 0x000055555559b131 in main (argc=2, argv=0x7fffffffdf98, env=0x7fffffffdfb0) at perlmain.c:127
@(gdb)
I though I'd already posted this, for the tcache case:
$ valgrind -q ./perl -Ilib ../21725.pl <../21725.txt
line 1
==636144== Invalid write of size 8
==636144== at 0x484A3B1: memmove (vg_replace_strmem.c:1382)
==636144== by 0x409FA3: Perl_regexec_flags (regexec.c:4337)
==636144== by 0x2EDCF2: Perl_pp_match (pp_hot.c:3771)
==636144== by 0x1DDE22: Perl_runops_debug (dump.c:2866)
==636144== by 0x19ACA7: S_run_body (perl.c:2801)
==636144== by 0x19A29D: perl_run (perl.c:2716)
==636144== by 0x151154: main (perlmain.c:127)
==636144== Address 0x4bbb020 is 0 bytes inside a block of size 24 free'd
==636144== at 0x484317B: free (vg_replace_malloc.c:872)
==636144== by 0x4FE28A: Perl_safesysfree (util.c:416)
==636144== by 0x3B291B: Perl_pregfree2 (regcomp.c:13148)
==636144== by 0x4769CA: Perl_sv_clear (sv.c:6779)
==636144== by 0x4799C4: Perl_sv_free2 (sv.c:7296)
==636144== by 0x3F19B7: Perl_SvREFCNT_dec (sv_inline.h:694)
==636144== by 0x42BE20: S_setup_eval_state (regexec.c:11267)
==636144== by 0x406E15: Perl_regexec_flags (regexec.c:3900)
==636144== by 0x2EDCF2: Perl_pp_match (pp_hot.c:3771)
==636144== by 0x1DDE22: Perl_runops_debug (dump.c:2866)
==636144== by 0x41E6E5: S_regmatch (regexec.c:8354)
==636144== by 0x40A236: S_regtry (regexec.c:4423)
==636144== Block was alloc'd at
==636144== at 0x48455EF: calloc (vg_replace_malloc.c:1328)
==636144== by 0x4FE338: Perl_safesyscalloc (util.c:473)
==636144== by 0x406EA4: Perl_regexec_flags (regexec.c:3916)
==636144== by 0x2EDCF2: Perl_pp_match (pp_hot.c:3771)
==636144== by 0x1DDE22: Perl_runops_debug (dump.c:2866)
==636144== by 0x19ACA7: S_run_body (perl.c:2801)
==636144== by 0x19A29D: perl_run (perl.c:2716)
==636144== by 0x151154: main (perlmain.c:127)
==636144==
line 2
line 3
line 4
line 5
which looks like the regexp has been accessed after being freed.
For the Segmentation fault case:
$ valgrind -q ./perl -Ilib ../21725.pl <../21725.txt
line 1
perl: inline.h:1121: Perl_ReANY: Assertion `isREGEXP(re)' failed.
Aborted
Debugging:
$ gdb --args ./perl -Ilib ../21725.pl ../21725.txt
...
perl: inline.h:1121: Perl_ReANY: Assertion `isREGEXP(re)' failed.
Program received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6,
no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
44 ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
...
#5 0x00007ffff7cdae32 in __GI___assert_fail (
assertion=0x555555b54c8a "isREGEXP(re)", file=0x555555b54ade "inline.h",
line=1121, function=0x555555b74438 <__PRETTY_FUNCTION__.93> "Perl_ReANY")
at ./assert/assert.c:101
#6 0x000055555583a8e4 in Perl_ReANY (re=0x555555c46ec8)
at /home/tony/dev/perl/git/perl6/inline.h:1121
...
(gdb) up 6
#6 0x000055555583a8e4 in Perl_ReANY (re=0x555555c46ec8)
at /home/tony/dev/perl/git/perl6/inline.h:1121
1121 assert(isREGEXP(re));
(gdb) call Perl_sv_dump(re)
SV = PVAV(0x555555c2e2c8) at 0x555555c46ec8
REFCNT = 1
FLAGS = ()
ARRAY = 0x555555c43f20
FILL = 0
MAX = 31
FLAGS = ()
which again looks like the REGEXP SV has been released and re-used for an array.
@tonycoz The infamous use after free, in this case because of a free occuring earlier than it should be. I've made a bit of progress on the same case: $re = qr{ \G (?{ match($line, $re) if !$tried++ }) (*FAIL) }x; # malloc(): unaligned tcache chunk detected.
$ gdb --args ./perl -Ilib path/to/test_segv.pl
b Perl_regexec_flags
b S_setup_eval_state
b Perl_pp_regcomp
r
next until reaching:
Breakpoint 48, Perl_pp_regcomp () at pp_ctl.c:90
do next until:
157 tmp = reg_temp_copy(NULL, new_re);
@(gdb)
158 ReREFCNT_dec(new_re);
@(gdb)
159 new_re = tmp;
then do:
awatch -l new_re
awatch -l new_re->sv_refcnt
watch -l PL_op
commands
print PL_op->op_ppaddr
end
careful with the commands command and its definition
@(gdb) awatch -l new_re
Hardware access (read/write) watchpoint 51: -location new_re
@(gdb) awatch -l new_re->sv_refcnt
Hardware access (read/write) watchpoint 52: -location new_re->sv_refcnt
@(gdb) watch -l PL_op
Hardware watchpoint 53: -location PL_op
@(gdb) commands
Type commands for breakpoint(s) 53, one per line.
End with a line saying just "end".
@>print PL_op->op_ppaddr
@>end
@(gdb)
continue until reaching Perl_pp_match for the 2nd time
@(gdb)
Continuing.
Hardware watchpoint 53: -location PL_op
Old value = (OP *) 0x555555c5d368
New value = (OP *) 0x555555c5d3a8
0x0000555555629dc5 in Perl_runops_debug () at dump.c:2866
2866 } while ((PL_op = PL_op->op_ppaddr(aTHX)));
$253 = (OP *(*)(void)) 0x555555739109 <Perl_pp_match>
@(gdb) c
Continuing.
Breakpoint 49, Perl_regexec_flags (rx=0x555555c47698, stringarg=0x555555c5d490 "line 1\n",
strend=0x555555c5d497 "", strbeg=0x555555c5d490 "line 1\n", minend=0, sv=0x555555c50928, data=0x0, flags=97)
at regexec.c:3673
3673 {
@(gdb) c
Continuing.
Breakpoint 67, S_setup_eval_state (reginfo=0x7fffffffce60) at regexec.c:11223
11223 regexp *const rex = ReANY(reginfo->prog);
this function contains the macro SET_reg_curpm(reginfo->prog); which is responsible for freeing the SV before its time.
/* Set which rex is pointed to by PL_reg_curpm, handling ref counting.
* Do inc before dec, in case old and new rex are the same */
#define SET_reg_curpm(Re2) \
if (reginfo->info_aux_eval) { \
(void)ReREFCNT_inc(Re2); \
ReREFCNT_dec(PM_GETRE(PL_reg_curpm)); \
PM_SETRE((PL_reg_curpm), (Re2)); \
}
The SIGABRT then occurs after pp_leavesub, when the 2nd call to match($line, $re) returns.
@(gdb)
Continuing.
Hardware access (read/write) watchpoint 8: -location new_re->sv_refcnt
Value = 1
0x000055555583d579 in Perl_SvREFCNT_dec (sv=0x555555c47728)
at /home/user/perl/PERL_SOURCE/perl5_debug3/sv_inline.h:690
690 U32 rc = SvREFCNT(sv);
@(gdb)
Continuing.
Hardware access (read/write) watchpoint 8: -location new_re->sv_refcnt
Old value = 1
New value = 0
Perl_sv_free2 (sv=0x555555c47728, rc=1) at sv.c:7284
7284 if (SvTEMP(sv)) {
@(gdb)
Continuing.
Hardware access (read/write) watchpoint 8: -location new_re->sv_refcnt
Value = 0
0x00005555558c204d in Perl_sv_clear (orig_sv=0x555555c47728) at sv.c:6691
6691 assert(SvREFCNT(sv) == 0);
@(gdb)
I understand only a tiny bit of what is going on but maybe the issue it is related to this warning in S_regmatch() at case EVAL: :
/* *** Note that at this point we don't restore
* PL_comppad, (or pop the CxSUB) on the assumption it may
* be used again soon. This is safe as long as nothing
* in the regexp code uses the pad ! */
PL_op = oop;
PL_curcop = ocurcop;
regcp_restore(rex, ST.lastcp, &maxopenparen);
PL_curpm_under = PL_curpm;
PL_curpm = PL_reg_curpm;
Can someone put the type-regex label please? I can't do it myself.
I did additional research back in December but didn't finished writing the post, nor the research.
The previous post is wrong. SET_reg_curpm(reginfo->prog); free the regex too early but the issue happens before that.
I think there is an extra reference decrement (that shouldn't happened) that provoke the early freeing of the SV is in the function pp_regcomp() in pp_ctl.c, specifically, the 2nd time that pp_regcomp() is called which corresponds to the 2nd time the perl subroutine match() is called in the my example above. The second time when match() is called inside the embeded code block, the second time per while (<>) loop iteration.
if (re != new_re) {
ReREFCNT_dec(re);
PM_SETRE(pm, new_re);
}
The details are fuzzy now but commenting out ReREFCNT_dec(re) did have an effect.
minperl is the patched perl with the line ReREFCNT_dec(re) commented out.
TEST1
$ perl -E 'sub match { $_[0] =~ $_[1] } my $str=shift; my $seen; my $re; $re=qr/^(?{ match($str, $re) if !$seen++; say "code block: $1" })(..)/; $str =~ $re; say "main: $1"' abcde
code block:
code block: ab
main: ab
As expected:
$ ./miniperl -Ilib -E 'sub match { $_[0] =~ $_[1] } my $str=shift; my $seen; my $re; $re=qr/^(?{ match($str, $re) if !$seen++; say "code block: $1" })(..)/; $str =~ $re; say "main: $1"' abcde
code block:
code block: ab
main: ab
TEST2
regular perl:
$ perl -E 'sub match { $_[0] =~ $_[1] } my $str=shift; my $seen; my $re; $re=qr/^(?{ match($str, $re) if !$seen++; say "code block: $1" })(..)/; match($str, $re); say "main: $1"' abcde
code block:
code block: ab
Segmentation fault (core dumped)
perl debug build:
$ perld -E 'sub match { $_[0] =~ $_[1] } my $str=shift; my $seen; my $re; $re=qr/^(?{ match($str, $re) if !$seen++; say "code block: $1" })(..)/; match($str, $re); say "main: $1"' abcde
code block:
code block: ab
perl: inline.h:1121: Perl_ReANY: Assertion `isREGEXP(re)' failed.
/usr/bin/perld: line 1: 85192 Aborted (core dumped)
minperl: it doesn't crash but $1 in undefined while it should be set to "ab".
$ ./miniperl -Ilib -E 'sub match { $_[0] =~ $_[1] } my $str=shift; my $seen; my $re; $re=qr/^(?{ match($str, $re) if !$seen++; say "code block: $1" })(..)/; match($str, $re); say "main: $1"' abcde
code block:
code block: ab
main:
This suggests that the function/macro ReREFCNT_dec(re) shouldn't be called, which means that re != new_re should be false. Now, I'm sorry if this is imprecise and fuzzy but my idea 5 months ago was that new_re = Perl_re_op_compile(),
(really :)
new_re = (eng->op_comp
? eng->op_comp
: &Perl_re_op_compile
)(aTHX_ args, nargs, pm->op_code_list, eng, re,
&is_bare_re,
(pm->op_pmflags & RXf_PMf_FLAGCOPYMASK),
pm->op_pmflags |
(PL_op->op_flags & OPf_SPECIAL ? PMf_USE_RE_EVAL : 0));
called in PP(pp_regcomp) (pp_ctl.c) should have returned the original re because there is no need to recompile it. This is turned would be related to the line pat = S_concat_pat() in Perl_re_op_compile (regcomp.c).
I think that maybe pp_regcomp assumed that because re != new_re it was free to free the old regex but didn't account for the case where there would be some structure sharing somewhere in the regex SV.