perl5
perl5 copied to clipboard
regular expressions matching lines read from an in-memory scalar is extremely slow in cygwin/MSYS2 perls
Description With cygwin/MSYS2 perl when you do regular expression matching on lines read from an in-memory scalar and the regex matches something it takes orders of magnitude longer than matching the same lines read from a disk file.
Steps to Reproduce Below is code to demonstrate the behavior. An example file is attached to run with the code. In this file about 16% of the lines match.
#!/usr/bin/env perl
use warnings;
use strict;
use Time::HiRes qw( time );
my $file = shift @ARGV;
my ($fh, $time, $n);
open $fh, "<", $file;
$time = time;
$n = 0;
while(<$fh>) {
/^ ?Query/;
$n++;
}
printf "%f read lines from disk and do RE; n=$n.\n", time - $time;
seek $fh, 0, 0;
my $s = "";
while(<$fh>) {
$s .= $_;
}
# my $s = do {
# local $/;
# <$fh>;
# };
open $fh, "<", \$s;
$time = time;
$n = 0;
while(<$fh>) {
/^ ?Query/;
$n++;
}
printf "%f read lines from in-memory file and do RE; n=$n.\n", time - $time;
On my cygwin system this prints:
0.122725 read lines from disk and do RE; n=570694.
27.238712 read lines from in-memory file and do RE; n=570694.
So the in-memory file is about 300 times slower.
Expected behavior I would expect the times to be roughly in the same ball park.
Perl configuration
Summary of my perl5 (revision 5 version 36 subversion 3) configuration:
Platform:
osname=cygwin
osvers=3.4.10-1.x86_64
archname=x86_64-cygwin-threads-multi
uname='cygwin_nt-10.0-22631 walter 3.4.10-1.x86_64 2023-11-29 12:12 utc x86_64 cygwin '
config_args='-des -Dprefix=/usr -Dmksymlinks -Darchname=x86_64-cygwin-threads -Dlibperl=cygperl5_36.dll -Dcc=gcc -Dld=g++ -Accflags=-ggdb -O2 -pipe -Wall -Werror=format-security -D_FORTIFY_SOURCE=2 -fstack-protector-strong --param=ssp-buffer-size=4 -fdebug-prefix-map=/mnt/share/cygpkgs/perl/perl.x86_64/build=/usr/src/debug/perl-5.36.3-1 -fdebug-prefix-map=/mnt/share/cygpkgs/perl/perl.x86_64/src/perl-5.36.3=/usr/src/debug/perl-5.36.3-1 -fwrapv'
hint=recommended
useposix=true
d_sigaction=define
useithreads=define
usemultiplicity=define
use64bitint=define
use64bitall=define
uselongdouble=undef
usemymalloc=n
default_inc_excludes_dot=define
Compiler:
cc='gcc'
ccflags ='-DPERL_USE_SAFE_PUTENV -U__STRICT_ANSI__ -D_GNU_SOURCE -ggdb -O2 -pipe -Wall -Werror=format-security -D_FORTIFY_SOURCE=2 -fstack-protector-strong --param=ssp-buffer-size=4 -fdebug-prefix-map=/mnt/share/cygpkgs/perl/perl.x86_64/build=/usr/src/debug/perl-5.36.3-1 -fdebug-prefix-map=/mnt/share/cygpkgs/perl/perl.x86_64/src/perl-5.36.3=/usr/src/debug/perl-5.36.3-1 -fwrapv -fno-strict-aliasing'
optimize='-O3'
cppflags='-DPERL_USE_SAFE_PUTENV -U__STRICT_ANSI__ -D_GNU_SOURCE -ggdb -O2 -pipe -Wall -Werror=format-security -D_FORTIFY_SOURCE=2 -fstack-protector-strong --param=ssp-buffer-size=4 -fdebug-prefix-map=/mnt/share/cygpkgs/perl/perl.x86_64/build=/usr/src/debug/perl-5.36.3-1 -fdebug-prefix-map=/mnt/share/cygpkgs/perl/perl.x86_64/src/perl-5.36.3=/usr/src/debug/perl-5.36.3-1 -fwrapv -fno-strict-aliasing'
ccversion=''
gccversion='11.4.0'
gccosandvers=''
intsize=4
longsize=8
ptrsize=8
doublesize=8
byteorder=12345678
doublekind=3
d_longlong=define
longlongsize=8
d_longdbl=define
longdblsize=16
longdblkind=3
ivtype='long'
ivsize=8
nvtype='double'
nvsize=8
Off_t='off_t'
lseeksize=8
alignbytes=8
prototype=define
Linker and Libraries:
ld='g++'
ldflags =' -Wl,--enable-auto-import -Wl,--export-all-symbols -Wl,--enable-auto-image-base -fstack-protector-strong'
libpth=/usr/lib
libs=-lpthread -lnsl -lgdbm -ldb -ldl -lcrypt -lgdbm_compat
perllibs=-lpthread -lnsl -ldl -lcrypt
libc=/usr/lib/libcygwin.a
so=dll
useshrplib=true
libperl=cygperl5_36.dll
gnulibc_version=''
Dynamic Linking:
dlsrc=dl_dlopen.xs
dlext=dll
d_dlsymun=undef
ccdlflags=' '
cccdlflags=' '
lddlflags=' --shared -Wl,--enable-auto-import -Wl,--export-all-symbols -Wl,--enable-auto-image-base -fstack-protector-strong'
Characteristics of this binary (from libperl):
Compile-time options:
HAS_TIMES
MULTIPLICITY
PERLIO_LAYERS
PERL_COPY_ON_WRITE
PERL_DONT_CREATE_GVSV
PERL_OP_PARENT
PERL_PRESERVE_IVUV
PERL_USE_SAFE_PUTENV
USE_64_BIT_ALL
USE_64_BIT_INT
USE_ITHREADS
USE_LARGE_FILES
USE_LOCALE
USE_LOCALE_COLLATE
USE_LOCALE_CTYPE
USE_LOCALE_NUMERIC
USE_LOCALE_TIME
USE_PERLIO
USE_PERL_ATOF
USE_REENTRANT_API
USE_THREAD_SAFE_LOCALE
Locally applied patches:
Cygwin: README
Cygwin: use auto-image-base instead of fixed DLL base address
Cygwin: modify hints
Cygwin: Configure correct libsearch
Cygwin: Configure correct libpth
Cygwin: Win32 correct UTF8 handling
Built under cygwin
Compiled at Nov 30 2023 21:40:29
%ENV:
PERL5LIB="/home/dwrice/perl"
CYGWIN="winsymlinks:nativestrict"
@INC:
/home/dwrice/perl
/usr/local/lib/perl5/site_perl/5.36/x86_64-cygwin-threads
/usr/local/share/perl5/site_perl/5.36
/usr/lib/perl5/vendor_perl/5.36/x86_64-cygwin-threads
/usr/share/perl5/vendor_perl/5.36
/usr/lib/perl5/5.36/x86_64-cygwin-threads
/usr/share/perl5/5.36
Related PerlMonks discussion: https://www.perlmonks.org/?node_id=11157166
@dxter-zz Any chance you could modify your script and rerun it to separately measure the time taken for:
seek $fh, 0, 0;
my $s = "";
while(<$fh>) {
$s .= $_;
}
and
while(<$fh>) {
/^ ?Query/;
$n++;
}
Right now, the script's timing output conflates the two operations and its not clear how time taken is distributed across the latter two while loops.
e.g. Could this be a memory allocation/copy slowdown, rather than something in the regex engine?
I'm not sure I get your meaning, but populating $s takes 0.14 seconds on my machine.
0.125541 read lines from disk and do RE; n=570694.
0.139314 time to populate $s.
27.575185 read lines from in-memory file and do RE; n=570694.
I suspect part of the problem is the SvLEN() for $_
in the read from memory case is typically very large, if I time each match attempt and sort them, the matches that take the longest time have very large SvLEN() values, though I''m not sure where that's causing the problem.
The long SvLEN() values appear to be from this code in sv_gets():
/* make sure we have the room */
if ((I32)(SvLEN(sv) - append) <= cnt + 1) {
/* Not room for all of it
if we are looking for a separator and room for some
*/
if (rslen && cnt > 80 && (I32)SvLEN(sv) > append) {
/* just process what we have room for */
shortbuffered = cnt - SvLEN(sv) + append + 1;
cnt -= shortbuffered;
}
else {
/* ensure that the target sv has enough room to hold
* the rest of the read-ahead buffer */
shortbuffered = 0;
/* remember that cnt can be negative */
SvGROW(sv, (STRLEN)(append + (cnt <= 0 ? 2 : (cnt + 1))));
}
}
else {
/* we have enough room to hold the full buffer, lets scream */
shortbuffered = 0;
}
Profiling shows sv_setsv_cow() as the hot function, but if I do line-by-line profiling the two hottest lines appear to be:
CowREFCNT(ssv) = 0; // hot
}
# ifdef PERL_DEBUG_READONLY_COW
if (already) sv_buf_to_rw(ssv);
# endif
CowREFCNT(ssv)++; // hot
which doesn't make much sense, CowREFCNT(sv) is pretty much *(sv->sv_u.svu_pv + ((XSVPV*)(sv->sv_any)->xpv_len))
and doesn't do anything I'd expect to be time costly.
But I don't expect sv_setsv_cow() to be expensive in any case.
That CowREFCNT() macro is one of the few places referencing SvLEN() (aka ((XSVPV*)(sv->sv_any)->xpv_len)
) in the code path I expect here.
All I can think of is Windows deciding to commit (probably filling with zeroes) all of the large allocation done earlier by sv_gets() when that last byte is accessed.
In case it wasn't clear and if it's helpful, the large time difference only seems to occur when there are lines in the file that match the regex.
In case it wasn't clear and if it's helpful, the large time difference only seems to occur when there are lines in the file that match the regex.
It was clear and useful: the regexp engine (conditionally) calls sv_setsv_cow() on a successful match to save the string for later use in reporting matched text, eg with $&
, $1
etc.
At this point I see three issues:
- short strings with a large buffer are being COWed, even without reading from an in-memory file the SV buffer is being allocated to the size of the PerlIO buffer, or 8192 bytes for a normal file, or the size of the source SV when reading from an SV, or ~35MB in this case. By COWing this SV that memory can't be used for anything else until all COW references have been destroyed. This strikes me as wasteful and isn't cygwin specific.
- sv_gets() pre-allocates the SV buffer to match the remaining bytes in the PerlIO buffer, this is a (mostly) reasonable performance optimization when it's just the normal file buffer (8192 bytes), but it's excessive when the buffer is huge, as with the large in-memory file here. Again, not cygwin specific.
- whatever strange thing is happening with assignments to
CowREFCNT()
on cygwin.
I don't know that we can do anything about 3., the generated assembler seems reasonable:
CowREFCNT(ssv) = 0;
d92b: 48 8b 06 mov (%rsi),%rax
d92e: 48 8b 40 18 mov 0x18(%rax),%rax
d932: 48 83 e8 01 sub $0x1,%rax
d936: 48 03 46 10 add 0x10(%rsi),%rax
d93a: c6 00 00 movb $0x0,(%rax)
CowREFCNT(ssv)++;
d93d: 48 8b 06 mov (%rsi),%rax
d940: 48 8b 40 18 mov 0x18(%rax),%rax
d944: 48 83 e8 01 sub $0x1,%rax
d948: 48 03 46 10 add 0x10(%rsi),%rax
d94c: 0f b6 08 movzbl (%rax),%ecx
d94f: 8d 51 01 lea 0x1(%rcx),%edx
d952: 88 10 mov %dl,(%rax)
but I suspect fixing the other two will mitigate it.
COWing the large SVs causes problem outside of this case too, the SV buffer was propagated to other copies, While debugging I tried saving the SVs to an array, which resulted in:
tony@GANYMEDE ~/dev/perl/git/perl
$ ./miniperl.exe -Ilib ../21877-slow.txt ../21877.txt
start
Out of memory in perl:util:safesysmalloc
2 could be simply fixed by calling SvPV_shrink_to_cur()[1] or something similar at the end of sv_gets() but this is both an extra call to realloc() and if the same SV doesn't get COWed, will result in extra calls to re-expand the buffer on the next sv_gets(). So I think a more complex fix is needed.
I suspect the fix for 1 is to make SvCANCOW() check that the length isn't excessive in comparison to the string size, though we will need to define "excessive" for this case.
[1] SvPV_shrink_to_cur() disables COW on the SV, which we don't want.