perl5 icon indicating copy to clipboard operation
perl5 copied to clipboard

regular expressions matching lines read from an in-memory scalar is extremely slow in cygwin/MSYS2 perls

Open Old-Green-Man opened this issue 1 year ago • 7 comments

Description With cygwin/MSYS2 perl when you do regular expression matching on lines read from an in-memory scalar and the regex matches something it takes orders of magnitude longer than matching the same lines read from a disk file.

Steps to Reproduce Below is code to demonstrate the behavior. An example file is attached to run with the code. In this file about 16% of the lines match.

#!/usr/bin/env perl

use warnings;
use strict;
use Time::HiRes qw( time );

my $file = shift @ARGV;
my ($fh, $time, $n);

open $fh, "<", $file;
$time = time;
$n = 0;
while(<$fh>) {
  /^ ?Query/;
  $n++;
}
printf "%f read lines from disk and do RE; n=$n.\n", time - $time;

seek $fh, 0, 0;
my $s = "";
while(<$fh>) {
  $s .= $_;
}
# my $s = do {
#   local $/;
#   <$fh>;
# };

open $fh, "<", \$s;
$time = time;
$n = 0;
while(<$fh>) {
  /^ ?Query/;
  $n++;
}
printf "%f read lines from in-memory file and do RE; n=$n.\n", time - $time;

On my cygwin system this prints:

0.122725 read lines from disk and do RE; n=570694.
27.238712 read lines from in-memory file and do RE; n=570694.

So the in-memory file is about 300 times slower.

Expected behavior I would expect the times to be roughly in the same ball park.

Perl configuration

Summary of my perl5 (revision 5 version 36 subversion 3) configuration:

  Platform:
    osname=cygwin
    osvers=3.4.10-1.x86_64
    archname=x86_64-cygwin-threads-multi
    uname='cygwin_nt-10.0-22631 walter 3.4.10-1.x86_64 2023-11-29 12:12 utc x86_64 cygwin '
    config_args='-des -Dprefix=/usr -Dmksymlinks -Darchname=x86_64-cygwin-threads -Dlibperl=cygperl5_36.dll -Dcc=gcc -Dld=g++ -Accflags=-ggdb -O2 -pipe -Wall -Werror=format-security -D_FORTIFY_SOURCE=2 -fstack-protector-strong --param=ssp-buffer-size=4 -fdebug-prefix-map=/mnt/share/cygpkgs/perl/perl.x86_64/build=/usr/src/debug/perl-5.36.3-1 -fdebug-prefix-map=/mnt/share/cygpkgs/perl/perl.x86_64/src/perl-5.36.3=/usr/src/debug/perl-5.36.3-1 -fwrapv'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=define
    usemultiplicity=define
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
  Compiler:
    cc='gcc'
    ccflags ='-DPERL_USE_SAFE_PUTENV -U__STRICT_ANSI__ -D_GNU_SOURCE -ggdb -O2 -pipe -Wall -Werror=format-security -D_FORTIFY_SOURCE=2 -fstack-protector-strong --param=ssp-buffer-size=4 -fdebug-prefix-map=/mnt/share/cygpkgs/perl/perl.x86_64/build=/usr/src/debug/perl-5.36.3-1 -fdebug-prefix-map=/mnt/share/cygpkgs/perl/perl.x86_64/src/perl-5.36.3=/usr/src/debug/perl-5.36.3-1 -fwrapv -fno-strict-aliasing'
    optimize='-O3'
    cppflags='-DPERL_USE_SAFE_PUTENV -U__STRICT_ANSI__ -D_GNU_SOURCE -ggdb -O2 -pipe -Wall -Werror=format-security -D_FORTIFY_SOURCE=2 -fstack-protector-strong --param=ssp-buffer-size=4 -fdebug-prefix-map=/mnt/share/cygpkgs/perl/perl.x86_64/build=/usr/src/debug/perl-5.36.3-1 -fdebug-prefix-map=/mnt/share/cygpkgs/perl/perl.x86_64/src/perl-5.36.3=/usr/src/debug/perl-5.36.3-1 -fwrapv -fno-strict-aliasing'
    ccversion=''
    gccversion='11.4.0'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='g++'
    ldflags =' -Wl,--enable-auto-import -Wl,--export-all-symbols -Wl,--enable-auto-image-base -fstack-protector-strong'
    libpth=/usr/lib
    libs=-lpthread -lnsl -lgdbm -ldb -ldl -lcrypt -lgdbm_compat
    perllibs=-lpthread -lnsl -ldl -lcrypt
    libc=/usr/lib/libcygwin.a
    so=dll
    useshrplib=true
    libperl=cygperl5_36.dll
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=dll
    d_dlsymun=undef
    ccdlflags=' '
    cccdlflags=' '
    lddlflags=' --shared  -Wl,--enable-auto-import -Wl,--export-all-symbols -Wl,--enable-auto-image-base -fstack-protector-strong'


Characteristics of this binary (from libperl):
  Compile-time options:
    HAS_TIMES
    MULTIPLICITY
    PERLIO_LAYERS
    PERL_COPY_ON_WRITE
    PERL_DONT_CREATE_GVSV
    PERL_OP_PARENT
    PERL_PRESERVE_IVUV
    PERL_USE_SAFE_PUTENV
    USE_64_BIT_ALL
    USE_64_BIT_INT
    USE_ITHREADS
    USE_LARGE_FILES
    USE_LOCALE
    USE_LOCALE_COLLATE
    USE_LOCALE_CTYPE
    USE_LOCALE_NUMERIC
    USE_LOCALE_TIME
    USE_PERLIO
    USE_PERL_ATOF
    USE_REENTRANT_API
    USE_THREAD_SAFE_LOCALE
  Locally applied patches:
    Cygwin: README
    Cygwin: use auto-image-base instead of fixed DLL base address
    Cygwin: modify hints
    Cygwin: Configure correct libsearch
    Cygwin: Configure correct libpth
    Cygwin: Win32 correct UTF8 handling
  Built under cygwin
  Compiled at Nov 30 2023 21:40:29
  %ENV:
    PERL5LIB="/home/dwrice/perl"
    CYGWIN="winsymlinks:nativestrict"
  @INC:
    /home/dwrice/perl
    /usr/local/lib/perl5/site_perl/5.36/x86_64-cygwin-threads
    /usr/local/share/perl5/site_perl/5.36
    /usr/lib/perl5/vendor_perl/5.36/x86_64-cygwin-threads
    /usr/share/perl5/vendor_perl/5.36
    /usr/lib/perl5/5.36/x86_64-cygwin-threads
    /usr/share/perl5/5.36

Old-Green-Man avatar Jan 23 '24 23:01 Old-Green-Man

Related PerlMonks discussion: https://www.perlmonks.org/?node_id=11157166

choroba avatar Jan 24 '24 06:01 choroba

@dxter-zz Any chance you could modify your script and rerun it to separately measure the time taken for:

seek $fh, 0, 0;
my $s = "";
while(<$fh>) {
  $s .= $_;
}

and

while(<$fh>) {
  /^ ?Query/;
  $n++;
}

Right now, the script's timing output conflates the two operations and its not clear how time taken is distributed across the latter two while loops.

e.g. Could this be a memory allocation/copy slowdown, rather than something in the regex engine?

richardleach avatar Jan 24 '24 09:01 richardleach

I'm not sure I get your meaning, but populating $s takes 0.14 seconds on my machine.

0.125541 read lines from disk and do RE; n=570694.
0.139314 time to populate $s.
27.575185 read lines from in-memory file and do RE; n=570694.

Old-Green-Man avatar Jan 24 '24 11:01 Old-Green-Man

I suspect part of the problem is the SvLEN() for $_ in the read from memory case is typically very large, if I time each match attempt and sort them, the matches that take the longest time have very large SvLEN() values, though I''m not sure where that's causing the problem.

The long SvLEN() values appear to be from this code in sv_gets():

    /* make sure we have the room */
    if ((I32)(SvLEN(sv) - append) <= cnt + 1) {
        /* Not room for all of it
           if we are looking for a separator and room for some
         */
        if (rslen && cnt > 80 && (I32)SvLEN(sv) > append) {
            /* just process what we have room for */
            shortbuffered = cnt - SvLEN(sv) + append + 1;
            cnt -= shortbuffered;
        }
        else {
            /* ensure that the target sv has enough room to hold
             * the rest of the read-ahead buffer */
            shortbuffered = 0;
            /* remember that cnt can be negative */
            SvGROW(sv, (STRLEN)(append + (cnt <= 0 ? 2 : (cnt + 1))));
        }
    }
    else {
        /* we have enough room to hold the full buffer, lets scream */
        shortbuffered = 0;
    }

tonycoz avatar Jan 25 '24 05:01 tonycoz

Profiling shows sv_setsv_cow() as the hot function, but if I do line-by-line profiling the two hottest lines appear to be:

        CowREFCNT(ssv) = 0;  // hot
    }
#  ifdef PERL_DEBUG_READONLY_COW
    if (already) sv_buf_to_rw(ssv);
#  endif
    CowREFCNT(ssv)++; // hot

which doesn't make much sense, CowREFCNT(sv) is pretty much *(sv->sv_u.svu_pv + ((XSVPV*)(sv->sv_any)->xpv_len)) and doesn't do anything I'd expect to be time costly.

But I don't expect sv_setsv_cow() to be expensive in any case.

That CowREFCNT() macro is one of the few places referencing SvLEN() (aka ((XSVPV*)(sv->sv_any)->xpv_len)) in the code path I expect here.

All I can think of is Windows deciding to commit (probably filling with zeroes) all of the large allocation done earlier by sv_gets() when that last byte is accessed.

tonycoz avatar Jan 29 '24 05:01 tonycoz

In case it wasn't clear and if it's helpful, the large time difference only seems to occur when there are lines in the file that match the regex.

Old-Green-Man avatar Jan 29 '24 10:01 Old-Green-Man

In case it wasn't clear and if it's helpful, the large time difference only seems to occur when there are lines in the file that match the regex.

It was clear and useful: the regexp engine (conditionally) calls sv_setsv_cow() on a successful match to save the string for later use in reporting matched text, eg with $&, $1 etc.

At this point I see three issues:

  1. short strings with a large buffer are being COWed, even without reading from an in-memory file the SV buffer is being allocated to the size of the PerlIO buffer, or 8192 bytes for a normal file, or the size of the source SV when reading from an SV, or ~35MB in this case. By COWing this SV that memory can't be used for anything else until all COW references have been destroyed. This strikes me as wasteful and isn't cygwin specific.
  2. sv_gets() pre-allocates the SV buffer to match the remaining bytes in the PerlIO buffer, this is a (mostly) reasonable performance optimization when it's just the normal file buffer (8192 bytes), but it's excessive when the buffer is huge, as with the large in-memory file here. Again, not cygwin specific.
  3. whatever strange thing is happening with assignments to CowREFCNT() on cygwin.

I don't know that we can do anything about 3., the generated assembler seems reasonable:

        CowREFCNT(ssv) = 0;
    d92b:       48 8b 06                mov    (%rsi),%rax
    d92e:       48 8b 40 18             mov    0x18(%rax),%rax
    d932:       48 83 e8 01             sub    $0x1,%rax
    d936:       48 03 46 10             add    0x10(%rsi),%rax
    d93a:       c6 00 00                movb   $0x0,(%rax)
    CowREFCNT(ssv)++;
    d93d:       48 8b 06                mov    (%rsi),%rax
    d940:       48 8b 40 18             mov    0x18(%rax),%rax
    d944:       48 83 e8 01             sub    $0x1,%rax
    d948:       48 03 46 10             add    0x10(%rsi),%rax
    d94c:       0f b6 08                movzbl (%rax),%ecx
    d94f:       8d 51 01                lea    0x1(%rcx),%edx
    d952:       88 10                   mov    %dl,(%rax)

but I suspect fixing the other two will mitigate it.

COWing the large SVs causes problem outside of this case too, the SV buffer was propagated to other copies, While debugging I tried saving the SVs to an array, which resulted in:

tony@GANYMEDE ~/dev/perl/git/perl
$ ./miniperl.exe -Ilib ../21877-slow.txt ../21877.txt 
start
Out of memory in perl:util:safesysmalloc

2 could be simply fixed by calling SvPV_shrink_to_cur()[1] or something similar at the end of sv_gets() but this is both an extra call to realloc() and if the same SV doesn't get COWed, will result in extra calls to re-expand the buffer on the next sv_gets(). So I think a more complex fix is needed.

I suspect the fix for 1 is to make SvCANCOW() check that the length isn't excessive in comparison to the string size, though we will need to define "excessive" for this case.

[1] SvPV_shrink_to_cur() disables COW on the SV, which we don't want.

tonycoz avatar Jan 29 '24 23:01 tonycoz