Parsing "\c\" is broken at the end of a double-quoted string.

Open tlhackque opened this issue 5 months ago • 3 comments

Description

Compiling the string "\c\" (double-quote backslash c backslash double-quote) fails. I expect it to produce a one-character string. In ASCII/UTF-8, ord "\c\" should be 0x1c (28).

Adding another character immediately before the second '"' will parse, but produces a 2 character string. In this case, ord produces the expected result.

Work-around: `substr("\c\+",0,1)`. Of course, this gets painful for longer strings, e.g. "abcdef\c\".
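For longer strings, a few spellings that avoid a trailing `\c\` may be easier to live with than `substr`. The following is a minimal sketch, assuming an ASCII/UTF-8 platform where Control-Backslash is code point 28 (the values would differ on EBCDIC); the variable names are purely illustrative.

```perl
use strict;
use warnings;

# Assumption: ASCII/UTF-8 platform, so Control-Backslash (FS) is code point 28.
my $via_substr = substr("\c\+", 0, 1);   # the work-around from the report
my $fs_hex     = "abcdef\x1c";           # hex escape
my $fs_octal   = "abcdef\034";           # octal escape
my $fs_chr     = "abcdef" . chr(28);     # explicit concatenation

# Report the length and the final code point of each full string.
for my $s ($fs_hex, $fs_octal, $fs_chr) {
    printf "length=%d last=0x%02x\n", length($s), ord(substr($s, -1));
}

# Confirm that the substr work-around yields the same single character.
printf "substr work-around matches: %s\n",
    $via_substr eq chr(28) ? "yes" : "no";
```

None of these changes how `"\c\"` itself is parsed; they only sidestep the end-of-string case.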

It seems that the parser isn't consuming the second \ as part of the \c escape, and is attaching it to the closing " instead... Note that qq{} has the same defect - it's not specific to ".

This appears to be quite old behavior - I can reproduce it on Perl V5.8.8. (Admittedly, this is a peculiar use case, but I encountered it writing real code.)

(Hopefully, Markdown's backslash processing hasn't further confused this report - backslashes happen everywhere!)

Steps to Reproduce

$ perl -Mwarnings -Mstrict  -e'my $x="\c\";'
Can't find string terminator '"' anywhere before EOF at -e line 1.
$ perl -Mwarnings -Mstrict  -e'my $x="\c\+";'
$

# It's definitely an end-of-string issue:
$  perl -Mwarnings -Mstrict -e'my $x="abcdef\c\";'
Can't find string terminator '"' anywhere before EOF at -e line 1.
$

# And the quoting style doesn't matter
$  perl -Mwarnings -Mstrict -e'my $x=qq{\c\};'
Can't find string terminator "}" anywhere before EOF at -e line 1.
$  perl -Mwarnings -Mstrict -e'my $x=qq{\c\+};'
$

Expected behavior

The string "\c\" parses without errors and contains the correct character.

Perl configuration

 perl -V
Summary of my perl5 (revision 5 version 40 subversion 1) configuration:

  Platform:
    osname=linux
    osvers=6.11.0
    archname=x86_64-linux-thread-multi
    uname='linux localhost 6.11.0 #1 smp preempt_dynamic 6.11.0 x86_64 gnulinux '
    config_args='-des -Doptimize=none -Dccflags=-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64 -march=x86-64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -mtls-dialect=gnu2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -Dldflags=-Wl,-z,relro -Wl,--as-needed  -Wl,-z,pack-relative-relocs -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -Wl,--build-id=sha1   -Dccdlflags=-Wl,--enable-new-dtags -Wl,-z,relro -Wl,--as-needed  -Wl,-z,pack-relative-relocs -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -Wl,--build-id=sha1   -Dlddlflags=-shared -Wl,-z,relro -Wl,--as-needed  -Wl,-z,pack-relative-relocs -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -Wl,--build-id=sha1   -Dshrpdir=/usr/lib64 -DDEBUGGING=-g -Dversion=5.40.1 -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. 
-Dprefix=/usr -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dvendorprefix=/usr -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl5/5.40 -Dsitearch=/usr/local/lib64/perl5/5.40 -Dprivlib=/usr/share/perl5 -Dvendorlib=/usr/share/perl5/vendor_perl -Darchlib=/usr/lib64/perl5 -Dvendorarch=/usr/lib64/perl5/vendor_perl -Darchname=x86_64-linux-thread-multi -Dlibpth=/usr/local/lib64 /lib64 /usr/lib64 -Duseshrplib -Dusethreads -Duseithreads -Dusedtrace=/usr/bin/dtrace -Duselargefiles -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl=n -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr -Dd_gethostent_r_proto -Ud_endhostent_r_proto -Ud_sethostent_r_proto -Ud_endprotoent_r_proto -Ud_setprotoent_r_proto -Ud_endservent_r_proto -Ud_setservent_r_proto -Dscriptdir=/usr/bin -Dusesitecustomize -Duse64bitint'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=define
    usemultiplicity=define
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
  Compiler:
    cc='gcc'
    ccflags ='-D_REENTRANT -D_GNU_SOURCE -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -march=x86-64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -mtls-dialect=gnu2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -fwrapv -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    optimize='  -g'
    cppflags='-D_REENTRANT -D_GNU_SOURCE -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -march=x86-64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -mtls-dialect=gnu2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -fwrapv -fno-strict-aliasing -I/usr/local/include'
    ccversion=''
    gccversion='14.2.1 20250110 (Red Hat 14.2.1-7)'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='gcc'
    ldflags ='-Wl,-z,relro -Wl,--as-needed -Wl,-z,pack-relative-relocs -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1  -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib64 /lib64 /usr/lib64 /usr/local/lib /usr/lib
    libs=-lpthread -lresolv -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
    perllibs=-lpthread -lresolv -ldl -lm -lcrypt -lutil -lc
    libc=/lib/../lib64/libc.so.6
    so=so
    useshrplib=true
    libperl=libperl.so
    gnulibc_version='2.40'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,--enable-new-dtags -Wl,-z,relro -Wl,--as-needed -Wl,-z,pack-relative-relocs -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1 '
    cccdlflags='-fPIC'
    lddlflags='-lpthread -shared -Wl,-z,relro -Wl,--as-needed -Wl,-z,pack-relative-relocs -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1  -L/usr/local/lib -fstack-protector-strong'


Characteristics of this binary (from libperl):
  Compile-time options:
    HAS_LONG_DOUBLE
    HAS_STRTOLD
    HAS_TIMES
    MULTIPLICITY
    PERLIO_LAYERS
    PERL_COPY_ON_WRITE
    PERL_DONT_CREATE_GVSV
    PERL_HASH_FUNC_SIPHASH13
    PERL_HASH_USE_SBOX32
    PERL_MALLOC_WRAP
    PERL_OP_PARENT
    PERL_PRESERVE_IVUV
    PERL_USE_SAFE_PUTENV
    USE_64_BIT_ALL
    USE_64_BIT_INT
    USE_ITHREADS
    USE_LARGE_FILES
    USE_LOCALE
    USE_LOCALE_COLLATE
    USE_LOCALE_CTYPE
    USE_LOCALE_NUMERIC
    USE_LOCALE_TIME
    USE_PERLIO
    USE_PERL_ATOF
    USE_REENTRANT_API
    USE_SITECUSTOMIZE
    USE_THREAD_SAFE_LOCALE
  Locally applied patches:
    Fedora Patch1: Removes date check, Fedora/RHEL specific
    Fedora Patch2: support for libdir64
    Fedora Patch3: use libresolv instead of libbind
    Fedora Patch4: USE_MM_LD_RUN_PATH
    Fedora Patch5: Provide MM::maybe_command independently (bug #1129443)
    Fedora Patch6: Dont run one io test due to random builder failures
    Fedora Patch8: Define SONAME for libperl.so
    Fedora Patch9: Install libperl.so to -Dshrpdir value
    Fedora Patch10: Make *DBM_File desctructors thread-safe (RT#61912)
    Fedora Patch11: Replace EU::MakeMaker dependency with EU::MM::Utils in IPC::Cmd (bug #1129443)
    Fedora Patch12: Link XS modules to pthread library to fix linking with -z defs
    Fedora Patch13: Pass the correct CFLAGS to dtrace
    Fedora Patch200: Link XS modules to libperl.so with EU::CBuilder on Linux
    Fedora Patch201: Link XS modules to libperl.so with EU::MM on Linux
    Fedora Patch202: Add definition of OPTIMIZE to .ph files
  Built under linux
  Compiled at Jan 20 2025 00:00:00
  @INC:
    /usr/local/lib64/perl5/5.40
    /usr/local/share/perl5/5.40
    /usr/lib64/perl5/vendor_perl
    /usr/share/perl5/vendor_perl
    /usr/lib64/perl5
    /usr/share/perl5

(Edit: show prompts for successful compiles, quote terminal output)

tlhackque avatar Jun 10 '25 21:06 tlhackque

Two nits: First, if you put 3 backticks ('```') around the terminal inputs and outputs in the Steps to Reproduce section of your post, you get more readable output.

Second, in your third example, you don't post the output you get from the second command.

Now, more to the point ... In your third example, I get different results:

$ perl -v | head -2 | tail -1
This is perl 5, version 40, subversion 1 (v5.40.1) built for x86_64-linux

$  perl -Mwarnings -Mstrict -e'my $x=qq{\c};'
Missing control char name in \c at -e line 1, within string
Execution of -e aborted due to compilation errors.

$ perl -Mwarnings -Mstrict -e'my $x=qq{\c+};'
"\c+" is more clearly written simply as "k" at -e line 1.

With an older version of perl, I get:

$ perlbrew use perl-5.14.4

$ perl -v | head -2 | tail -1
This is perl 5, version 14, subversion 4 (v5.14.4) built for x86_64-linux

$ perl -Mwarnings -Mstrict -e'my $x=qq{\c};'
Missing control char name in \c at -e line 1, within string
Execution of -e aborted due to compilation errors.

$ perl -Mwarnings -Mstrict -e'my $x=qq{\c+};'
"\c+" is more clearly written simply as "k" at -e line 1.

I agree with you that, if this is indeed a problem, it is a very old one.

Your perl -V output suggests that you used the Red Hat "vendor perl" build to generate these results. That perl has an enormous number of configuration options. What results do you get if you build from a perl-5.40.1 tarball with simple configuration options such as:

sh ./Configure -des -Dusedevel && make test_prep

jkeenan avatar Jun 10 '25 22:06 jkeenan

> Second, in your third example, you don't post the output you get from the second command.

That's because it compiles successfully with the extra character, the only output is the shell prompt.

I've updated to show the prompts and fixed the terminal output.

> Now, more to the point ... In your third example, I get different results:
>
> $ perl -v | head -2 | tail -1
> This is perl 5, version 40, subversion 1 (v5.40.1) built for x86_64-linux
>
> $ perl -Mwarnings -Mstrict -e'my $x=qq{\c};'
> Missing control char name in \c at -e line 1, within string
> Execution of -e aborted due to compilation errors.

Your test is missing the backslash before the close curly. In my test case, Perl treats that backslash as quoting the curly instead of making it part of the escape. Without it, your error is expected (and correct). I believe that you will reproduce my results if you copy the test case exactly. That should be easier with the revised output quoting.

The escape is supposed to generate a Control-Backslash from Backslash-c-Backslash.
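To make the expected value concrete: on ASCII platforms, perlop describes `\cX` as the character whose ordinal is `ord(uc X) ^ 64`. The sketch below emulates that rule with a hypothetical helper, `ctrl_of`, which is not part of Perl and only illustrates the arithmetic.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Perl built-in): emulate "\cX" on an ASCII
# platform by flipping bit 6 of the upper-cased character.
sub ctrl_of {
    my ($char) = @_;
    return chr(ord(uc $char) ^ 64);
}

# '\\' inside single quotes is one literal backslash.
for my $char ('A', '[', '\\', '+') {
    printf "\\c%s -> %d\n", $char, ord(ctrl_of($char));
}
```

Backslash is 0x5c, so flipping bit 6 gives 0x1c (28). The same rule maps `+` (0x2b) to 0x6b, which is the letter `k` named in the `"\c+" is more clearly written simply as "k"` warning quoted earlier in the thread.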

The Perl version is what Fedora 41 ships; it's not locally customized.

tlhackque avatar Jun 10 '25 23:06 tlhackque

This is a known bug. I could only find one open ticket involving it. If someone knows how to search for a backslash either on GitHub or on the internet, please let me know. Anyway, the ticket I found was #14331.

The bug dates from whenever \c was added to the language. Larry Wall had an idea for fixing it; I've looked into it in the past; and there just hasn't been much demand to justify really delving into it. It's not trivial. And it's documented. perlop and perlebcdic both mention it.

> Also, "\c\X" yields chr(28) . "X" for any X, but cannot come at the end of a string, because the backslash would be parsed as escaping the end quote.
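The behaviour in that passage can be illustrated with a deliberately simplified model of finding a string's closing delimiter. This is a sketch, not Perl's actual tokenizer: `find_terminator` is a hypothetical function that assumes the search for the delimiter treats a backslash plus the following character as a unit, before any escape such as `\c` is interpreted. It also ignores the nesting of bracketing delimiters such as `{}`.

```perl
use strict;
use warnings;

# Simplified model (an assumption, not Perl's real implementation):
# return the offset of the closing delimiter in $src, or -1 if none.
# $src is the source text that follows the opening delimiter.
sub find_terminator {
    my ($src, $delim) = @_;
    my $i = 0;
    while ($i < length $src) {
        my $c = substr($src, $i, 1);
        if ($c eq '\\') {     # backslash: skip it and the character it guards
            $i += 2;
            next;
        }
        return $i if $c eq $delim;
        $i++;
    }
    return -1;                # ran off the end: "Can't find string terminator"
}

# Single-quoted literals keep the backslashes as written.
print find_terminator('\c\";',  '"'), "\n";   # prints -1
print find_terminator('\c\+";', '"'), "\n";   # prints 4
```

Under this model the second backslash in `\c\"` guards the quote, so no terminator is found, while `\c\+"` terminates normally. That matches the two outcomes in the Steps to Reproduce section.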

khwilliamson avatar Jun 11 '25 05:06 khwilliamson