
PerlIO::encoding infinite loop when trying to decode UTF-8 as ISO-2022-JP

Open p5pRT opened this issue 15 years ago • 8 comments

Migrated from rt.perl.org#73826 (status was 'open')

Searchable as RT73826$

p5pRT, Mar 25 '10 20:03

From [email protected]

Created by [email protected]

When the ISO-2022-JP decoder is used by PerlIO::encoding, invalid data can send it into an infinite loop. The following three short scripts demonstrate the problem:

    #step1.pl - create a file containing Unicode 201d in UTF-8
    # (right double quotation mark)
    open(Out,">foo");
    print Out "\xE2\x80\x9D";
    close(Out);

    #step2.pl - this script exits successfully
    use Encode;
    open(In,"foo");
    print decode("iso-2022-jp",<In>);

    #step3.pl - this script goes into an infinite decoding loop
    open(In,"<:encoding(iso-2022-jp)","foo");
    print <In>;

The behavior is identical on the Apple-supplied 5.10.0 and on Strawberry Perl 5.10.1.
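Because step2.pl shows that a single `Encode::decode` call terminates on this input, one possible workaround until the layer is fixed is to read the file as raw octets and decode it in one call. This is only a sketch: the helper name `slurp_decoded` is hypothetical, it reads the whole file into memory, and it does not fix PerlIO::encoding itself.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: read a file as raw octets and decode it with a single
# Encode::decode call instead of going through the :encoding(...) layer.
sub slurp_decoded {
    my ($path, $encoding) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    my $octets = do { local $/; <$fh> };   # slurp mode: read the whole file
    close($fh);
    return decode($encoding, $octets);
}

# Usage with the file written by step1.pl:
#   binmode(STDOUT, ':encoding(UTF-8)');
#   print slurp_decoded('foo', 'iso-2022-jp');
```

The trade-off is memory for termination: the escape-sequence state lives inside one `decode` call, so no buffering layer has to carry it across reads.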

Perl Info
---
Flags:
    category=library
    severity=low
    module=PerlIO::encoding
---
Site configuration information for perl 5.10.1:

Configured by win32-vanilla at Wed Oct 21 13:53:59 2009.

Summary of my perl5 (revision 5 version 10 subversion 1) configuration:
   
  Platform:
    osname=MSWin32, osvers=5.1, archname=MSWin32-x86-multi-thread
    uname='Win32 strawberryperl 5.10.1.0 #1 30 i386'
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags =' -s -O2 -DWIN32 -DHAVE_DES_FCRYPT  -DUSE_SITECUSTOMIZE -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -fno-strict-aliasing -DPERL_MSVCRT_READFIX',
    optimize='-s -O2',
    cppflags='-DWIN32'
    ccversion='', gccversion='3.4.5', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='long long', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='g++', ldflags ='-s -L"C:\strawberry\perl\lib\CORE" -L"C:\strawberry\c\lib"'
    libpth=C:\strawberry\c\lib
    libs= -lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32
    perllibs= -lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32
    libc=, so=dll, useshrplib=true, libperl=libperl510.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-mdll -s -L"C:\strawberry\perl\lib\CORE" -L"C:\strawberry\c\lib"'

Locally applied patches:
    

---
@INC for perl 5.10.1:
    C:/strawberry/perl/lib
    C:/strawberry/perl/site/lib
    C:\strawberry\perl\vendor\lib
    .

---
Environment for perl 5.10.1:
    HOME (unset)
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=C:\windows\system32;C:\windows;C:\windows\System32\Wbem;C:\windows\System32\WindowsPowerShell\v1.0\;C:\Program Files\Lenovo\Bluetooth Software\;C:\Program Files\QuickTime\QTSystem\;C:\Program Files\GNU\GnuPG\pub;C:\strawberry\c\bin;C:\strawberry\perl\bin;C:\Program Files\Mercurial;C:\emacs\bin;C:\gnuwin32\bin;C:\Program Files\OpenVPN\bin
    PERL_BADLANG (unset)
    SHELL (unset)

p5pRT, Mar 25 '10 20:03

From @iabyn

still loops in blead

p5pRT, Mar 27 '10 18:03

@jkeenan - Status changed from 'new' to 'open'

p5pRT, Feb 17 '17 23:02

From @jkeenan

Reproduced with perl-5.24.1 on Linux:

$ perl -e 'open(OUT, q|>|, q|foo|); print OUT qq|\xE2\x80\x9D|; close OUT'

$ perl -MEncode -e 'open(IN, q|foo|); print decode("iso-2022-jp", <IN>);'

# below loops indefinitely
$ perl -e 'open(IN, q|<:encoding(iso-2022-jp)|, q|foo|); print <IN>;'

-- James E Keenan ([email protected])

p5pRT, Feb 17 '17 23:02

From @Leont

iso-2022 is a very problematic encoding because it's escape-based, as explained in Encode::PerlIO. iso-2022-jp is currently allowed in PerlIO (unlike iso-2022-kr) because well-formed iso-2022-jp can apparently be handled easily enough, but your example shows that less well-formed input can be quite problematic.

In this particular case I suspect it's fixable (by handling EOF smarter), in many other cases it probably isn't.

Leon
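To illustrate why an escape-based encoding forces a decoder to carry state across reads: in ISO-2022-JP the same 7-bit bytes mean different things depending on the last escape sequence seen (ESC $ B selects JIS X 0208, ESC ( B returns to ASCII). The sketch below uses only `Encode::encode` and `Encode::decode`; the sample character U+3042 (HIRAGANA LETTER A) is our own choice, not something from the report.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Encode HIRAGANA LETTER A (U+3042) as ISO-2022-JP. The result is 7-bit data:
# an escape sequence selecting JIS X 0208, two bytes for the character, and an
# escape sequence switching back to ASCII.
my $jis = encode('iso-2022-jp', "\x{3042}");
printf "%vX\n", $jis;   # dump the octets in hex, separated by dots

# Strip the escape sequences and decode the remaining bytes on their own.
# Without the preceding ESC $ B the decoder starts in ASCII mode, so the two
# bytes come back as two ASCII characters instead of U+3042.
(my $bare = $jis) =~ s/\e[\$(][\@BJ]//g;
my $ascii = decode('iso-2022-jp', $bare);
print "$ascii\n";
```

A streaming layer that splits the input between the escape sequence and the bytes it governs therefore has to remember the current mode, which is the kind of state that makes EOF and malformed input hard to handle.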

p5pRT, Mar 04 '17 20:03

From @khwilliamson

Can we detect we are in a loop?
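One way to detect the loop, sketched here in user space rather than inside PerlIO::encoding: call the encoding object's `decode` method with `Encode::FB_QUIET`, which, per the Encode documentation, returns what it could decode and overwrites its argument with the unprocessed remainder, and stop as soon as a call leaves that remainder no shorter than before. The function name `decode_until_stuck` is hypothetical. Whether the layer could use the same rule is an open question, since it also has to tell a genuinely partial character at a buffer boundary apart from a decoder that never consumes anything.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper: decode $octets with repeated FB_QUIET calls, the way a
# streaming reader might, but give up as soon as a call makes no progress.
# Returns the decoded text and whatever octets could not be consumed.
sub decode_until_stuck {
    my ($enc_name, $octets) = @_;
    my $enc = find_encoding($enc_name) or die "Unknown encoding: $enc_name";
    my $buf  = $octets;    # decode() with FB_QUIET overwrites its argument
    my $text = '';
    while (length $buf) {
        my $before = length $buf;
        my $chunk  = $enc->decode($buf, Encode::FB_QUIET());
        $text .= $chunk if defined $chunk;
        last if length($buf) >= $before;   # nothing consumed: stop instead of spinning
    }
    return ($text, $buf);
}

my ($text, $rest) = decode_until_stuck('iso-2022-jp', "\xE2\x80\x9D");
printf "decoded %d chars, %d octets left undecoded\n", length $text, length $rest;
```

The loop is guaranteed to end: every iteration either shrinks the buffer or exits, so a decoder that keeps handing back the same residue is reported instead of retried forever.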

p5pRT, Mar 04 '17 20:03

This still exists in 5.37.12

khwilliamson, Apr 24 '23 15:04

The three commands from the Feb 17 '17 comment above still reproduce the problem with perl-5.40.0 on Linux.

jkeenan, Jun 22 '24 12:06