PerlIO::encoding infinite loop when trying to decode UTF-8 as ISO-2022-JP
From [email protected]
Created by [email protected]
When the ISO-2022-JP decoder is used by PerlIO::encoding, invalid data can send it into an infinite loop. The following three short scripts demonstrate the problem:
#step1.pl - create a file containing Unicode 201d in UTF-8
# (right double quotation mark)
open(Out,">foo");
print Out "\xE2\x80\x9D";
close(Out);

#step2.pl - this script exits successfully
use Encode;
open(In,"foo");
print decode("iso-2022-jp",<In>);

#step3.pl - this script goes into an infinite decoding loop
open(In,"<:encoding(iso-2022-jp)","foo");
print <In>;
The behavior is identical on the Apple-supplied 5.10.0 and on Strawberry Perl 5.10.1.
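Until the layer is fixed, a possible workaround is to do what step2.pl does: read the raw bytes and decode them in a single explicit call, without pushing an :encoding layer. The following is a minimal sketch under that assumption; `slurp_decoded` is a hypothetical helper name, not something provided by Encode or PerlIO::encoding.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: read a file as raw bytes and decode it in one call,
# the way step2.pl does, instead of using an :encoding layer.
sub slurp_decoded {
    my ($path, $encoding) = @_;
    open(my $in, '<:raw', $path) or die "Cannot open $path: $!";
    my $octets = do { local $/; <$in> };    # slurp the whole file
    close($in);
    $octets = '' unless defined $octets;    # treat an empty read as no data
    # No CHECK argument is passed, matching step2.pl.
    return decode($encoding, $octets);
}

# Recreate the file from step1.pl and decode it explicitly.
open(my $out, '>:raw', 'foo') or die "Cannot create foo: $!";
print {$out} "\xE2\x80\x9D";
close($out);

my $text = slurp_decoded('foo', 'iso-2022-jp');
print defined $text ? "decoded without looping\n" : "decode returned undef\n";
```

This trades streaming for a whole-file read, so it only suits input that fits in memory.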
Perl Info
---
Flags:
category=library
severity=low
module=PerlIO::encoding
---
Site configuration information for perl 5.10.1:
Configured by win32-vanilla at Wed Oct 21 13:53:59 2009.
Summary of my perl5 (revision 5 version 10 subversion 1) configuration:
Platform:
osname=MSWin32, osvers=5.1, archname=MSWin32-x86-multi-thread
uname='Win32 strawberryperl 5.10.1.0 #1 30 i386'
config_args='undef'
hint=recommended, useposix=true, d_sigaction=undef
useithreads=define, usemultiplicity=define
useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
use64bitint=undef, use64bitall=undef, uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='gcc', ccflags =' -s -O2 -DWIN32 -DHAVE_DES_FCRYPT -DUSE_SITECUSTOMIZE -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -fno-strict-aliasing -DPERL_MSVCRT_READFIX',
optimize='-s -O2',
cppflags='-DWIN32'
ccversion='', gccversion='3.4.5', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='long long', lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries:
ld='g++', ldflags ='-s -L"C:\strawberry\perl\lib\CORE" -L"C:\strawberry\c\lib"'
libpth=C:\strawberry\c\lib
libs= -lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32
perllibs= -lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32
libc=, so=dll, useshrplib=true, libperl=libperl510.a
gnulibc_version=''
Dynamic Linking:
dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
cccdlflags=' ', lddlflags='-mdll -s -L"C:\strawberry\perl\lib\CORE" -L"C:\strawberry\c\lib"'
Locally applied patches:
---
@INC for perl 5.10.1:
C:/strawberry/perl/lib
C:/strawberry/perl/site/lib
C:\strawberry\perl\vendor\lib
.
---
Environment for perl 5.10.1:
HOME (unset)
LANG (unset)
LANGUAGE (unset)
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=C:\windows\system32;C:\windows;C:\windows\System32\Wbem;C:\windows\System32\WindowsPowerShell\v1.0\;C:\Program Files\Lenovo\Bluetooth Software\;C:\Program Files\QuickTime\QTSystem\;C:\Program Files\GNU\GnuPG\pub;C:\strawberry\c\bin;C:\strawberry\perl\bin;C:\Program Files\Mercurial;C:\emacs\bin;C:\gnuwin32\bin;C:\Program Files\OpenVPN\bin
PERL_BADLANG (unset)
SHELL (unset)
From @iabyn
still loops in blead
@jkeenan - Status changed from 'new' to 'open'
From @jkeenan
Reproduced with perl-5.24.1 on Linux:
$ perl -e 'open(OUT, q|>|, q|foo|); print OUT qq|\xE2\x80\x9D|; close OUT'
$ perl -MEncode -e 'open(IN, q|foo|); print decode("iso-2022-jp", <IN>);'
# below loops indefinitely
$ perl -e 'open(IN, q|<:encoding(iso-2022-jp)|, q|foo|); print <IN>;'
-- James E Keenan ([email protected])
From @Leont
iso-2022 is a very problematic encoding because it's escape-based, as explained in Encode::PerlIO. iso-2022-jp is currently allowed in PerlIO (unlike iso-2022-kr) because apparently well-formed iso-2022-jp can be handled easily enough, but your example shows that less well-formed input can be quite problematic.
In this particular case I suspect it's fixable (by handling EOF smarter), in many other cases it probably isn't.
Leon
From @khwilliamson
Can we detect we are in a loop?
This still exists in 5.37.12
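One way to think about the loop-detection question is a progress check: if a decode call consumes no bytes, calling it again on the same input cannot help. The sketch below illustrates that idea in plain Perl. It is not how PerlIO::encoding is implemented; `decode_until_stalled` is a hypothetical helper, and it assumes the documented Encode behaviour that a CHECK value such as `FB_QUIET` leaves the unprocessed bytes in the source buffer.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper, not part of Encode or PerlIO::encoding.
# Returns (decoded text, bytes left undecoded).
sub decode_until_stalled {
    my ($encoding_name, $octets) = @_;
    my $enc = find_encoding($encoding_name)
        or die "Unknown encoding: $encoding_name";

    my $decoded = '';
    my $buffer  = $octets;    # copy: decode() with a CHECK value edits its argument
    while (length $buffer) {
        my $before = length $buffer;
        my $chunk  = $enc->decode($buffer, Encode::FB_QUIET());
        $decoded .= $chunk if defined $chunk;
        # No bytes consumed: another call would see the same input, so stop.
        last if length($buffer) >= $before;
    }
    return ($decoded, $buffer);
}

my ($text, $left) = decode_until_stalled('iso-2022-jp', "\xE2\x80\x9D");
printf "decoded %d characters, %d bytes left undecoded\n",
    length($text), length($left);
```

Whether the three bytes from step1.pl end up decoded or left over depends on the Encode version, so no particular output is claimed here; the point is only that the loop ends either way.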
And reproduced with perl-5.40.0 on Linux.