$PerlIO::encoding::fallback = FB_DEFAULT leads to duplicated output
To: perlbug@perl.org
Subject: $PerlIO::encoding::fallback = FB_DEFAULT leads to duplicated output
Reply-To: aa29@mail.ru
Message-Id: <5.8.4_2160_1084980780@INFORMED>
This is a bug report for perl from aa29@mail.ru, generated with the help of perlbug 1.35 running under perl v5.8.4.
$PerlIO::encoding::fallback = FB_DEFAULT leads to duplicated output.
It is possible to change the check mode via $PerlIO::encoding::fallback:

    use Encode qw(:fallback_all);
    use encoding 'utf8';

    $PerlIO::encoding::fallback = FB_DEFAULT;
    binmode(STDERR, ":encoding(cp866)");
    warn "foobar";
This code gives four messages instead of one:

    foobar at 6.pl line 7.
    foobar at 6.pl line 7.
    foobar at 6.pl line 7.
    foobar at 6.pl line 7.
When STDERR is redirected to a file, it gives three messages:

    foobar at 6.pl line 7.
    foobar at 6.pl line 7.
    foobar at 6.pl line 7.
Further investigation shows that there is no duplication if

    $PerlIO::encoding::fallback = FB_DEFAULT | FB_PERLQQ; # or FB_(HT|X)MLCREF
Looking into ext\Encode\Encode.xs, I found this code (line 229):

    if (check && !(check & ENCODE_LEAVE_SRC)){
        sdone = SvCUR(src) - (slen+sdone);
        if (sdone) {
            sv_setpvn(src, (char*)s+slen, sdone);
        }
        SvCUR_set(src, sdone);
    }
If check is set to FB_DEFAULT (which is 0) and no other fallback flag is set, the condition above is false. The code therefore behaves as if ENCODE_LEAVE_SRC were set: the source buffer is never truncated, and the same data is then flushed several times.
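To make the flag arithmetic concrete, here is a minimal Perl sketch that evaluates only the condition quoted above for a few check values. It does not model the rest of the encoder. The helper name trims_source is hypothetical; the constants are Encode's own flag functions, called by their fully qualified names.

```perl
use strict;
use warnings;
use Encode ();

# Models only the condition quoted from Encode.xs:
#   if (check && !(check & ENCODE_LEAVE_SRC)) { ...trim the consumed input... }
# Hypothetical helper: returns 1 if the source buffer would be trimmed,
# 0 if the consumed characters would be left in place.
sub trims_source {
    my ($check) = @_;
    return ($check && !($check & Encode::LEAVE_SRC())) ? 1 : 0;
}

my @cases = (
    [ 'FB_DEFAULT',                 Encode::FB_DEFAULT() ],
    [ 'FB_CROAK',                   Encode::FB_CROAK() ],
    [ 'FB_XMLCREF',                 Encode::FB_XMLCREF() ],
    [ 'FB_DEFAULT|STOP_AT_PARTIAL', Encode::FB_DEFAULT() | Encode::STOP_AT_PARTIAL() ],
);

for my $case (@cases) {
    my ($name, $check) = @$case;
    printf "%-28s check=0x%04x trims_source=%d\n", $name, $check, trims_source($check);
}
```

In this model, a zero check value and any value carrying LEAVE_SRC both leave the source untouched. Any other non-zero value trims it.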
Flags: category=core severity=medium
Site configuration information for perl v5.8.4:
Configured by aa29 at Mon May 17 17:59:46 2004.
Summary of my perl5 (revision 5 version 8 subversion 4) configuration:
  Platform:
    osname=MSWin32, osvers=4.0, archname=MSWin32-x86-multi-thread
    uname=''
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    usethreads=undef use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cl', ccflags ='-nologo -Gf -W3 -MD -Zi -DNDEBUG -O1 -DWIN32 -D_CONSOLE -DNO_STRICT -DHAVE_DES_FCRYPT -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -DPERL_MSVCRT_READFIX',
    optimize='-MD -Zi -DNDEBUG -O1',
    cppflags='-DWIN32'
    ccversion='', gccversion='', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=10
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='__int64', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='link', ldflags '-nologo -nodefaultlib -debug -opt:ref,icf -libpath:"c:\perl\lib\CORE" -machine:x86'
    libpth=\lib
    libs= oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib wsock32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib msvcrt.lib
    perllibs= oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib wsock32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib msvcrt.lib
    libc=msvcrt.lib, so=dll, useshrplib=yes, libperl=perl58.lib
    gnulibc_version='undef'
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug -opt:ref,icf -libpath:"c:\perl\lib\CORE" -machine:x86'
Locally applied patches:
@INC for perl v5.8.4:
    C:/Perl/lib
    C:/Perl/site/lib
    .

Environment for perl v5.8.4:
    HOME (unset)
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=C:\cygwin\bin;C:\Tcl\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\Perl\bin;C:\Program Files\Support Tools;D:\src\lib;C:\Program Files\Microsoft Visual Studio\Common\Tools\WinNT;C:\Program Files\Microsoft Visual Studio\Common\MSDev98\Bin;C:\Program Files\Microsoft Visual Studio\Common\Tools;C:\Program Files\Microsoft Visual Studio\VC98\bin;C:\Arc;C:\Program Files\Utils;C:\Mysql\bin;C:\Program Files\Debugging Tools for Windows;C:\Tcl\bin;D:\Linda\XML\fop;C:\Program Files\GNU\WinCvs 1.2;D:\src\bin;C:\Program Files\Far
    PERL_BADLANG (unset)
    SHELL (unset)
aa29
On Wed May 19 08:34:56 2004, aa29 wrote:
$PerlIO::encoding::fallback = FB_DEFAULT leads to duplicated output.
This old bug still exists in bleadperl (as of a few days ago). It also exists in perl 5.14.2, 5.12.3, and 5.10.1. I tested on amd64-linux with this command:
$ ~/local/perlblead/bin/perl5.15.4 -we 'use Encode; use PerlIO::encoding; $PerlIO::encoding::fallback = Encode::FB_XMLCREF(); binmode STDOUT, "encoding(iso-8859-2)" or die; print "\x{e9}l\x{151}.u\x{ef} \x{2203}t\n";'
élő.u&#xef; &#x2203;t
élő.u&#xef; &#x2203;t
élő.u&#xef; &#x2203;t
$
The RT System itself - Status changed from 'new' to 'open'
Besides printing duplicate output, a filehandle with an encoding layer whose fallback has been set usually also raises the exception "Close with partial character" when you close it. This error message is documented in neither perldiag nor PerlIO::encoding, and in any case there should be no error.
I attach a test script that checks whether this bug is still present. It tests both for correct output and for the absence of an exception when the file is closed.
Ambrus
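The attached script is not reproduced here. As a rough stand-in, the following sketch writes the string from the perl5.15.4 command above through an :encoding(iso-8859-2) layer with a given fallback value. It then reports how many copies reached the buffer and whether print or close reported an error. The helper name check_fallback and the use of an in-memory filehandle are assumptions, not part of the ticket. The sketch also assumes the layer reads $PerlIO::encoding::fallback when it is pushed.

```perl
use strict;
use warnings;
use Encode ();
use PerlIO::encoding ();

# Hypothetical helper: push :encoding(iso-8859-2) onto an in-memory handle
# with the given fallback, write one line, and report
# (number of copies in the buffer, error text or '').
sub check_fallback {
    my ($fallback) = @_;
    my $buf = '';

    # Assumption: the layer reads the fallback when it is pushed by open.
    local $PerlIO::encoding::fallback = $fallback;
    open my $fh, '>:encoding(iso-8859-2)', \$buf or die "open failed: $!";

    my $ok = eval {
        print {$fh} "\x{e9}l\x{151}.u\x{ef} \x{2203}t\n";
        close $fh or die "close failed: $!\n";
        1;
    };
    my $error = $ok ? '' : ($@ || "unknown error\n");

    # "\xe9l" is the ISO-8859-2 encoding of the leading "\x{e9}l".
    my $copies = () = $buf =~ /\xe9l/g;
    return ($copies, $error);
}

for my $case ([ 'default', $PerlIO::encoding::fallback ],
              [ 'FB_XMLCREF', Encode::FB_XMLCREF() ]) {
    my ($copies, $error) = check_fallback($case->[1]);
    printf "%-10s copies=%d error=%s\n", $case->[0], $copies, $error eq '' ? 'none' : $error;
}
```

On an affected perl, the FB_XMLCREF row would be expected to show more than one copy or an error. The default row serves as a control.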
@Leont, does that mean you're looking at this issue, then?
I have some ideas, but it may require some work on the Encode side too; FB_DEFAULT having a double meaning is inconvenient.
OK. I’m gonna put your name on it so we know who is involved.
Thanks to @Leont this issue should be resolved in Perl 5.34.0 and later.
The solution allows you to set whatever value you like for $PerlIO::encoding::fallback, but every time you use :encoding(...), that value is sanitised (using the same logic as the workaround below) before the encoder/decoder actually uses it.
Workaround
For versions before Perl 5.34.0, always clear the LEAVE_SRC bit and set the STOP_AT_PARTIAL bit when setting $PerlIO::encoding::fallback, e.g.:
$PerlIO::encoding::fallback = (($fallback) & ~Encode::LEAVE_SRC()) | Encode::STOP_AT_PARTIAL();
(tested with Perl 5.30.2 on Windows 10, Perl 5.30.3 on Ubuntu 20.04 LTS for WSL2, and Perl 5.28.1 on Debian Buster)
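Here is a small sketch of the workaround packaged as a reusable helper. The name sanitise_fallback is hypothetical. The bit manipulation is the expression above, and FB_XMLCREF is just an example of a requested fallback. The sketch assumes the :encoding layer reads the variable when the layer is applied, so the variable is set before binmode.

```perl
use strict;
use warnings;
use Encode ();
use PerlIO::encoding ();

# Hypothetical helper implementing the workaround: keep every requested bit
# except LEAVE_SRC, and always add STOP_AT_PARTIAL.
sub sanitise_fallback {
    my ($fallback) = @_;
    return ($fallback & ~Encode::LEAVE_SRC()) | Encode::STOP_AT_PARTIAL();
}

# Set the variable before the layer is pushed (assumed to be when it is read).
$PerlIO::encoding::fallback = sanitise_fallback(Encode::FB_XMLCREF());
binmode STDOUT, ':encoding(iso-8859-2)' or die "binmode failed: $!";
print "\x{e9}l\x{151}.u\x{ef} \x{2203}t\n";
```

Because STOP_AT_PARTIAL is always added, even FB_DEFAULT turns into a non-zero value, which is what lets the cleared LEAVE_SRC bit take effect.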
Background
When I encountered this issue a couple of days ago, I was trying to set $PerlIO::encoding::fallback to FB_DEFAULT because I was unhappy with the qq-style output and the warnings I got when I used :encoding(...). Obviously, as per this issue, that resulted in duplicated output.
After some experimentation, I discovered that clearing the LEAVE_SRC bit resolved the duplicated output for every fallback except FB_DEFAULT. That's because LEAVE_SRC is only honored when $PerlIO::encoding::fallback is non-zero (see Encode#LEAVE_SRC), and FB_DEFAULT is 0. Testing showed that forcing an "unused" bit (e.g. 0x8000) to be set made the cleared LEAVE_SRC bit take effect, and everything appeared to work.
Unhappy at having to hack a solution with an "unused" bit that may someday get used, I dug into the code for PerlIO::encoding on MetaCPAN. There I found @Leont's code, which sanitises $PerlIO::encoding::fallback according to the logic in the workaround above (see PerlIO-encoding/encoding.xs#L175). I assumed that it wasn't working for some reason, but it turns out that this latest version of the code simply wasn't included in the versions of Perl I was testing on.
Looking at various versions of Perl going back through the years, it is clear that the default value for $PerlIO::encoding::fallback always has the LEAVE_SRC bit clear and the STOP_AT_PARTIAL bit set. Obviously @Leont came to the same conclusion. Thankfully, using this combination means I avoid a hack, and also likely avoid some errors I hadn't yet encountered.
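To check the claim about the default on a particular perl, here is a quick sketch. It loads PerlIO::encoding and inspects the two relevant bits of its default $PerlIO::encoding::fallback. Nothing is assumed beyond Encode's flag constants and the variable itself.

```perl
use strict;
use warnings;
use Encode ();
use PerlIO::encoding ();

# The default is whatever this perl's PerlIO::encoding assigns when it is loaded.
my $default = $PerlIO::encoding::fallback;

printf "default fallback     = 0x%04x\n", $default;
printf "LEAVE_SRC set?       %s\n", ($default & Encode::LEAVE_SRC())       ? 'yes' : 'no';
printf "STOP_AT_PARTIAL set? %s\n", ($default & Encode::STOP_AT_PARTIAL()) ? 'yes' : 'no';
```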
@Leont, do you concur?
Yeah, this is solved.