$PerlIO::encoding::fallback = FB_DEFAULT leads to duplicated output

Open p5pRT opened this issue 20 years ago • 9 comments

Migrated from rt.perl.org#29720 (status was 'open')

Searchable as RT29720$

p5pRT avatar May 19 '04 15:05 p5pRT

From [email protected]

To: perlbug@perl.org
Subject: $PerlIO::encoding::fallback = FB_DEFAULT leads to duplicated output
Reply-To: aa29@mail.ru
Message-Id: <5.8.4_2160_1084980780@INFORMED>

This is a bug report for perl from aa29@​mail.ru, generated with the help of perlbug 1.35 running under perl v5.8.4.

$PerlIO​::encoding​::fallback = FB_DEFAULT leads to duplicated output.

It is possible to change check-mode via $PerlIO​::encoding​::fallback​:

use Encode qw(:fallback_all);
use encoding 'utf8';

$PerlIO::encoding::fallback = FB_DEFAULT;

binmode(STDERR, ":encoding(cp866)");
warn "foobar";

This code gives four messages instead of one:

foobar at 6.pl line 7.
foobar at 6.pl line 7.
foobar at 6.pl line 7.
foobar at 6.pl line 7.

When STDERR is redirected to a file, it gives three messages:

foobar at 6.pl line 7.
foobar at 6.pl line 7.
foobar at 6.pl line 7.

Further investigation shows that there is no duplication if another fallback flag is combined with FB_DEFAULT:

$PerlIO::encoding::fallback = FB_DEFAULT | FB_PERLQQ; # or FB_(HT|X)MLCREF

Looking into ext\Encode\Encode.xs, I found this code (line 229):

  if (check && !(check & ENCODE_LEAVE_SRC)){
      sdone = SvCUR(src) - (slen+sdone);
      if (sdone) {
          sv_setpvn(src, (char*)s+slen, sdone);
      }
      SvCUR_set(src, sdone);
  }

If check is set to FB_DEFAULT (which is 0) and no other fallback is defined, the condition is false, so the code behaves as if ENCODE_LEAVE_SRC were set: the buffer is not truncated, and it is then flushed several times.
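The effect of that condition can be sketched in Perl. This is only an illustration of the bit test quoted above, not the actual Encode.xs logic; the helper name source_is_trimmed is hypothetical, and the sketch assumes the Encode constants are callable as fully qualified subs.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper that mirrors the C condition
#   check && !(check & ENCODE_LEAVE_SRC)
# It returns 1 when the encoder would trim the consumed bytes from the source.
sub source_is_trimmed {
    my ($check) = @_;
    return ($check && !($check & Encode::LEAVE_SRC())) ? 1 : 0;
}

printf "FB_DEFAULT           -> %d\n",
    source_is_trimmed(Encode::FB_DEFAULT());
printf "FB_CROAK             -> %d\n",
    source_is_trimmed(Encode::FB_CROAK());
printf "FB_CROAK | LEAVE_SRC -> %d\n",
    source_is_trimmed(Encode::FB_CROAK() | Encode::LEAVE_SRC());
```

Because FB_DEFAULT is 0, the first test in the condition already fails, so the FB_DEFAULT case takes the same path as a value that has LEAVE_SRC set.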


Flags:
    category=core
    severity=medium


Site configuration information for perl v5.8.4:

Configured by aa29 at Mon May 17 17:59:46 2004.

Summary of my perl5 (revision 5 version 8 subversion 4) configuration:
  Platform:
    osname=MSWin32, osvers=4.0, archname=MSWin32-x86-multi-thread
    uname=''
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    usethreads=undef use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cl', ccflags ='-nologo -Gf -W3 -MD -Zi -DNDEBUG -O1 -DWIN32 -D_CONSOLE -DNO_STRICT -DHAVE_DES_FCRYPT -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -DPERL_MSVCRT_READFIX',
    optimize='-MD -Zi -DNDEBUG -O1',
    cppflags='-DWIN32'
    ccversion='', gccversion='', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=10
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='__int64', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='link', ldflags ='-nologo -nodefaultlib -debug -opt:ref,icf -libpath:"c:\perl\lib\CORE" -machine:x86'
    libpth=\lib
    libs= oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib wsock32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib msvcrt.lib
    perllibs= oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib wsock32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib msvcrt.lib
    libc=msvcrt.lib, so=dll, useshrplib=yes, libperl=perl58.lib
    gnulibc_version='undef'
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug -opt:ref,icf -libpath:"c:\perl\lib\CORE" -machine:x86'

Locally applied patches:


@INC for perl v5.8.4:
    C:/Perl/lib
    C:/Perl/site/lib
    .


Environment for perl v5.8.4:
    HOME (unset)
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=C:\cygwin\bin;C:\Tcl\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\Perl\bin;C:\Program Files\Support Tools;D:\src\lib;C:\Program Files\Microsoft Visual Studio\Common\Tools\WinNT;C:\Program Files\Microsoft Visual Studio\Common\MSDev98\Bin;C:\Program Files\Microsoft Visual Studio\Common\Tools;C:\Program Files\Microsoft Visual Studio\VC98\bin;C:\Arc;C:\Program Files\Utils;C:\Mysql\bin;C:\Program Files\Debugging Tools for Windows;C:\Tcl\bin;D:\Linda\XML\fop;C:\Program Files\GNU\WinCvs 1.2;D:\src\bin;C:\Program Files\Far
    PERL_BADLANG (unset)
    SHELL (unset)

aa29

p5pRT avatar May 19 '04 15:05 p5pRT

From [email protected]

On Wed May 19 08​:34​:56 2004, aa29 wrote​:

$PerlIO​::encoding​::fallback = FB_DEFAULT leads to duplicated output.

This old bug still exists in bleadperl (as of a few days ago). It is also present in perl 5.14.2, 5.12.3, and 5.10.1. I tested on amd64-linux with this command:

$ ~/local/perlblead/bin/perl5.15.4 -we 'use Encode; use
PerlIO::encoding; $PerlIO::encoding::fallback = Encode::FB_XMLCREF();
binmode STDOUT, "encoding(iso-8859-2)" or die; print
"\x{e9}l\x{151}.u\x{ef} \x{2203}t\n";'
élő.u&#xef; &#x2203;t
élő.u&#xef; &#x2203;t
élő.u&#xef; &#x2203;t
$

p5pRT avatar Nov 18 '11 15:11 p5pRT

The RT System itself - Status changed from 'new' to 'open'

p5pRT avatar Nov 18 '11 15:11 p5pRT

From [email protected]

Besides printing duplicate output, a filehandle with an encoding layer with fallback set also usually raises an exception "Close with partial character" when you try to close it. This error message is not documented in either perldiag or PerlIO​::encoding, and, in any case, there shouldn't be an error.

I attach a test script that tests whether this bug is still present​: it tests for both correct output and no exception when you close the file.

Ambrus

p5pRT avatar Dec 02 '11 17:12 p5pRT

@Leont does that mean you're looking at this issue then?

toddr avatar Mar 05 '20 00:03 toddr

I have some ideas, but it may require some work on the Encode side too; FB_DEFAULT having a double meaning is inconvenient.

Leont avatar Mar 07 '20 22:03 Leont

OK. I’m gonna put your name on it so we know who is involved

toddr avatar Mar 08 '20 03:03 toddr

Thanks to @Leont this issue should be resolved in Perl 5.34.0 and later.

The solution allows you to set whatever value you like for $PerlIO::encoding::fallback, but every time you use :encoding(...), that value is sanitised (using the same logic as the workaround below) before it is actually used by the encoder/decoder.

Workaround

For versions before Perl 5.34.0, always clear the LEAVE_SRC bit and set the STOP_AT_PARTIAL bit when setting $PerlIO::encoding::fallback, e.g.:

$PerlIO::encoding::fallback = (($fallback) & ~Encode::LEAVE_SRC()) | Encode::STOP_AT_PARTIAL();

(tested with Perl 5.30.2 on Windows 10, Perl 5.30.3 on Ubuntu 20.04 LTS for WSL2, and Perl 5.28.1 on Debian Buster)
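As a minimal sketch, the workaround can be wrapped in a small helper so that every assignment goes through the same sanitising step. The helper name sanitised_fallback is hypothetical, and the choice of FB_DEFAULT and cp866 simply follows the original report; this is an illustration, not the code that ships in PerlIO::encoding.

```perl
use strict;
use warnings;
use Encode ();
use PerlIO::encoding ();

# Hypothetical helper: clear the LEAVE_SRC bit and set the STOP_AT_PARTIAL
# bit, as in the workaround expression above.
sub sanitised_fallback {
    my ($fallback) = @_;
    return ($fallback & ~Encode::LEAVE_SRC()) | Encode::STOP_AT_PARTIAL();
}

# Set the sanitised value before pushing the :encoding layer.
$PerlIO::encoding::fallback = sanitised_fallback(Encode::FB_DEFAULT());

binmode(STDOUT, ':encoding(cp866)') or die "binmode failed: $!";
print "foobar\n";
```

Since the helper only clears one bit and sets another, it should be harmless to keep on Perl 5.34.0 and later, where the same sanitising is reportedly applied automatically.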

Background

When I encountered this issue a couple of days ago, I was trying to set $PerlIO::encoding::fallback to FB_DEFAULT because I was unhappy with the qq-style output and the warnings I got when I used :encoding(...). Obviously, as per this issue, that resulted in duplicated output.

After some experimentation I discovered that clearing the LEAVE_SRC bit resolved the duplicated output for every value except FB_DEFAULT. That is because LEAVE_SRC is only honored when $PerlIO::encoding::fallback is non-zero (see Encode#LEAVE_SRC), and FB_DEFAULT is 0. Testing showed that by forcing an "unused" bit (e.g. 0x8000) to be set, the cleared LEAVE_SRC bit would be honored and everything appeared to work.

Unhappy at having to hack a solution with an "unused" bit that may someday get used, I dug into the code for PerlIO::encoding on MetaCPAN and found @Leont's code, which sanitised $PerlIO::encoding::fallback according to the logic in the workaround above (see PerlIO-encoding/encoding.xs#L175). I assumed that it wasn't working for some reason, but it turned out that this was simply the latest version of the code, which wasn't included in the versions of Perl that I was testing on.

Looking at various versions of Perl going back through the years, it is clear that the default value for $PerlIO::encoding::fallback has always had the LEAVE_SRC bit clear and the STOP_AT_PARTIAL bit set. Obviously @Leont came to the same conclusion. Thankfully, using this combination means I avoid using a hack, and also likely avoid some errors I hadn't yet encountered.

osir3z avatar Jul 28 '22 17:07 osir3z

Thanks to @Leont this issue should be resolved in Perl 5.34.0 and later.

@Leont, do you concur?

jkeenan avatar Sep 21 '22 21:09 jkeenan

@Leont, do you concur?

Yeah, this is solved.

Leont avatar Sep 21 '22 22:09 Leont