IPC-Run icon indicating copy to clipboard operation
IPC-Run copied to clipboard

binmode is a necessity for win32 usability and must be default

Open wchristian opened this issue 7 years ago • 15 comments
trafficstars

Just leaving this as a note, since i recently ran into it.

Was trying to use a library that calls an external program to generate PDF and another author had decided to use IPC::Run inside the library. Since PDFs are binary data by default, the result was of course mojibake, and due to the entirety of the situation there was no way to resolve this other than to replace IPC::Run entirely.

I cannot offhand think of a situation where enabling binmode by default would cause any serious breakage that couldn't be managed (e.g. tests that don't expect this behavior).

wchristian avatar Mar 27 '18 17:03 wchristian

I tend to agree. @toddr ?

mohawk2 avatar Mar 27 '18 18:03 mohawk2

Actually we have another ticket asking us to have an option to default to utf-8

toddr avatar Mar 27 '18 21:03 toddr

It sounds like we need a global for this or if the run() interface supports it, maybe it could be an option passed in.

toddr avatar Mar 27 '18 21:03 toddr

Defaulting to UTF8 might be acceptable too, as it disables newline mangling as well. I'd still highly recommend you actually try and see what happens if you make utf8 default in various environments when dealing with 3rd party binary producers. I expect some breakage, for example when dealing with the pure ascii windows cmd shell.

Keep in mind that making binmode doesn't add any data transformation, it disables a dangerous one. Making UTF8 default would indeed add a transformation.

wchristian avatar Mar 27 '18 23:03 wchristian

Btw, which ticket are you talking about?

wchristian avatar Mar 27 '18 23:03 wchristian

@wchristian: Pumping UTF-8 data mangles it [rt.cpan.org #58853] #87

toddr avatar Mar 29 '18 17:03 toddr

#87 has an example script. Would the change you're proposing here for binmode also fix it? If someone can test that then I'd be all for just setting the default. Ideally though we should probably still try to provide a way to somehow alter the mode at run time?

toddr avatar Mar 29 '18 17:03 toddr

Reading the other one now.

As for the second question:

Yes, binary mode should be the default, but allowing customization is also key, due to it being hard to configure the called code in most cases.

wchristian avatar Mar 30 '18 14:03 wchristian

Not sure if the binmode thing would fix anything for UTF, but i think it might help improve the situation. I've also rewritten the test script there a little to show the newline issue:

#!/usr/bin/perl
use strict;
use warnings;
use B 'perlstring';

my $report_length  = 0;                                      # set this to 1 to switch string dumps to string length
my $mode           = ":raw";
my @lines          = ( "ab", "\n", "\r", "\r\n", "\n\r" );
my $display_length = 8;                                      # set this to something reasonable

sub fm { sprintf "%-${display_length}s", $report_length ? length $_[0] : perlstring $_[0] }

if ( defined $ENV{IPC_SUB_PROCESS} ) {
    binmode STDIN,  $mode;
    binmode STDERR, $mode;
    binmode STDOUT, $mode;
    my $in = do { local $/; <STDIN> };
    my $out = $lines[ $ENV{IPC_SUB_INDEX} ];
    print STDERR fm( $in ) . " | " . fm( $out ) . " out became: ";
    print $out;
    exit;
}

$ENV{IPC_SUB_PROCESS} = 1;
use IPC::Run ();
for my $i ( 0 .. $#lines ) {
    print fm( $lines[$i] ) . " in became: ";
    $ENV{IPC_SUB_INDEX} = $i;
    IPC::Run::run( [ "perl", __FILE__ ], "<", \$lines[$i], ">", \my $out );
    print fm( $out ) . "\n";
}

__END__

D:\>perl ipc_run.pl
"ab"     in became: "ab"     | "ab"     out became: "ab"
"\n"     in became: "\r\n"   | "\n"     out became: "\n"
"\r"     in became: "\r"     | "\r"     out became: "\r"
"\r\n"   in became: "\r\n"   | "\r\n"   out became: "\n"
"\n\r"   in became: "\r\n\r" | "\n\r"   out became: "\n\r"

wchristian avatar Mar 30 '18 16:03 wchristian

Debian report https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=450663

toddr avatar Apr 02 '18 15:04 toddr

Oh god, yeah, implicit charset munging will get different results depending on locale. Even more power to the argument of making binmode default.

wchristian avatar Apr 03 '18 08:04 wchristian

Ok. sounds like we need a patch. I'm overloaded this week so won't be able to look at it myself.

toddr avatar Apr 03 '18 14:04 toddr

@wchristian Do you feel like constructing a little bit of .t that will capture the charset munging? I can sellotape that plus your script above into a test and then have a go at the code to fix it.

mohawk2 avatar Apr 03 '18 17:04 mohawk2

Sent tests in #120.

wchristian avatar Apr 05 '18 16:04 wchristian

Tests are in and presumably failing on windows. we need to get the windows smoker working.

toddr avatar Apr 05 '18 16:04 toddr