OS.Cmd \n -> \r\n conversions on Windows
Under Windows, using the following programs the \n in data is converted to \r\n when the two programs are compiled and run as main.exe test.exe:

Under something more Unix-y there is no issue, I'm guessing because there is no difference there between "text" and "binary" IO.
main.ml
open Rresult
open Bos
let () =
let exe = Fpath.of_string Sys.argv.(1) |> R.error_msg_to_invalid_arg in
let cmd = Cmd.(v (p exe)) in
let data = "Hello world\n\x00\x00booooo" in
let cmd_stdin = OS.Cmd.in_string data in
let run = OS.Cmd.run_io ~err:OS.Cmd.err_stderr cmd cmd_stdin in
match OS.Cmd.out_string run with
| Ok (output, (_, `Exited 0)) ->
if output = data then
print_endline "Good"
else (
print_endline "Not good";
Printf.printf "%S\n" data;
Printf.printf "%S\n" output
)
| _ -> print_endline "Bad"
test.ml
open Rresult
open Bos
let () =
OS.File.read OS.File.dash
|> R.error_msg_to_invalid_arg
|> print_string
Isn't the problem here that you are using Pervasives.print_string in test.ml ? I guess this is what does the translation.
I just checked and the OS.Cmd combinators only use Unix.openfile (which AFAIK is oblivious of the distinction between text and binary) and OS.File only uses open_in_bin.
In fact you are using OS.File.read with OS.File.dash which in this case will not use open_in_bin but simply use Pervasives.stdin. So I guess this is the actual source of the problem.
I could use Pervasives.set_binary_mode_in on Pervasives.stdin before doing the read but there's no way to query the status to set it back to its previous state. I guess these are the only things that can be done:
- Simply document this fact and let users call
set_binary_mode_{in,out}themselves. - Call
set_binary_mode_{in,out}onstdinandstdoutinOS.File's IO functions and document the side effect performed on the channel.
What would Windows users expect ?
I kind of think that these automatic translations are more harmful than beneficial so I'm tempted by 2. With 1. it basically means that non Windows user have to think about this and they will not.
/cc @fdopen @dra27
The source of this problem is print_string - all the calls in main.ml use binary Unix.file_descr so test.ml should only receive \n on stdin, therefore the fact that stdin is in text mode won't matter. However, since stdout is in text mode in test.ml, print_string will translate \n → \r\n which main.ml reads back without translation. However, if main.ml actually had "Hello world\r\n\x00\x00booooo" then this example would work, but for scary and unstable reasons! (and weird examples like "Hello world\r\n\n\x00\x00booooo" would not work)
I think the expectation would be that stdhandles are treated in the same way as files, certainly. I don't have a reasoned opinion on which way it should be, though, I'm afraid! There is already a disparity between how Unix and Pervasives do this, for example:
Unix.(write stdout "\n" 0 1) |> ignore;
Pervasives.(output stdout "\n" 0 1) |> ignore
writes "\n\r\n" to stdout on Windows.
I expect that the rationale for this may be that the standard library behaves like the underlying C runtime (where \n really should mean "do whatever is necessary to make a line-end" and in Microsoft C by default does translations) vs the Unix library which aims to behave the same way on all platforms as far as is reasonably possible.
You can detect whether a channel is in binary mode via an exposed runtime function, but you have to use a slightly unscrupulous, if entirely safe, C stub to get at it:
#include <caml/memory.h>
CAMLextern int caml_channel_binary_mode (void *);
CAMLprim value caml_ml_channel_binary_mode(value vchannel)
{
CAMLparam1(vchannel);
CAMLreturn(Val_bool(caml_channel_binary_mode(*((void **) (Data_custom_val(vchannel))))));
}
which can then be used with
external get_binary_mode_in : in_channel -> bool = "caml_ml_binary_mode"
external get_binary_mode_out : out_channel -> bool = "caml_ml_binary_mode"
It's a surprising omission from Pervasives...