`user_input` on Windows can't be set to `encoding(octet)`

Open eignnx opened this issue 5 months ago • 0 comments

There seems to be no way for SWI-Prolog to read exactly one byte from user_input on Windows. This is because user_input's encoding is locked to wchar_t, which on Windows is UTF-16/UCS-2.

```
1 ?- get_byte(X).
ERROR: No permission to read bytes from TEXT stream `current_input'
ERROR: In:
ERROR:   [12] get_byte(_1032)
ERROR:   [11] toplevel_call(user:user: ...) at c:/program files/swipl/boot/toplevel.pl:1315
```

The official advice is to switch the stream to encoding(octet) and then read codes. But this doesn't work either[^1]:

```
2 ?- set_stream(user_input, encoding(octet)).
ERROR: No permission to encoding stream `user_input'
ERROR: In:
ERROR:   [12] set_stream(user_input,encoding(octet))
ERROR:   [11] toplevel_call(user:user: ...) at c:/program files/swipl/boot/toplevel.pl:1315
```

The same happens with encoding(utf8): user_input cannot change its encoding property on Windows.
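Portable code can at least detect the refusal instead of aborting. The following is a minimal sketch, assuming the error is raised as a standard `permission_error/3` term, as the messages above suggest. `try_encoding/3` is a hypothetical helper name, not part of SWI-Prolog.

```prolog
% try_encoding(+Stream, +Enc, -Result): hypothetical helper.
% Result is `ok` if the encoding could be changed and `denied` if
% set_stream/2 raised a permission error. Other errors propagate.
try_encoding(Stream, Enc, Result) :-
    catch(( set_stream(Stream, encoding(Enc)),
            Result = ok
          ),
          error(permission_error(_Action, _Type, _Culprit), _Context),
          Result = denied).
```

A caller could run `try_encoding(user_input, octet, R)` at startup and fall back to another transport when `R == denied`.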

Consequences

This means there is no way to reliably read a UTF-8 encoded file into an SWI-Prolog process via user_input. Any multi-byte UTF-8 sequence sent will be misinterpreted as UTF-16/UCS-2. For example:

```
$ cat broken_heart.txt
💔

$ od -t x1 -t u1 broken_heart.txt # A UTF-8 encoded file
0000000  f0  9f  92  94  0d  0a
        240 159 146 148  13  10
0000006
```

```prolog
% read_codes.pl
:- initialization(main, main).

main :-
    read_stream_to_codes(user_input, Codes),
    format('~k "~s"~n', [Codes, Codes]).
```

```
$ cat broken_heart.txt | swipl --quiet read_codes.pl
[240,376,8217,8221,10] "💔
"
```

(Shell examples were run from the Git Bash shell for Windows.)
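One observation about the numbers, offered as a hint rather than a diagnosis: the four codes read are what the four input bytes map to in the Windows-1252 code page, which suggests each byte is being decoded on its own through a single-byte code page. The sketch below hard-codes only the four mappings involved. `cp1252_code/2` and `bytes_cp1252/2` are hypothetical names used for illustration.

```prolog
:- use_module(library(apply)).   % maplist/3

% cp1252_code(?Byte, ?CodePoint): the Windows-1252 mappings for the
% four bytes of the UTF-8 encoding of U+1F494 (illustration only).
cp1252_code(0xF0, 0x00F0).   % LATIN SMALL LETTER ETH
cp1252_code(0x9F, 0x0178).   % LATIN CAPITAL LETTER Y WITH DIAERESIS
cp1252_code(0x92, 0x2019).   % RIGHT SINGLE QUOTATION MARK
cp1252_code(0x94, 0x201D).   % RIGHT DOUBLE QUOTATION MARK

% bytes_cp1252(+Bytes, -Codes): decode each byte independently.
bytes_cp1252(Bytes, Codes) :-
    maplist(cp1252_code, Bytes, Codes).
```

The query `?- bytes_cp1252([0xF0,0x9F,0x92,0x94], Codes).` yields `Codes = [240,376,8217,8221]`, matching the first four codes in the output above.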

Note

In the above example, the codes list [240,376,8217,8221,10] is also being printed incorrectly. For instance, 8217 is U+2019 (Right Single Quotation Mark), not Æ. Neither of these characters appears in the input file, so the text is being garbled. This is probably a combination of my terminal settings (Windows Terminal 1.22.11141.0) and the fact that user_output is also locked to wchar_t. I haven't looked into this yet.

Is this an Issue?

So why not instead read in from a socket or from a file handle?

In some cases you do actually need to read from stdin. For example, one could not correctly implement the Bash command read in SWI-Prolog on Windows if the input is a non-UTF-16 source.

The place I've run into this is in trying to get jamesnvc/lsp_server working on Windows. At least for Neovim integration, an LSP server is normally expected to read UTF-8-encoded JSON-RPC requests from stdin. The project currently has to ask Windows users to use a socket-based client instead (see this issue), which brings its own problems.

Fixes

The easiest way to solve the UTF-8→UTF-16 encoding issues on stdin would be to allow encoding(octet) for stdin and let users do the conversion themselves via utf8_codes//1.
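Assuming encoding(octet) were permitted on user_input, the user-side conversion could look roughly like the sketch below. `utf8_codes//1` comes from library(utf8). `utf8_bytes_codes/2` and `read_utf8_stdin/1` are hypothetical names, and the `set_stream/2` call is exactly the one that currently raises the permission error on Windows.

```prolog
:- use_module(library(utf8)).       % utf8_codes//1
:- use_module(library(readutil)).   % read_stream_to_codes/2

% utf8_bytes_codes(+Bytes, -Codes): decode a list of UTF-8 byte
% values into a list of Unicode code points.
utf8_bytes_codes(Bytes, Codes) :-
    phrase(utf8_codes(Codes), Bytes).

% read_utf8_stdin(-Codes): hypothetical entry point. Switches stdin to
% raw bytes, reads everything, then decodes the bytes as UTF-8.
read_utf8_stdin(Codes) :-
    set_stream(user_input, encoding(octet)),   % fails on Windows today
    read_stream_to_codes(user_input, Bytes),
    utf8_bytes_codes(Bytes, Codes).
```

The decoding step can be tried on its own with the bytes from the od output above: `?- utf8_bytes_codes([0xF0,0x9F,0x92,0x94,0x0D,0x0A], Codes).` gives `Codes = [128148,13,10]`, where 128148 is U+1F494.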

Eventually of course it would be great to support encoding(utf8) on stdin as well, and also to solve this same issue for stdout.

[^1]: Prior discussion: https://swi-prolog.discourse.group/t/no-permission-to-encoding-a-user-stream/5368/11

eignnx, Jun 04 '25 19:06