
Document common options for output encoding of Windows tools

Open gaborcsardi opened this issue 2 years ago • 2 comments

Create some output with non-ASCII characters, e.g. this on German or French Windows:

res <- processx::run("systeminfo", c("/FO", "csv"), encoding = "windows-1252")$stdout
substr(res, 2200, 2300)
#> [1] "5d4\",\"Es wurde ein Hypervisor erkannt. Features, die f\u0081r Hyper-V erforderlich sind, werden nicht ange"

So \x81 is not converted to ü but passed through as \u0081, even though windows-1252 seems to be the default encoding:

❯ [System.Text.Encoding]::Default


IsSingleByte      : True
BodyName          : iso-8859-1
EncodingName      : Westeuropäisch (Windows)
HeaderName        : Windows-1252
WebName           : Windows-1252
WindowsCodePage   : 1252
IsBrowserDisplay  : True
IsBrowserSave     : True
IsMailNewsDisplay : True
IsMailNewsSave    : True
EncoderFallback   : System.Text.InternalEncoderBestFitFallback
DecoderFallback   : System.Text.InternalDecoderBestFitFallback
IsReadOnly        : True
CodePage          : 1252

This might be some systeminfo or Windows thing, because according to https://en.wikipedia.org/wiki/Windows-1252 \x81 should be unused:

According to the information on Microsoft's and the Unicode Consortium's websites, positions 81, 8D, 8F, 90, and 9D are unused; however, the Windows API MultiByteToWideChar maps these to the corresponding C1 control codes. The "best fit" mapping documents this behavior, too.[15]

However, 850 seems to work well:

res <- processx::run("systeminfo", c("/FO", "csv"), encoding = "850")$stdout
substr(res, 2200, 2300)
#> [1] "5d4\",\"Es wurde ein Hypervisor erkannt. Features, die für Hyper-V erforderlich sind, werden nicht ange"
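
The difference can be reproduced without systeminfo. A minimal sketch, assuming the tool wrote the single byte 0x81 for "ü" (inferred from the \u0081 in the first output, not captured from the tool itself) and that the local iconv recognizes the name "CP850":

```r
# Bytes for "f", 0x81, "r"; 0x81 is assumed to be what systeminfo emitted for "ü".
bytes <- as.raw(c(0x66, 0x81, 0x72))

# Code page 850 is an OEM code page that maps 0x81 to U+00FC ("ü").
decoded <- iconv(rawToChar(bytes), from = "CP850", to = "UTF-8")
print(decoded)

# Code point of the middle character after decoding.
middle <- utf8ToInt(decoded)[2]
print(middle)
```

Decoding the same bytes with from = "windows-1252" is implementation dependent, which fits the quoted Wikipedia passage: 0x81 is officially unused there, so it cannot yield "ü".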

gaborcsardi, Mar 01 '23 09:03

Seems like processx will use the default code page, unless the console is inherited:

> processx::run("chcp")$stdout
[1] "Aktive Codepage: 437.\r\n"
> processx::run("chcp", stdout = "")$stdout
Aktive Codepage: 65001.
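
The code page number could be pulled out of the chcp output and reused as the encoding. A minimal sketch: `parse_chcp()` is a hypothetical helper, not part of processx, and passing the bare number as `encoding` simply mirrors the "850" call above.

```r
# Hypothetical helper: extract the first run of digits from chcp's output,
# which is localized ("Aktive Codepage: 437." on German Windows).
parse_chcp <- function(out) {
  m <- regmatches(out, regexpr("[0-9]+", out))
  if (length(m) == 0) NA_character_ else m
}

cp <- parse_chcp("Aktive Codepage: 437.\r\n")
print(cp)

# On Windows, with processx installed, the detected code page could be reused.
if (.Platform$OS.type == "windows" && requireNamespace("processx", quietly = TRUE)) {
  cp <- parse_chcp(processx::run("chcp")$stdout)
  res <- processx::run("systeminfo", c("/FO", "csv"), encoding = cp)$stdout
}
```

Matching only the digits avoids depending on the localized label text.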

gaborcsardi, Mar 01 '23 09:03

Per https://serverfault.com/questions/80635/how-can-i-manually-determine-the-codepage-and-locale-of-the-current-os/836221#836221 we can get the code page(s) from the registry, both for command line apps and for old GUI apps. OEMCP could be the default for processx, although that would mean explicitly setting encoding = "" whenever we call (new) R from R.
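
A rough sketch of the registry lookup from R, assuming the values live under HKLM\SYSTEM\CurrentControlSet\Control\Nls\CodePage. `registry_code_pages()` is a hypothetical helper name, and `utils::readRegistry()` is only available on Windows builds of R, so the function returns NA elsewhere.

```r
# Hypothetical helper: read the ANSI (ACP) and OEM (OEMCP) code pages
# from the registry. Returns NA values on non-Windows platforms.
registry_code_pages <- function() {
  if (.Platform$OS.type != "windows") {
    return(c(ACP = NA_character_, OEMCP = NA_character_))
  }
  key <- "SYSTEM\\CurrentControlSet\\Control\\Nls\\CodePage"
  vals <- utils::readRegistry(key, hive = "HLM")
  c(ACP = vals$ACP, OEMCP = vals$OEMCP)
}

pages <- registry_code_pages()
print(pages)
```

OEMCP is the value that would apply to command line tools such as systeminfo; ACP is the one for old GUI apps.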

gaborcsardi, Mar 01 '23 10:03