processx
Document common options for output encoding of Windows tools
Create some output with non-ASCII characters, e.g. by running this on a German or French Windows:
res <- processx::run("systeminfo", c("/FO", "csv"), encoding = "windows-1252")$stdout
substr(res, 2200, 2300)
#> [1] "5d4\",\"Es wurde ein Hypervisor erkannt. Features, die f\u0081r Hyper-V erforderlich sind, werden nicht ange"
So the byte \x81 is apparently not converted (it comes through as \u0081), even though Windows-1252 seems to be the default encoding:
❯ [System.Text.Encoding]::Default
IsSingleByte : True
BodyName : iso-8859-1
EncodingName : Westeuropäisch (Windows)
HeaderName : Windows-1252
WebName : Windows-1252
WindowsCodePage : 1252
IsBrowserDisplay : True
IsBrowserSave : True
IsMailNewsDisplay : True
IsMailNewsSave : True
EncoderFallback : System.Text.InternalEncoderBestFitFallback
DecoderFallback : System.Text.InternalDecoderBestFitFallback
IsReadOnly : True
CodePage : 1252
This might be some systeminfo or Windows thing, because according to https://en.wikipedia.org/wiki/Windows-1252 \x81 should be unused:
"According to the information on Microsoft's and the Unicode Consortium's websites, positions 81, 8D, 8F, 90, and 9D are unused; however, the Windows API MultiByteToWideChar maps these to the corresponding C1 control codes. The "best fit" mapping documents this behavior, too."
However, 850 seems to work well:
res <- processx::run("systeminfo", c("/FO", "csv"), encoding = "850")$stdout
substr(res, 2200, 2300)
#> [1] "5d4\",\"Es wurde ein Hypervisor erkannt. Features, die für Hyper-V erforderlich sind, werden nicht ange"
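The same effect can be checked without running systeminfo by decoding the offending byte directly. This is only a minimal sketch: `decode_bytes` is a hypothetical helper (not part of processx), and it assumes that the local iconv build knows the names "CP850" and "CP437", which can be verified with `iconvlist()`.

```r
# Hypothetical helper: decode raw bytes from a given code page into UTF-8.
decode_bytes <- function(bytes, from) {
  iconv(rawToChar(as.raw(bytes)), from = from, to = "UTF-8")
}

# "f", 0x81, "r": the three bytes behind the word "für" in the output above
bytes <- c(0x66, 0x81, 0x72)

# Decode the same bytes with two OEM code pages
decode_bytes(bytes, "CP850")
decode_bytes(bytes, "CP437")
```

Both OEM code pages put ü at position 0x81, so both calls should give "für". Windows-1252 leaves that position unassigned, which is consistent with the \u0081 seen in the first attempt.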
Seems like processx will use the default code page, unless the console is inherited:
> processx::run("chcp")$stdout
[1] "Aktive Codepage: 437.\r\n"
> processx::run("chcp", stdout = "")$stdout
Aktive Codepage: 65001.
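To compare the two cases programmatically, the code page number can be extracted from the localized chcp message. A minimal sketch: `parse_chcp` is a hypothetical helper, and it assumes that the first run of digits in the message is the code page number. That holds for the German and English messages, but is not guaranteed for every locale.

```r
# Hypothetical helper: extract the code page number from chcp output.
# Returns NA if the message contains no digits.
parse_chcp <- function(out) {
  m <- regmatches(out, regexpr("[0-9]+", out))
  if (length(m) == 0) NA_character_ else m
}

parse_chcp("Aktive Codepage: 437.\r\n")
parse_chcp("Active code page: 65001")

# On Windows, with processx installed, the piped case would be:
# parse_chcp(processx::run("chcp")$stdout)
```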
Per https://serverfault.com/questions/80635/how-can-i-manually-determine-the-codepage-and-locale-of-the-current-os/836221#836221 we can get the code pages from the registry, both for command-line apps (OEMCP) and for old GUI apps (ACP). OEMCP could be the default encoding for processx, although that would mean we need to set encoding = "" explicitly when we call a (new) R process from R.
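A rough sketch of the registry lookup from R, not a proposal for the actual implementation: `system_codepages` is a hypothetical helper, and it assumes the values live under HKLM\SYSTEM\CurrentControlSet\Control\Nls\CodePage as string values named ACP and OEMCP. `utils::readRegistry()` only exists on Windows, so the helper returns NA elsewhere.

```r
# Hypothetical helper: read the ANSI (ACP) and OEM (OEMCP) code pages
# from the Windows registry. Returns NA values on other platforms,
# where utils::readRegistry() is not available.
system_codepages <- function() {
  if (.Platform$OS.type != "windows") {
    return(c(ACP = NA_character_, OEMCP = NA_character_))
  }
  nls <- utils::readRegistry(
    "SYSTEM\\CurrentControlSet\\Control\\Nls\\CodePage",
    hive = "HLM"
  )
  c(ACP = nls$ACP, OEMCP = nls$OEMCP)
}

cp <- system_codepages()
cp

# On Windows, the OEM code page could then be passed on explicitly:
# processx::run("systeminfo", c("/FO", "csv"), encoding = cp[["OEMCP"]])
```

Whether the bare number is accepted as an encoding name depends on the iconv build in use; "850" worked in the example above.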