Built-in `help` function writes Unicode BOM () to host

Open brantb opened this issue 8 years ago • 16 comments

System Details

  • Operating system name and version: Windows 10 Version 1709
  • VS Code version: 1.18.0
  • PowerShell extension version: 1.5.1
  • Output from $PSVersionTable:
Name                           Value
----                           -----
PSVersion                      5.1.16299.64
PSEdition                      Desktop
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
BuildVersion                   10.0.16299.64
CLRVersion                     4.0.30319.42000
WSManStackVersion              3.0
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
  • VSCode plugin list:
C:\> code --list-extensions --show-versions
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]

Issue Description

When I use the help function, a few garbage Unicode characters are written to the output stream before the help text. Using Get-Help instead of help works as expected. This only happens in the extension-provided PowerShell Integrated Console, not in the default VS Code terminal.

C:\> help get-date

NAME
    Get-Date
[...snip...]
C:\> help get-date -showwindow

C:\> help get-date > $null
(no output)
C:\> get-help get-date -showwindow
(no output)

On my system, the definition of the help function is:

C:\> Get-Command help | select -ExpandProperty Definition
<#
.FORWARDHELPTARGETNAME Get-Help
.FORWARDHELPCATEGORY Cmdlet 
#>
[CmdletBinding(DefaultParameterSetName='AllUsersView', HelpUri='https://go.microsoft.com/fwlink/?LinkID=113316')]
param(
    [Parameter(Position=0, ValueFromPipelineByPropertyName=$true)]
    [string]
    ${Name},

    [string]
    ${Path},

    [ValidateSet('Alias','Cmdlet','Provider','General','FAQ','Glossary','HelpFile','ScriptCommand','Function','Filter','ExternalScript','All','DefaultHelp','Workflow','DscResource','Class','Configuration')]
    [string[]]
    ${Category},

    [string[]]
    ${Component},

    [string[]]
    ${Functionality},

    [string[]]
    ${Role},

    [Parameter(ParameterSetName='DetailedView', Mandatory=$true)]
    [switch]
    ${Detailed},

    [Parameter(ParameterSetName='AllUsersView')]
    [switch]
    ${Full},

    [Parameter(ParameterSetName='Examples', Mandatory=$true)]
    [switch]
    ${Examples},

    [Parameter(ParameterSetName='Parameters', Mandatory=$true)]
    [string]
    ${Parameter},

    [Parameter(ParameterSetName='Online', Mandatory=$true)]
    [switch]
    ${Online},

    [Parameter(ParameterSetName='ShowWindow', Mandatory=$true)]
    [switch]
    ${ShowWindow})

    #Set the outputencoding to Console::OutputEncoding. More.com doesn't work well with Unicode.
    $outputEncoding=[System.Console]::OutputEncoding

    Get-Help @PSBoundParameters | more

... and more is defined as ...

C:\> Get-Command more | select -ExpandProperty Definition

param([string[]]$paths)
$OutputEncoding = [System.Console]::OutputEncoding
if($paths) {
    foreach ($file in $paths)
    {
        Get-Content $file | more.com
    }
} else { $input | more.com }

Attached Logs

1510862054-d0c22b7a-9fe8-4400-9e12-9cb2d4fd6b5a1510862041231.zip

brantb avatar Nov 16 '17 19:11 brantb

Google suggests it's a byte order mark.

brantb avatar Nov 16 '17 23:11 brantb

help is only showing what is in the help text. This issue should be opened here: https://github.com/powershell/powershell-docs to use utf8-noBOM

SteveL-MSFT avatar Nov 17 '17 20:11 SteveL-MSFT

I'm just clarifying, but is this truly an issue with the help text if this issue only surfaces in vscode-powershell's Visual Studio Code Host and not in any other host like ConsoleHost?

brantb avatar Nov 17 '17 20:11 brantb

One of the changes that was made to PSIC (presumably to fix another issue) was that the output encoding was changed to UTF8 from the default. That might explain why it behaves differently than the regular PowerShell terminal.

rkeithhill avatar Nov 17 '17 22:11 rkeithhill

@brantb you'll see the same behavior on Linux/macOS with PowerShell Core 6 on the console. Windows understands the BOM and doesn't show it.

SteveL-MSFT avatar Nov 17 '17 22:11 SteveL-MSFT

Confirming I still see this only on the PowerShell Integrated Console.

Halkcyon avatar May 12 '18 03:05 Halkcyon

Closing as resolved, as we have now documented how to configure encoding for PowerShell in VS Code: https://docs.microsoft.com/en-us/powershell/scripting/components/vscode/understanding-file-encoding?view=powershell-6

SydneyhSmith avatar Mar 27 '19 22:03 SydneyhSmith

I looked at that doc, and I may be missing something but it doesn't seem to prevent the integrated console from displaying the BOM character?

dsolodow avatar Mar 28 '19 02:03 dsolodow

@dsolodow I reviewed the issue and you are correct, I am re-opening this!

We noticed that this issue only occurs with Help and not with Get-Help. The difference is that Help initially displays a smaller result with --More--, which comes from more.com, so this may be what is causing the encoding issue.

SydneyhSmith avatar Mar 28 '19 21:03 SydneyhSmith

For reference, the UTF-8 BOM is 0xEF 0xBB 0xBF. When interpreted with code page 437 (AKA DOS Latin US), it renders as the box-drawing characters ∩╗┐.
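
A minimal sketch of that interpretation, assuming the code page 437 and 850 encodings are available in the session (they are in Windows PowerShell 5.1). The variable names are only illustrative:

```powershell
# The three bytes of the UTF-8 byte order mark.
$bom = [byte[]](0xEF, 0xBB, 0xBF)

# Decode the same bytes as if they were OEM code page text.
$as437 = [System.Text.Encoding]::GetEncoding(437).GetString($bom)
$as850 = [System.Text.Encoding]::GetEncoding(850).GetString($bom)

$as437   # ∩╗┐
$as850   # ´╗┐
```

Only the first character differs between the two code pages, because byte 0xEF maps to a different glyph in each.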

My current suspicion is that the integrated console is resolving more.com for help, which can't understand UTF-8.

rjmholt avatar Mar 28 '19 22:03 rjmholt

@TylerLeonhardt, @rjmholt , and I looked at this. It appears to be a combination of the extension setting [console]::OutputEncoding to UTF8 (w/ BOM) and use of 437 code page. This results in a BOM being written and a codepage that renders it. I believe @TylerLeonhardt is working on a proposed fix.

SteveL-MSFT avatar Mar 29 '19 03:03 SteveL-MSFT

... possibly. I need to speak to the vscode folks to see if they have any ideas.

My thinking is that [System.Console]::OutputEncoding is somehow related to the chcp output...

In pwsh.exe, [System.Console]::OutputEncoding is set to Code Page 437 (on Windows). In the extension, we overwrite this:

[console]::OutputEncoding = [Encoding]::UTF8

Which is why we're seeing the BOM in the PowerShell Integrated Console...

However, if we don't do that... then the PowerShell Integrated Console can no longer render non-ascii characters like Chinese characters and the like.

That's why we originally overwrote the [Console]::OutputEncoding... but that was probably not the right approach. There should be a way to not see the BOM but also see non-ASCII characters... just like what the non-Integrated Console shows.

TylerLeonhardt avatar Mar 29 '19 06:03 TylerLeonhardt

I'll quote @rjmholt on this... "Encoding is a tar pit" 😅

TylerLeonhardt avatar Mar 29 '19 06:03 TylerLeonhardt

But you can set OutputEncoding to UTF8 NoBOM.
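
A rough sketch of what that could look like, assuming it is run in the session itself (for example from a profile). `$withBom` and `$utf8NoBom` are illustrative names, and whether this alone removes the BOM in the Integrated Console has not been verified in this thread:

```powershell
# [System.Text.Encoding]::UTF8 carries a 3-byte preamble (the BOM).
$withBom = [System.Text.Encoding]::UTF8
($withBom.GetPreamble() | ForEach-Object { '0x{0:X2}' -f $_ }) -join ' '   # 0xEF 0xBB 0xBF

# A UTF8Encoding constructed with $false has an empty preamble.
$utf8NoBom = [System.Text.UTF8Encoding]::new($false)
$utf8NoBom.GetPreamble().Length   # 0

# Assumption: apply it to both the pipeline-to-native encoding and the console.
$OutputEncoding = $utf8NoBom
[Console]::OutputEncoding = $utf8NoBom
```

Note that the help function shown in the issue description reassigns $OutputEncoding from [System.Console]::OutputEncoding, so the console setting is the one it would pick up.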

SteveL-MSFT avatar Mar 30 '19 00:03 SteveL-MSFT

@TylerLeonhardt I ran into a similar issue surrounding more.com and the like recently. The encoding issue I ran into was not resolved until I fixed both the codepage using chcp.com and $OutputEncoding.

However, I could only replicate it on Win7, not on the latest Win10 build.

Halkcyon avatar Apr 01 '19 12:04 Halkcyon

> @dsolodow I reviewed the issue and you are correct, I am re-opening this!
>
> We noticed that this issue only occurs with Help and not with Get-Help. The difference is that Help initially displays a smaller result with --More--, which comes from more.com, so this may be what is causing the encoding issue.

I can confirm that this is still the case, although for me it shows:

help Test-Date -Examples
´╗┐
NAME
    Test-Date

ALIASES
    None


REMARKS
    None
get-help Test-Date -Examples

NAME
    Test-Date

ALIASES
    None


REMARKS
    None

Some extra detail:

chcp.com
Active code page: 850
[System.Console]::OutputEncoding

Preamble          :
BodyName          : utf-8
EncodingName      : Unicode (UTF-8)
HeaderName        : utf-8
WebName           : utf-8
WindowsCodePage   : 1200
IsBrowserDisplay  : True
IsBrowserSave     : True
IsMailNewsDisplay : True
IsMailNewsSave    : True
IsSingleByte      : False
EncoderFallback   : System.Text.EncoderReplacementFallback
DecoderFallback   : System.Text.DecoderReplacementFallback
IsReadOnly        : False
CodePage          : 65001

Then do the following:

chcp.com 437

And ´╗┐ will change into ∩╗┐

A workaround for me:

Set-Alias -Name help -Value Get-Help

Then the built-in help function, which pipes through the DOS-era more.com, no longer comes into play.

B-Art avatar Jul 31 '24 07:07 B-Art