PowerShellCookbook

Small Fixes to Get-FileEncoding

Status: Open. RandomNoun7 opened this issue 6 years ago • 3 comments

This change allows the Get-FileEncoding cmdlet to run on PowerShell Core, and therefore on Linux.

The -Encoding parameter of Get-Content has changed in PowerShell Core, so 'Byte' is no longer a valid value.
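For reference, PowerShell 6 and later read raw bytes through the -AsByteStream switch instead. Below is a minimal sketch that picks the right form for the running version. Get-FirstByte is a hypothetical helper name, not something from the cookbook.

```powershell
# Hypothetical helper: return the first $Count bytes of a file on both
# Windows PowerShell 5.1 (-Encoding Byte) and PowerShell 6+ (-AsByteStream).
# Callers can wrap the result in @() so a single byte still acts like an array.
function Get-FirstByte {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [int] $Count
    )

    if ($PSVersionTable.PSVersion.Major -ge 6) {
        # PowerShell Core dropped 'Byte' from -Encoding; -AsByteStream replaces it.
        Get-Content -LiteralPath $Path -AsByteStream -TotalCount $Count
    }
    else {
        # Windows PowerShell 5.1 still accepts -Encoding Byte.
        Get-Content -LiteralPath $Path -Encoding Byte -TotalCount $Count
    }
}
```

Parameter names are only bound when a branch runs, so the unused branch does not cause an error on either version.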

The 'Sort' alias for Sort-Object has also been removed in the latest versions of PowerShell Core. This change also replaces the 'Select' alias with Select-Object. That alias hasn't technically been removed yet, but there is ongoing discussion about removing all of these aliases from the default PowerShell Core environment, so it's best to stop relying on it now.

https://github.com/PowerShell/PowerShell/issues/5870
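One way to catch any remaining aliases before they break is to parse the script and flag command names that resolve to aliases in the current session. This is a sketch, not part of the cookbook. Find-AliasUsage is a hypothetical name, and the result depends on the session: for example, sort is not an alias in PowerShell on Linux, so it would not be flagged there.

```powershell
# Hypothetical helper: list command names in a script that are aliases in the
# *current* session (the alias set differs between platforms and versions).
function Find-AliasUsage {
    param([Parameter(Mandatory)] [string] $ScriptText)

    # Parse the text into an AST without executing it.
    $ast = [scriptblock]::Create($ScriptText).Ast

    # Collect every command invocation (e.g. 'Sort', 'Select-Object').
    $commands = $ast.FindAll({
        param($node)
        $node -is [System.Management.Automation.Language.CommandAst]
    }, $true)

    # GetCommandName() is $null for dynamic invocations like '& $cmd'.
    # -LiteralPath avoids treating aliases such as '?' as wildcards.
    $commands |
        ForEach-Object { $_.GetCommandName() } |
        Where-Object { $_ -and (Test-Path -LiteralPath "Alias:\$_") } |
        Sort-Object -Unique
}
```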

RandomNoun7, Apr 17 '19 21:04

Forgive me if you already know this, but you can't see the diff because the file is encoded in UTF-16LE. If you want to convert the file to UTF-8 in a new commit, or if you would like me to do so in a different PR, I'm happy to wait for that and rebase this change on top of it.
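If the file is converted, a minimal sketch using the .NET file APIs could look like the following. Convert-ToUtf8 is a hypothetical name. It assumes the file fits comfortably in memory, and it writes a UTF-8 BOM, which keeps Windows PowerShell 5.1 from reading non-ASCII characters as the system ANSI code page.

```powershell
# Hypothetical helper: re-encode a UTF-16LE text file as UTF-8 in place.
function Convert-ToUtf8 {
    param([Parameter(Mandatory)] [string] $Path)

    # .NET resolves relative paths against the process directory, not $PWD.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath

    # Decode as UTF-16LE; ReadAllText also consumes the FF FE byte order mark.
    $text = [System.IO.File]::ReadAllText($fullPath, [System.Text.Encoding]::Unicode)

    # $true = emit the EF BB BF preamble so Windows PowerShell 5.1 detects UTF-8.
    $utf8WithBom = [System.Text.UTF8Encoding]::new($true)
    [System.IO.File]::WriteAllText($fullPath, $text, $utf8WithBom)
}
```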

In the meantime, the changes are as follows:

Line 2056 from:

foreach($encodingLength in $encodingLengths | Sort -Descending)

To:

foreach($encodingLength in $encodingLengths | Sort-Object -Descending)

Line 2058 from:

$bytes = Get-Content -encoding byte -readcount $encodingLength $path | Select -First 1

To:

$bytes = Get-Content -raw -readcount $encodingLength $path | Select-Object -First 1

RandomNoun7, Apr 17 '19 22:04

I think this needs further testing. It looks like this call to Get-Content may be stripping the byte order mark from files that start with one.
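A quick way to check this is to write a small UTF-8 file with a BOM and compare what Get-Content -Raw returns against the bytes on disk. This is only a sketch for probing the behavior; Test-BomStripping is a hypothetical name.

```powershell
# Hypothetical check: does Get-Content -Raw hand back the BOM, or strip it?
function Test-BomStripping {
    $path = [System.IO.Path]::GetTempFileName()
    try {
        # EF BB BF is the UTF-8 byte order mark, followed by the text "AB".
        [System.IO.File]::WriteAllBytes($path, [byte[]](0xEF, 0xBB, 0xBF, 0x41, 0x42))

        $raw = Get-Content -LiteralPath $path -Raw
        [pscustomobject]@{
            BytesOnDisk    = (Get-Item -LiteralPath $path).Length
            CharsReturned  = $raw.Length
            # Compare code points directly; culture-aware string comparison
            # can ignore U+FEFF and give a misleading answer.
            FirstCharIsBom = ($raw.Length -gt 0) -and ([int]$raw[0] -eq 0xFEFF)
        }
    }
    finally {
        Remove-Item -LiteralPath $path
    }
}
```

If the BOM is being stripped, CharsReturned comes back as 2 against 5 bytes on disk and FirstCharIsBom is false, so any matcher that looks for BOM bytes in that output can never succeed.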

RandomNoun7, Apr 17 '19 22:04

OK, this version works in PowerShell 5.1 and in PowerShell Core on both Windows and Linux.

Unfortunately I had to fall back to the .NET file APIs, because the -Raw switch on Get-Content only stops it from splitting the content on line endings. It still returns decoded text, i.e. what it considers the usable data, so it strips the BOM if one exists, which breaks the encoding matcher.

The new call on line 2058 is:

$bytes = [System.IO.File]::ReadAllBytes($path) | Select-Object -First $encodingLength
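To show how those leading bytes get used, here is a minimal sketch of a longest-BOM-first matcher in the spirit of the loop at line 2056. It is not the cookbook's actual Get-FileEncoding. Get-EncodingFromBom is a hypothetical name, and the sketch only knows the five Unicode encodings listed in it.

```powershell
# Hypothetical sketch: guess a file's encoding from its byte order mark.
function Get-EncodingFromBom {
    param([Parameter(Mandatory)] [string] $Path)

    # Candidate encodings; GetPreamble() returns each one's BOM bytes.
    $candidates = @(
        [System.Text.UTF8Encoding]::new($true)              # EF BB BF
        [System.Text.UnicodeEncoding]::new($false, $true)   # UTF-16LE: FF FE
        [System.Text.UnicodeEncoding]::new($true, $true)    # UTF-16BE: FE FF
        [System.Text.UTF32Encoding]::new($false, $true)     # UTF-32LE: FF FE 00 00
        [System.Text.UTF32Encoding]::new($true, $true)      # UTF-32BE: 00 00 FE FF
    )

    # As in the PR, ReadAllBytes keeps the BOM intact.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $fileBytes = [System.IO.File]::ReadAllBytes($fullPath)

    # Check longer BOMs first: the UTF-32LE BOM begins with the UTF-16LE BOM,
    # so a shortest-first check would misreport UTF-32LE files as UTF-16LE.
    $ordered = $candidates | Sort-Object { $_.GetPreamble().Length } -Descending

    foreach ($encoding in $ordered) {
        $preamble = $encoding.GetPreamble()
        if ($fileBytes.Length -lt $preamble.Length) { continue }

        $isMatch = $true
        for ($i = 0; $i -lt $preamble.Length; $i++) {
            if ($fileBytes[$i] -ne $preamble[$i]) { $isMatch = $false; break }
        }
        if ($isMatch) { return $encoding }
    }

    # No BOM: the encoding can't be determined from the preamble alone.
    return $null
}
```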

Reading the whole file just to inspect its first few bytes isn't the greatest strategy, but manually handling a file stream seems a little heavy-handed for an example like this.
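For comparison, reading only the leading bytes through a FileStream looks roughly like this. It is a sketch, and Read-LeadingByte is a hypothetical name.

```powershell
# Hypothetical helper: read at most $Count bytes from the start of a file
# without loading the rest of it into memory.
function Read-LeadingByte {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [int] $Count
    )

    # .NET resolves relative paths against the process directory, not $PWD.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $stream = [System.IO.File]::OpenRead($fullPath)
    try {
        $buffer = [byte[]]::new($Count)
        $total = 0
        # Stream.Read may return fewer bytes than requested, so loop until
        # the buffer is full or the end of the file is reached.
        while ($total -lt $Count) {
            $read = $stream.Read($buffer, $total, $Count - $total)
            if ($read -eq 0) { break }
            $total += $read
        }
        # Copy into an exact-size array in case the file is shorter than $Count.
        $result = [byte[]]::new($total)
        [System.Array]::Copy($buffer, $result, $total)
        # The leading comma stops PowerShell from unrolling the array on output.
        , $result
    }
    finally {
        $stream.Dispose()
    }
}
```

A caller would typically pass the longest BOM length it checks for (4 bytes for UTF-32). The trade-off is the one noted above: more code to maintain in exchange for never reading more than a few bytes of a large file.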

Ready for merge.

RandomNoun7, Apr 17 '19 23:04