
Consider using UTF-8 by default for Azure CLI

Open doggy8088 opened this issue 3 months ago • 1 comments

Describe the bug

This bug was reported on Stack Overflow: https://stackoverflow.com/q/78008939/910074

When I have to use UTF-8 as my default console output encoding ([Console]::OutputEncoding), Azure CLI is unable to handle Chinese characters because of an encoding issue. The Chinese characters are either missing from the output or turned into garbled text.

Related command

$(az account list -o json)

az account list -o json | jq '.'

Errors

image

Issue script & Debug output

It's an encoding issue.

Expected behavior

I expect Azure CLI to handle Chinese characters correctly.

Environment Summary

azure-cli 2.57.0

core 2.57.0 telemetry 1.1.0

Extensions: account 0.2.3 azure-devops 0.25.0 front-door 1.0.16 interactive 0.4.5 k8s-extension 1.2.4 managementpartner 0.1.3

Dependencies: msal 1.26.0 azure-mgmt-resource 23.1.0b2

Python location 'C:\Program Files\Microsoft SDKs\Azure\CLI2\python.exe' Extensions directory 'C:\Users\wakau.azure\cliextensions'

Python (Windows) 3.11.7 (tags/v3.11.7:fa7a6f2, Dec 4 2023, 19:24:49) [MSC v.1937 64 bit (AMD64)]

Legal docs and information: aka.ms/AzureCliLegal

Your CLI is up-to-date.

Additional context

I have a workaround for now: edit the C:\Program Files\Microsoft SDKs\Azure\CLI2\wbin\az.cmd file and add -X utf8 to the python arguments.

::
:: Microsoft Azure CLI - Windows Installer - Author file components script
:: Copyright (C) Microsoft Corporation. All Rights Reserved.
::

@IF EXIST "%~dp0\..\python.exe" (
  SET AZ_INSTALLER=MSI
  "%~dp0\..\python.exe" -X utf8 -IBm azure.cli %*
) ELSE (
  echo Failed to load python executable.
  exit /b 1
)

doggy8088 avatar Mar 02 '24 10:03 doggy8088

Thank you for opening this issue, we will look into it.

yonzhan avatar Mar 02 '24 10:03 yonzhan

I am able to repro with the latest PowerShell 7.4.1. My system locale is English (United States):

image

Printing to console is fine:

> az group show -n testrg
{
  ...
  "tags": {
    ...
    "key1": "测试"
  },
  ...
}

But a warning is shown when redirecting:

> az group show -n testrg > out.txt
WARNING: Unable to encode the output with cp1252 encoding. Unsupported characters are discarded.

(Actually, I wrote that warning in https://github.com/microsoft/knack/pull/178.)
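As a rough illustration of what "unsupported characters are discarded" means, the sketch below uses Python's standard `errors="ignore"` codec handler. It only mimics the effect described by the warning and is not taken from knack's code; the JSON string is a made-up sample.

```python
# Illustrative only: mimics "unsupported characters are discarded" with the
# standard codec error handler. This is not knack's actual implementation.
text = '{"key1": "测试"}'

# cp1252 has no mapping for Chinese characters, so errors="ignore" drops them.
encoded = text.encode("cp1252", errors="ignore")
print(encoded)

# UTF-8 can represent every character, so the round trip is lossless.
round_trip = text.encode("utf-8").decode("utf-8")
print(round_trip == text)
```

The ASCII parts of the output survive, which is why the redirected JSON still parses but the tag value comes out empty.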

According to https://docs.python.org/3/library/sys.html#sys.stdout

sys.stdout Non-character devices such as disk files and pipes use the system locale encoding (i.e. the ANSI codepage).

So changing the console's encoding with [Console]::OutputEncoding = [Text.UTF8Encoding]::new() won't affect Python's output encoding.
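The following sketch checks this from Python itself: it starts a child interpreter whose stdout is a pipe and asks it which encoding it picked. The helper name `piped_stdout_encoding` is hypothetical, and the default result depends on the machine's locale, so only the `-X utf8` case is predictable.

```python
import os
import subprocess
import sys


def piped_stdout_encoding(*interpreter_flags: str) -> str:
    """Return sys.stdout.encoding as seen by a child Python whose stdout is a pipe."""
    env = dict(os.environ)
    # Remove environment overrides so the result reflects the flags only.
    env.pop("PYTHONIOENCODING", None)
    env.pop("PYTHONUTF8", None)
    result = subprocess.run(
        [sys.executable, *interpreter_flags, "-c",
         "import sys; print(sys.stdout.encoding)"],
        capture_output=True,
        env=env,
        check=True,
    )
    return result.stdout.decode("ascii").strip()


# Locale-dependent: typically the ANSI code page name on Windows.
default_encoding = piped_stdout_encoding()
# UTF-8 mode forces UTF-8 for the standard streams, pipe or not.
utf8_mode_encoding = piped_stdout_encoding("-X", "utf8")
print(default_encoding, utf8_mode_encoding)
```

Nothing in this check touches the console settings, which matches the point above: the pipe encoding is decided by the interpreter, not by [Console]::OutputEncoding.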

I would recommend changing your system encoding to UTF-8 (follow https://github.com/microsoft/knack/pull/178), so that you won't need to modify the az.cmd entry script every time you update Azure CLI.

Also see: https://github.com/python/cpython/issues/74595

jiasli avatar Mar 04 '24 05:03 jiasli

Changing the system encoding to UTF-8 is not an option for most people using a non-English locale.

doggy8088 avatar Mar 04 '24 06:03 doggy8088

Changing the system encoding to UTF-8 is not an option for most people using a non-English locale.

Can you explain why? My personal desktop computer is using UTF-8 as I need to display Chinese (Simplified, China).

image

jiasli avatar Mar 04 '24 07:03 jiasli

I can verify Windows PowerShell 5.1 can't handle UTF-8 correctly:

> $PSVersionTable

Name                           Value
----                           -----
PSVersion                      5.1.22621.2506
PSEdition                      Desktop
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
BuildVersion                   10.0.22621.2506
CLRVersion                     4.0.30319.42000
WSManStackVersion              3.0
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1

> [Console]::OutputEncoding

IsSingleByte      : True
BodyName          : IBM437
EncodingName      : OEM United States
HeaderName        : IBM437
WebName           : IBM437
WindowsCodePage   : 1252
IsBrowserDisplay  : False
IsBrowserSave     : False
IsMailNewsDisplay : False
IsMailNewsSave    : False
EncoderFallback   : System.Text.InternalEncoderBestFitFallback
DecoderFallback   : System.Text.InternalDecoderBestFitFallback
IsReadOnly        : False
CodePage          : 437

> & "C:\Program Files\Microsoft SDKs\Azure\CLI2\python.exe" -X utf8 -c "print('测试測試')" > out.txt ; Get-Content out.txt
测试測試

This can be fixed by setting [Console]::OutputEncoding = [Text.UTF8Encoding]::new():

> [Console]::OutputEncoding = [Text.UTF8Encoding]::new()

> [Console]::OutputEncoding

BodyName          : utf-8
EncodingName      : Unicode (UTF-8)
HeaderName        : utf-8
WebName           : utf-8
WindowsCodePage   : 1200
IsBrowserDisplay  : True
IsBrowserSave     : True
IsMailNewsDisplay : True
IsMailNewsSave    : True
IsSingleByte      : False
EncoderFallback   : System.Text.EncoderReplacementFallback
DecoderFallback   : System.Text.DecoderReplacementFallback
IsReadOnly        : False
CodePage          : 65001

> & "C:\Program Files\Microsoft SDKs\Azure\CLI2\python.exe" -X utf8 -c "print('测试測試')" > out.txt ; Get-Content out.txt
测试測試

https://stackoverflow.com/a/78023334/2199657 mentions PowerShell 7.4 doesn't interpret the redirected data anymore.

https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_redirection?view=powershell-7.4#redirecting-output-from-native-commands

PowerShell 7.4 changed the behavior of the redirection operators when used to redirect the stdout stream of a native command. The redirection operators now preserve the byte-stream data when redirecting output from a native command. PowerShell doesn't interpret the redirected data or add any additional formatting.

Simply calling python -X utf8 will work:

> & "C:\Program Files\Microsoft SDKs\Azure\CLI2\python.exe" -X utf8 -c "print('测试測試')" > out.txt ; Get-Content out.txt
测试測試

The same approach can be used to call Azure CLI:

> & "C:\Program Files\Microsoft SDKs\Azure\CLI2\python.exe" -X utf8 -IBm azure.cli group show -n testrg > out.txt ; Get-Content out.txt
{
  ...
  "tags": {
    ...
    "key1": "测试測試"
  },
  ...
}

jiasli avatar Mar 04 '24 08:03 jiasli

Wait. You are already using cp950, which is Big5 (listed as "ANSI/OEM Traditional Chinese (Taiwan; Hong Kong SAR, PRC); Chinese Traditional (Big5)" at https://learn.microsoft.com/en-us/windows/win32/intl/code-page-identifiers), so I guess you are trying to output characters that are not in cp950. May I know the original Chinese character that is causing the problem?

jiasli avatar Mar 04 '24 10:03 jiasli

I'm okay with cp950 in both Windows PowerShell and PowerShell 7+.

The issue arises because I installed Oh-My-Posh in PowerShell and use it in Windows Terminal, so I have to use UTF-8 in the console. That's why I need az.cmd to output UTF-8 by default.

doggy8088 avatar Mar 04 '24 13:03 doggy8088

The issue arises because I installed Oh-My-Posh in PowerShell and use it in Windows Terminal, so I have to use UTF-8 in the console.

I fail to understand the relationship between Oh-My-Posh and encoding. Could you give more context on this? I don't think it is Oh-My-Posh that causes the encoding error. May I know the original Chinese character that is causing the problem?

jiasli avatar Mar 05 '24 02:03 jiasli

It doesn't matter what the original Chinese characters are. All Chinese characters are dropped from the output.

To clear up the confusion: Oh-My-Posh can use a special Unicode font to display symbols in the prompt, like this:

image

So my console output encoding must be UTF-8. That's why I don't set cp950 on the console.

doggy8088 avatar Mar 05 '24 05:03 doggy8088

I don't think this has anything to do with Oh-My-Posh when redirection is involved. Without redirection, as in a plain az account list, the output is indeed in UTF-8.

https://docs.python.org/3/library/sys.html#sys.stdout

On Windows, UTF-8 is used for the console device.

> python -c "import sys; print(sys.stdout.encoding)"
utf-8

In your original screenshot, Azure CLI is trying to encode its output with cp950, but certain characters can't be encoded by cp950 and are reported as "unsupported":

image
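The cp950 limitation can be checked with Python's built-in codecs, independent of Azure CLI. This is a minimal sketch; `can_encode` is a hypothetical helper, and the sample strings are the ones used earlier in this thread.

```python
def can_encode(text: str, codec: str) -> bool:
    """Report whether every character in text has a mapping in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Traditional Chinese characters are part of cp950 (Big5).
traditional_ok = can_encode("測試", "cp950")
# Simplified Chinese characters are not, so encoding them fails.
simplified_ok = can_encode("测试", "cp950")
# UTF-8 covers both scripts.
utf8_ok = can_encode("测试測試", "utf-8")
print(traditional_ok, simplified_ok, utf8_ok)
```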

Besides Azure CLI, you can repro this issue with Python:

> python -c "print('测试')" > out.txt
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\jiasli\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-1: character maps to <undefined>

> python -c "import sys; print(sys.stdout.encoding)" > out.txt ; Get-Content out.txt
cp1252
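The same failure can be reproduced in-process without a shell. The sketch below stands in for a redirected stdout by wrapping a byte buffer in a text stream with an explicit encoding; `write_as_redirected` is a hypothetical helper, and cp1252 is assumed only because it is the encoding shown in the transcript above.

```python
import io


def write_as_redirected(text: str, encoding: str) -> bytes:
    """Write text through a text wrapper over a byte buffer, similar to stdout redirected to a file."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding=encoding)
    stream.write(text)
    stream.flush()
    return buffer.getvalue()


try:
    write_as_redirected("测试", "cp1252")
    cp1252_failed = False
except UnicodeEncodeError:
    # Same error class as in the traceback above.
    cp1252_failed = True

# With UTF-8 as the stream encoding the write succeeds.
utf8_bytes = write_as_redirected("测试", "utf-8")
print(cp1252_failed, utf8_bytes.decode("utf-8"))
```

In a standalone script, `sys.stdout.reconfigure(encoding="utf-8")` (available since Python 3.7) is an in-process way to get the UTF-8 behaviour for stdout without passing -X utf8.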

jiasli avatar Mar 07 '24 02:03 jiasli

Here is my test:

image

doggy8088 avatar Mar 07 '24 09:03 doggy8088