encoding problem of aws cloudformation package

Open kidotaka opened this issue 3 years ago • 2 comments

Describe the bug

Even when AWS_CLI_FILE_ENCODING=utf-8 is set, aws cloudformation package with --output-template-file writes the output file in a non-UTF-8 encoding. The encoding depends on the locale because Python's built-in open function is called without an explicit encoding. The input template file is UTF-8, but the output template file is not, which sometimes leads to an encoding error.

When nested stacks are used, the AWS CLI also writes a temporary file without specifying an encoding before uploading it, so the same encoding problem occurs there.
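To see which encoding the built-in open function falls back to on a given machine, a small probe like the following can help. This is only an illustrative sketch: default_text_encoding is a hypothetical helper name, not part of the AWS CLI.

```python
import locale
import os
import tempfile


def default_text_encoding():
    """Return the encoding that open() picks when no encoding is passed."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # No encoding argument: Python falls back to the locale-dependent default.
        with open(path, "w") as handle:
            return handle.encoding
    finally:
        os.remove(path)


print("open() default:", default_text_encoding())
print("locale preferred:", locale.getpreferredencoding(False))
```

On a Japanese Windows installation the first line would typically report cp932, which matches the error shown under "Current Behavior".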

Expected Behavior

The aws cloudformation package command should write its files using the encoding given by AWS_CLI_FILE_ENCODING.

Current Behavior

An encoding error occurs: 'cp932' codec can't encode character '\U0002000b' in position 186: illegal multibyte sequence. In this case, I used the AWS CLI on Windows.
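The same failure can be reproduced in plain Python without the AWS CLI. The sketch below uses a made-up sample string and a hypothetical try_encode helper. It only shows that cp932 cannot represent U+2000B while UTF-8 can.

```python
# Sample text containing U+2000B, the character named in the error message.
TEXT = "Description: \U0002000b"


def try_encode(text, encoding):
    """Return the encoded bytes, or the UnicodeEncodeError if encoding fails."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as error:
        return error


utf8_result = try_encode(TEXT, "utf-8")
cp932_result = try_encode(TEXT, "cp932")

print("utf-8:", utf8_result)
print("cp932:", cp932_result)
```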

Reproduction Steps

  • Prepare CloudFormation nested stack template files in UTF-8 that include non-ASCII characters, such as U+2000B (UTF-16 surrogate pair D840 DC0B)
  • export AWS_CLI_FILE_ENCODING=utf-8
  • Run aws cloudformation package with the --output-template-file option
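The template files for the steps above could be generated with a script like this one. The file names, directory, resource names, and bucket placeholder are all assumptions made for illustration; any nested stack template containing U+2000B should behave the same way.

```python
from pathlib import Path

# U+2000B lies outside the BMP (UTF-16 surrogate pair D840 DC0B).
CHAR = "\U0002000b"

PARENT = """AWSTemplateFormatVersion: '2010-09-09'
Description: parent stack {char}
Resources:
  Child:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: child.yaml
"""

CHILD = """AWSTemplateFormatVersion: '2010-09-09'
Description: child stack {char}
Resources:
  Topic:
    Type: AWS::SNS::Topic
"""


def write_templates(directory):
    """Write a parent and a child template as UTF-8 and return the directory."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    (directory / "parent.yaml").write_text(PARENT.format(char=CHAR), encoding="utf-8")
    (directory / "child.yaml").write_text(CHILD.format(char=CHAR), encoding="utf-8")
    return directory


if __name__ == "__main__":
    write_templates("repro")
    # Then, with AWS_CLI_FILE_ENCODING=utf-8 set, run (bucket name is a placeholder):
    #   aws cloudformation package --template-file repro/parent.yaml \
    #       --s3-bucket YOUR_BUCKET --output-template-file packaged.yaml
```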

Possible Solution

I suggest using compat_open() or getpreferredencoding() from aws-cli/compat.py to determine the output file encoding.
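The idea can be sketched roughly as follows. This is not the actual code in compat.py: resolve_file_encoding and open_with_cli_encoding are hypothetical names, and the fallback to the locale encoding is an assumption about the desired behavior.

```python
import locale
import os


def resolve_file_encoding(environ=os.environ):
    """Prefer AWS_CLI_FILE_ENCODING, else fall back to the locale encoding."""
    return environ.get("AWS_CLI_FILE_ENCODING") or locale.getpreferredencoding(False)


def open_with_cli_encoding(path, mode="r", environ=os.environ):
    """Open a file, passing an explicit encoding for text modes."""
    if "b" in mode:
        # Binary modes take no encoding argument.
        return open(path, mode)
    return open(path, mode, encoding=resolve_file_encoding(environ))
```

If the output template and the temporary nested stack template were both written through a helper like this, setting AWS_CLI_FILE_ENCODING=utf-8 would be enough, regardless of the locale.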

built-in open function is used in

Workarounds:

  • PYTHONUTF8=1 (UTF-8 mode, Python 3.7+) (available on both Windows and Linux)
  • LC_CTYPE=UTF-8 (Linux only)
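The effect of PYTHONUTF8=1 on the default encoding of open can be checked with a plain Python interpreter, as in this sketch. It starts a child interpreter rather than the AWS CLI itself, and default_encoding_in_child is a hypothetical helper name.

```python
import os
import subprocess
import sys

# Code run in the child interpreter: report the encoding open() chooses by default.
PROBE = "import os; f = open(os.devnull, 'w'); print(f.encoding); f.close()"


def default_encoding_in_child(extra_env):
    """Run PROBE in a child Python process with extra environment variables."""
    env = dict(os.environ, **extra_env)
    result = subprocess.run(
        [sys.executable, "-c", PROBE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


print("with PYTHONUTF8=1:", default_encoding_in_child({"PYTHONUTF8": "1"}))
```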

Additional Information/Context

Combining "AWS_CLI_FILE_ENCODING" with "LC_CTYPE" is not a good solution. Few locales are available for "LC_CTYPE" on CodeBuild by default, so an additional language pack must be installed. The locale is also difficult to change on Windows. Python can handle encodings without any additional library.

CLI version used

2.7.18

Environment details (OS name and version, etc.)

Windows 10

kidotaka, Jul 28 '22 08:07

Hi @kidotaka thanks for reaching out. Have you tried setting your locale as LC_ALL=en_US.UTF-8 to address this?

tim-finnigan, Jul 29 '22 20:07

Hi @tim-finnigan, I have not tried LC_ALL=en_US.UTF-8, but it can be used instead of LC_CTYPE on Linux. en_US.UTF-8 is available without additional installation on CodeBuild's managed Linux images (locale -a lists en_US.UTF-8).

There are several workarounds, but if AWS_CLI_FILE_ENCODING were used for writing, there would be no need for them.

| actual template file encoding | AWS_CLI_FILE_ENCODING | other conditions | result (--output-template-file, temporary nested stack template) | workaround |
|---|---|---|---|---|
| UTF-8 | UTF-8 | on Linux and LC_CTYPE=POSIX | ascii. occasional encoding error | LC_CTYPE or LC_ALL or PYTHONUTF8=1 |
| UTF-8 | UTF-8 | on Windows and locale-dependent encoding is not UTF-8 | non UTF-8. occasional encoding error | It's hard to change the locale on Windows. PYTHONUTF8=1 can be used. |
| non UTF-8 | non UTF-8 | on Linux and LC_CTYPE=POSIX | ascii. occasional encoding error | LC_CTYPE or LC_ALL, and an additional language package installation is needed |
| non UTF-8 | non UTF-8 | on Windows and locale-dependent encoding is not UTF-8 | non UTF-8 | no problem if the encodings are the same |

kidotaka, Aug 04 '22 05:08