[check-setup] UnicodeEncodeErrors on terminals with non-Unicode locales

Open terrycojones opened this issue 4 years ago • 5 comments

There is a non-ASCII unicode ellipsis ('\u2026') on line 61 of nextstrain/cli/command/check_setup.py:

$ nextstrain check-setup
nextstrain-cli is up to date!

Traceback (most recent call last):
  File "/home/terry/miniconda3/envs/auspice/bin/nextstrain", line 8, in <module>
    sys.exit(main())
  File "/home/terry/miniconda3/envs/auspice/lib/python3.6/site-packages/nextstrain/cli/__main__.py", line 10, in main
    return cli.run( argv[1:] )
  File "/home/terry/miniconda3/envs/auspice/lib/python3.6/site-packages/nextstrain/cli/__init__.py", line 56, in run
    return opts.__command__.run(opts)
  File "/home/terry/miniconda3/envs/auspice/lib/python3.6/site-packages/nextstrain/cli/command/check_setup.py", line 61, in run
    print("Testing your setup\u2026")
UnicodeEncodeError: 'ascii' codec can't encode character '\u2026' in position 18: ordinal not in range(128)

Replacing it with 3 dots just gets you into trouble further on (line 91):

  File "/home/terry/s/net/cli/nextstrain/cli/command/check_setup.py", line 91, in run
    print(status.get(result, str(result)) + ":", formatted_description)
UnicodeEncodeError: 'ascii' codec can't encode character '\u2718' in position 7: ordinal not in range(128)
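For anyone who wants to reproduce this without changing their terminal settings, here is a minimal sketch. It assumes the failure comes down to the output stream using the ASCII codec with strict error handling, and it stands in for the terminal with an in-memory stream. The helper name `try_print` is made up for illustration and is not part of nextstrain-cli.

```python
import io


def try_print(text, encoding):
    """Print text to an in-memory stream that uses the given encoding.

    Returns the error message if the text cannot be encoded, else None.
    """
    # Stand-in for a terminal: a text stream over bytes, strict error handling.
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    try:
        print(text, file=stream)
        stream.flush()
    except UnicodeEncodeError as error:
        return str(error)
    return None


# The ellipsis fails under ASCII but is fine under UTF-8.
ascii_result = try_print("Testing your setup\u2026", "ascii")
utf8_result = try_print("Testing your setup\u2026", "utf-8")
print("ascii:", ascii_result)
print("utf-8:", utf8_result)
```

The same check applies to the status labels: any string containing a non-ASCII character fails the same way under an ASCII stream.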

terrycojones avatar Mar 28 '20 15:03 terrycojones

The additional Unicode characters that cause a UnicodeEncodeError with check_setup.py and Python 3.6 are on lines 50-55 of check_setup.py:

status = {
    True:  success("✔ yes"),
    False: failure("✘ no"),
    None:  warning("⚑ warning"),
    ...:   unknown("? unknown"),
}

bu-bgregor avatar Mar 31 '20 12:03 bu-bgregor

@terrycojones I believe this occurs when your terminal isn't using a Unicode locale, such as en_US.UTF-8 for example. Can you paste the output of running the locale command?

Some CLI frameworks like Click do a lot of contortions to automagically deal with terminal encoding issues, but this project doesn't use Click (yet…?) and currently assumes a Unicode locale.

tsibley avatar Mar 31 '20 20:03 tsibley

@tsibley Ah, yes, you're right, thanks. Setting LC_ALL to C.UTF-8 fixed it for me. I already had LANG=en_US.UTF-8 but that wasn't enough.
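One possible explanation for LANG alone not being enough (an assumption, since the output of locale was never posted here): under POSIX rules, a non-empty LC_ALL overrides LC_CTYPE, which in turn overrides LANG, so a stray LC_ALL or LC_CTYPE would win over LANG=en_US.UTF-8. A rough sketch of that lookup order, using a hypothetical helper name:

```python
import os


def effective_ctype_locale(env=None):
    """Return (variable, value) for the setting that decides the character encoding.

    Follows the POSIX precedence: LC_ALL, then LC_CTYPE, then LANG.
    Empty values are treated as unset.
    """
    env = os.environ if env is None else env
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return name, value
    # Nothing set: the system default applies, which is typically the "C" locale.
    return None, "C"


# Hypothetical environment: LANG is UTF-8, but LC_ALL overrides it.
print(effective_ctype_locale({"LANG": "en_US.UTF-8", "LC_ALL": "C"}))
print(effective_ctype_locale())
```

Running the locale command remains the authoritative check; this only illustrates why one variable can mask another.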

I guess for this issue, if anyone has the time/interest, printing a test Unicode character (e.g., a zero-width space) and catching UnicodeEncodeError would let you decide what to output. But maybe that's overkill. The exception could also be caught and a line printed telling the user what to do, depending on which shell they are using :-) Or just close this issue...
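The probe idea could look roughly like the sketch below. This is not code from nextstrain-cli; `supports_unicode` is a hypothetical name, and the in-memory streams are there only so the example runs the same way on any terminal.

```python
import io
import sys


def supports_unicode(stream=None, probe="\u200b"):
    """Return True if writing the probe character to the stream succeeds.

    The default probe is a zero-width space, so a successful probe
    leaves nothing visible on a real terminal.
    """
    stream = sys.stdout if stream is None else stream
    try:
        stream.write(probe)
        stream.flush()
    except UnicodeEncodeError:
        return False
    return True


# Simulated terminals with different encodings.
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
utf8_stream = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")

for name, stream in (("ascii", ascii_stream), ("utf-8", utf8_stream)):
    ellipsis = "\u2026" if supports_unicode(stream) else "..."
    print(name, "->", ascii(ellipsis))
```

Probing once at startup and picking the Unicode or ASCII variant of each message keeps the rest of the command free of try/except blocks.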

terrycojones avatar Mar 31 '20 21:03 terrycojones

@terrycojones Glad it's fixed for you!

Testing for UnicodeEncodeError is a clever way of detecting support! I'd think you could also inspect the encoding of stdout, but that might lie. :-) In any case, I expect the best way to solve this is to adopt the Unicode terminal support and fallback handling from a robust library like Click instead of implementing it ourselves.
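As a rough illustration of the inspect-the-encoding route, the sketch below checks whether a string can be encoded in the stream's reported encoding and swaps in ASCII stand-ins when it cannot. The `safe_text` helper and the replacement table are invented for this example (only the Unicode characters themselves come from check_setup.py), and it trusts whatever `sys.stdout.encoding` reports, which is exactly the part that might lie.

```python
import sys

# Hypothetical ASCII stand-ins for the characters used by check-setup.
ASCII_FALLBACKS = {
    "\u2026": "...",  # horizontal ellipsis
    "\u2714": "+",    # heavy check mark
    "\u2718": "x",    # heavy ballot X
    "\u2691": "!",    # black flag
}


def safe_text(text, encoding=None):
    """Return text unchanged if the encoding can represent it, else an ASCII version."""
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    try:
        text.encode(encoding)
        return text
    except (UnicodeEncodeError, LookupError):
        pass
    for fancy, plain in ASCII_FALLBACKS.items():
        text = text.replace(fancy, plain)
    # Anything still outside ASCII becomes "?" rather than raising.
    return text.encode("ascii", "replace").decode("ascii")


print(safe_text("Testing your setup\u2026", "ascii"))
print(safe_text("\u2718 no", "ascii"))
```

A library such as Click handles many more cases than this, so the sketch is only meant to show the shape of the fallback.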

I'll leave this issue open as a reminder that this does exist as a legitimate issue and for anyone who might come across the same error and need a fix.

tsibley avatar Apr 01 '20 05:04 tsibley

Thanks for the pointer to Click, I'd not heard of it. Re leaving the issue open - up to you, of course. I would close it :-)

terrycojones avatar Apr 01 '20 07:04 terrycojones