Use UTF8 as encoding to open config files

Open Guts opened this issue 3 years ago • 3 comments

Checklist

  • [ ] All new and existing tests are passing
  • [ ] (If adding features:) I have added tests to cover my changes
  • [ ] (If docs changes needed:) I have updated the documentation accordingly.
  • [ ] I have added an entry to CHANGES.rst because this is a user-facing change or an important bugfix
  • [ ] I have written proper commit message(s)

What changes does this Pull Request introduce?

Set encoding="utf-8" in the io.open calls that the config module uses to load configuration files.

Why is this necessary?

I've been facing UnicodeDecodeError issues using Apache2 + mod_wsgi + Isso.

I've found 2 solutions:

  • Patch config.py to set encoding="utf8" in the io.open calls.
  • Add lang='en_US.UTF-8' and locale='en_US.UTF-8' to the WSGIDaemonProcess directive in the Apache vhost config. See this blog post from Graham (maintainer of mod_wsgi).

I'm using the second option, but I still think that explicitly using UTF-8 to read configuration files would be a small reliability improvement for Isso.
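To illustrate the first option, here is a minimal sketch of the failure and the fix. It does not reproduce Isso's actual config.py; the load helper, the file name, and the sample option value are hypothetical, chosen only so that the file contains a character whose UTF-8 encoding starts with byte 0xe2, as in the traceback below.

```python
import configparser
import io
import os
import tempfile

# Hypothetical config content. The curly apostrophe (U+2019) is encoded in
# UTF-8 as the bytes e2 80 99, so an ASCII decoder fails on byte 0xe2.
SAMPLE = "[general]\nname = Geotribu’s comments\n"


def load(path, encoding):
    """Parse an INI file, decoding it with an explicit encoding."""
    parser = configparser.ConfigParser()
    with io.open(path, encoding=encoding) as f:
        parser.read_file(f)
    return parser


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "isso-sample.cfg")
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(SAMPLE)

    # Simulate a process whose locale resolves to ASCII.
    try:
        load(path, encoding="ascii")
        ascii_error = None
    except UnicodeDecodeError as exc:
        ascii_error = exc

    # An explicit UTF-8 decode does not depend on the process locale.
    parsed_name = load(path, encoding="utf-8")["general"]["name"]

print(type(ascii_error).__name__)
print(parsed_name)
```

In this sketch the ASCII read fails with UnicodeDecodeError, while the UTF-8 read returns the value intact regardless of how the process was started.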

Files used

isso_wsgi.py:

import site  # noqa: E402

site.addsitedir("{{ comments_path }}.venv")

from pathlib import Path  # noqa: E402

# 3rd party
from isso import config, make_app  # noqa: E402

# globals
isso_conf_file = Path(__file__).parent / "isso-prod.cfg"

application = make_app(
    config.load(
        config.default_file(),
        str(isso_conf_file.resolve())
    ),
    multiprocessing=True,
    threading=True,
)

tail /var/log/apache2/geotribu_comments_error.log -n 25:

[Thu Nov 03 17:39:15.240427 2022] [wsgi:error] [pid 2047229:tid 139679207601728] [remote 86.229.66.142:45462]   File "/var/www/geotribu/comments/isso_wsgi.py", line 20, in <module>
[Thu Nov 03 17:39:15.240429 2022] [wsgi:error] [pid 2047229:tid 139679207601728] [remote 86.229.66.142:45462]     config.load(
[Thu Nov 03 17:39:15.240433 2022] [wsgi:error] [pid 2047229:tid 139679207601728] [remote 86.229.66.142:45462]   File "/var/www/geotribu/comments/.venv/lib/python3.10/site-packages/isso/config.py", line 153, in load
[Thu Nov 03 17:39:15.240435 2022] [wsgi:error] [pid 2047229:tid 139679207601728] [remote 86.229.66.142:45462]     parser.read_file(f)
[Thu Nov 03 17:39:15.240439 2022] [wsgi:error] [pid 2047229:tid 139679207601728] [remote 86.229.66.142:45462]   File "/usr/lib/python3.10/configparser.py", line 719, in read_file
[Thu Nov 03 17:39:15.240441 2022] [wsgi:error] [pid 2047229:tid 139679207601728] [remote 86.229.66.142:45462]     self._read(f, source)
[Thu Nov 03 17:39:15.240444 2022] [wsgi:error] [pid 2047229:tid 139679207601728] [remote 86.229.66.142:45462]   File "/usr/lib/python3.10/configparser.py", line 1021, in _read
[Thu Nov 03 17:39:15.240446 2022] [wsgi:error] [pid 2047229:tid 139679207601728] [remote 86.229.66.142:45462]     for lineno, line in enumerate(fp, start=1):
[Thu Nov 03 17:39:15.240450 2022] [wsgi:error] [pid 2047229:tid 139679207601728] [remote 86.229.66.142:45462]   File "/usr/lib/python3.10/encodings/ascii.py", line 26, in decode
[Thu Nov 03 17:39:15.240452 2022] [wsgi:error] [pid 2047229:tid 139679207601728] [remote 86.229.66.142:45462]     return codecs.ascii_decode(input, self.errors)[0]
[Thu Nov 03 17:39:15.240460 2022] [wsgi:error] [pid 2047229:tid 139679207601728] [remote 86.229.66.142:45462] UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 210: ordinal not in range(128)
[Thu Nov 03 17:47:27.489385 2022] [wsgi:error] [pid 2052433:tid 139659067721280] [remote 86.229.66.142:53850] mod_wsgi (pid=2052433): Failed to exec Python script file '/var/www/geotribu/comments/isso_wsgi.py'.
[Thu Nov 03 17:47:27.489485 2022] [wsgi:error] [pid 2052433:tid 139659067721280] [remote 86.229.66.142:53850] mod_wsgi (pid=2052433): Exception occurred processing WSGI script '/var/www/geotribu/comments/isso_wsgi.py'.
[Thu Nov 03 17:47:27.489931 2022] [wsgi:error] [pid 2052433:tid 139659067721280] [remote 86.229.66.142:53850] Traceback (most recent call last):
[Thu Nov 03 17:47:27.489961 2022] [wsgi:error] [pid 2052433:tid 139659067721280] [remote 86.229.66.142:53850]   File "/var/www/geotribu/comments/isso_wsgi.py", line 20, in <module>
[Thu Nov 03 17:47:27.489964 2022] [wsgi:error] [pid 2052433:tid 139659067721280] [remote 86.229.66.142:53850]     config.load(
[Thu Nov 03 17:47:27.489968 2022] [wsgi:error] [pid 2052433:tid 139659067721280] [remote 86.229.66.142:53850]   File "/var/www/geotribu/comments/.venv/lib/python3.10/site-packages/isso/config.py", line 153, in load
[Thu Nov 03 17:47:27.489971 2022] [wsgi:error] [pid 2052433:tid 139659067721280] [remote 86.229.66.142:53850]     parser.read_file(f)
[Thu Nov 03 17:47:27.489974 2022] [wsgi:error] [pid 2052433:tid 139659067721280] [remote 86.229.66.142:53850]   File "/usr/lib/python3.10/configparser.py", line 719, in read_file
[Thu Nov 03 17:47:27.489977 2022] [wsgi:error] [pid 2052433:tid 139659067721280] [remote 86.229.66.142:53850]     self._read(f, source)
[Thu Nov 03 17:47:27.489991 2022] [wsgi:error] [pid 2052433:tid 139659067721280] [remote 86.229.66.142:53850]   File "/usr/lib/python3.10/configparser.py", line 1021, in _read
[Thu Nov 03 17:47:27.489994 2022] [wsgi:error] [pid 2052433:tid 139659067721280] [remote 86.229.66.142:53850]     for lineno, line in enumerate(fp, start=1):
[Thu Nov 03 17:47:27.489997 2022] [wsgi:error] [pid 2052433:tid 139659067721280] [remote 86.229.66.142:53850]   File "/usr/lib/python3.10/encodings/ascii.py", line 26, in decode
[Thu Nov 03 17:47:27.490000 2022] [wsgi:error] [pid 2052433:tid 139659067721280] [remote 86.229.66.142:53850]     return codecs.ascii_decode(input, self.errors)[0]
[Thu Nov 03 17:47:27.490011 2022] [wsgi:error] [pid 2052433:tid 139659067721280] [remote 86.229.66.142:53850] UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 210: ordinal not in range(128)

Guts Nov 03 '22 17:11

By default, open() uses your environment to detect the encoding; utf-8 may be common nowadays, but it seems unnecessary to force everybody to use it. See https://docs.python.org/3/library/functions.html#open and https://docs.python.org/3/library/locale.html#locale.getencoding

What do the commands locale and python3 -c 'import locale; print(locale.getlocale())' display on your machine?

jelmer Dec 07 '22 13:12

$ locale
LANG=C.UTF-8
LANGUAGE=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_PAPER="C.UTF-8"
LC_NAME="C.UTF-8"
LC_ADDRESS="C.UTF-8"
LC_TELEPHONE="C.UTF-8"
LC_MEASUREMENT="C.UTF-8"
LC_IDENTIFICATION="C.UTF-8"
LC_ALL=
$ python3 -c 'import locale; print(locale.getlocale())'
('en_US', 'UTF-8')
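The shell output above reports UTF-8, while the traceback shows the ascii codec, so the mod_wsgi process may be running under a different locale than the login shell. As a rough sketch, the following could be run from inside the WSGI script to see what that process resolves. The describe_text_encoding helper is hypothetical, and it assumes that open() takes its default from locale.getpreferredencoding(False), as the Python 3.10 documentation describes.

```python
import locale
import os
import sys


def describe_text_encoding():
    """Collect the settings that influence the default encoding of open()."""
    return {
        # Encoding used by open() when no encoding argument is given
        # (per the Python 3.10 documentation).
        "preferred_encoding": locale.getpreferredencoding(False),
        # 1 if Python's UTF-8 mode is enabled (-X utf8 or PYTHONUTF8=1).
        "utf8_mode": sys.flags.utf8_mode,
        # Locale variables as seen by this process, not by the login shell.
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
    }


for key, value in describe_text_encoding().items():
    print(f"{key}: {value!r}")
```

Under mod_wsgi, anything printed this way would normally end up in the Apache error log, where it can be compared with the shell output.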

Guts Dec 07 '22 22:12

Marking this as draft, since either way the PR needs rebasing.

jelmer Aug 04 '23 13:08

Closing this, since it's been open for >1.5y. It's also not clear to me that it is necessary, and it's unreadable in its current shape.

jelmer Mar 10 '24 16:03