isso
Use UTF8 as encoding to open config files
Checklist
- [ ] All new and existing tests are passing
- [ ] (If adding features:) I have added tests to cover my changes
- [ ] (If docs changes needed:) I have updated the documentation accordingly.
- [ ] I have added an entry to CHANGES.rst because this is a user-facing change or an important bugfix
- [ ] I have written proper commit message(s)
What changes does this Pull Request introduce?
Set encoding="utf-8" in the open() calls that the config module uses to load configuration files.
Why is this necessary?
I've been facing UnicodeDecodeError issues using Apache2 + mod_wsgi + Isso.
I've found 2 solutions:
- Hack config.py to set encoding="utf8" in io.open calls
- Add lang='en_US.UTF-8' and locale='en_US.UTF-8' to the WSGIDaemonProcess directive in the Apache vhost config. See this blog post from Graham (maintainer of mod_wsgi)
I'm using the second option, but I still think it is a small reliability improvement for Isso to explicitly use UTF-8 when reading configuration files.
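A minimal sketch of the first option, assuming the config is a plain INI-style file read with configparser. The helper name load_config, the file name example.cfg, and the [general] name entry are illustrative only and are not taken from Isso's config module.

```python
import configparser
import os
import tempfile


def load_config(path):
    """Read an INI-style file, decoding it as UTF-8 regardless of the locale."""
    parser = configparser.ConfigParser()
    # Passing encoding explicitly avoids falling back to the process locale,
    # which may resolve to ASCII under a mod_wsgi daemon process.
    with open(path, encoding="utf-8") as f:
        parser.read_file(f)
    return parser


# Usage: write a small config containing non-ASCII characters, then load it.
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = os.path.join(tmp, "example.cfg")  # hypothetical file name
    with open(cfg_path, "w", encoding="utf-8") as f:
        f.write("[general]\nname = Géotribu – comments\n")
    loaded = load_config(cfg_path)
    site_name = loaded.get("general", "name")
```

With the explicit encoding, the result no longer depends on LANG or LC_ALL in the environment that starts the interpreter.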
Files used
isso_wsgi.py:
import site # noqa: E402
site.addsitedir("{{ comments_path }}.venv")
from pathlib import Path # noqa: E402
# 3rd party
from isso import config, make_app # noqa: E402
# globals
isso_conf_file = Path(__file__).parent / "isso-prod.cfg"
application = make_app(
    config.load(
        config.default_file(),
        str(isso_conf_file.resolve())
    ),
    multiprocessing=True,
    threading=True,
)
tail /var/log/apache2/geotribu_comments_error.log -n 25:
[Thu Nov 03 17:39:15.240427 2022] [wsgi:error] [pid 2047229:tid 139679207601728] [remote 86.229.66.142:45462] File "/var/www/geotribu/comments/isso_wsgi.py", line 20, in <module>
[Thu Nov 03 17:39:15.240429 2022] [wsgi:error] [pid 2047229:tid 139679207601728] [remote 86.229.66.142:45462] config.load(
[Thu Nov 03 17:39:15.240433 2022] [wsgi:error] [pid 2047229:tid 139679207601728] [remote 86.229.66.142:45462] File "/var/www/geotribu/comments/.venv/lib/python3.10/site-packages/isso/config.py", line 153, in load
[Thu Nov 03 17:39:15.240435 2022] [wsgi:error] [pid 2047229:tid 139679207601728] [remote 86.229.66.142:45462] parser.read_file(f)
[Thu Nov 03 17:39:15.240439 2022] [wsgi:error] [pid 2047229:tid 139679207601728] [remote 86.229.66.142:45462] File "/usr/lib/python3.10/configparser.py", line 719, in read_file
[Thu Nov 03 17:39:15.240441 2022] [wsgi:error] [pid 2047229:tid 139679207601728] [remote 86.229.66.142:45462] self._read(f, source)
[Thu Nov 03 17:39:15.240444 2022] [wsgi:error] [pid 2047229:tid 139679207601728] [remote 86.229.66.142:45462] File "/usr/lib/python3.10/configparser.py", line 1021, in _read
[Thu Nov 03 17:39:15.240446 2022] [wsgi:error] [pid 2047229:tid 139679207601728] [remote 86.229.66.142:45462] for lineno, line in enumerate(fp, start=1):
[Thu Nov 03 17:39:15.240450 2022] [wsgi:error] [pid 2047229:tid 139679207601728] [remote 86.229.66.142:45462] File "/usr/lib/python3.10/encodings/ascii.py", line 26, in decode
[Thu Nov 03 17:39:15.240452 2022] [wsgi:error] [pid 2047229:tid 139679207601728] [remote 86.229.66.142:45462] return codecs.ascii_decode(input, self.errors)[0]
[Thu Nov 03 17:39:15.240460 2022] [wsgi:error] [pid 2047229:tid 139679207601728] [remote 86.229.66.142:45462] UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 210: ordinal not in range(128)
[Thu Nov 03 17:47:27.489385 2022] [wsgi:error] [pid 2052433:tid 139659067721280] [remote 86.229.66.142:53850] mod_wsgi (pid=2052433): Failed to exec Python script file '/var/www/geotribu/comments/isso_wsgi.py'.
[Thu Nov 03 17:47:27.489485 2022] [wsgi:error] [pid 2052433:tid 139659067721280] [remote 86.229.66.142:53850] mod_wsgi (pid=2052433): Exception occurred processing WSGI script '/var/www/geotribu/comments/isso_wsgi.py'.
[Thu Nov 03 17:47:27.489931 2022] [wsgi:error] [pid 2052433:tid 139659067721280] [remote 86.229.66.142:53850] Traceback (most recent call last):
[Thu Nov 03 17:47:27.489961 2022] [wsgi:error] [pid 2052433:tid 139659067721280] [remote 86.229.66.142:53850] File "/var/www/geotribu/comments/isso_wsgi.py", line 20, in <module>
[Thu Nov 03 17:47:27.489964 2022] [wsgi:error] [pid 2052433:tid 139659067721280] [remote 86.229.66.142:53850] config.load(
[Thu Nov 03 17:47:27.489968 2022] [wsgi:error] [pid 2052433:tid 139659067721280] [remote 86.229.66.142:53850] File "/var/www/geotribu/comments/.venv/lib/python3.10/site-packages/isso/config.py", line 153, in load
[Thu Nov 03 17:47:27.489971 2022] [wsgi:error] [pid 2052433:tid 139659067721280] [remote 86.229.66.142:53850] parser.read_file(f)
[Thu Nov 03 17:47:27.489974 2022] [wsgi:error] [pid 2052433:tid 139659067721280] [remote 86.229.66.142:53850] File "/usr/lib/python3.10/configparser.py", line 719, in read_file
[Thu Nov 03 17:47:27.489977 2022] [wsgi:error] [pid 2052433:tid 139659067721280] [remote 86.229.66.142:53850] self._read(f, source)
[Thu Nov 03 17:47:27.489991 2022] [wsgi:error] [pid 2052433:tid 139659067721280] [remote 86.229.66.142:53850] File "/usr/lib/python3.10/configparser.py", line 1021, in _read
[Thu Nov 03 17:47:27.489994 2022] [wsgi:error] [pid 2052433:tid 139659067721280] [remote 86.229.66.142:53850] for lineno, line in enumerate(fp, start=1):
[Thu Nov 03 17:47:27.489997 2022] [wsgi:error] [pid 2052433:tid 139659067721280] [remote 86.229.66.142:53850] File "/usr/lib/python3.10/encodings/ascii.py", line 26, in decode
[Thu Nov 03 17:47:27.490000 2022] [wsgi:error] [pid 2052433:tid 139659067721280] [remote 86.229.66.142:53850] return codecs.ascii_decode(input, self.errors)[0]
[Thu Nov 03 17:47:27.490011 2022] [wsgi:error] [pid 2052433:tid 139659067721280] [remote 86.229.66.142:53850] UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 210: ordinal not in range(128)
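The failure in the log above can probably be reproduced outside Apache. The sketch below assumes the offending character is something like an en dash, whose UTF-8 encoding starts with byte 0xe2; the actual content of isso-prod.cfg is not shown in this thread, so the sample line is made up.

```python
# Hypothetical config line containing an en dash (UTF-8 bytes e2 80 93).
raw = "name = blog – comments\n".encode("utf-8")

# Decoding with the ASCII codec fails on the first non-ASCII byte.
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# Decoding the same bytes as UTF-8 succeeds.
utf8_text = raw.decode("utf-8")
```

Here ascii_error.start gives the offset of the failing byte, which corresponds to the "position" reported in the traceback.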
By default, open() uses your environment to detect the encoding; utf-8 may be common nowadays, but it seems unnecessary to force everybody to use it. See https://docs.python.org/3/library/functions.html#open and https://docs.python.org/3/library/locale.html#locale.getencoding
What do the commands locale and python3 -c 'import locale; print(locale.getlocale())' display on your machine?
$ locale
LANG=C.UTF-8
LANGUAGE=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_PAPER="C.UTF-8"
LC_NAME="C.UTF-8"
LC_ADDRESS="C.UTF-8"
LC_TELEPHONE="C.UTF-8"
LC_MEASUREMENT="C.UTF-8"
LC_IDENTIFICATION="C.UTF-8"
LC_ALL=
$ python3 -c 'import locale; print(locale.getlocale())'
('en_US', 'UTF-8')
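Note that the output above comes from an interactive shell, and the mod_wsgi daemon process may run with a different locale (the traceback shows the ascii codec in use). A hedged way to check is to run something like the following inside the WSGI process itself; the function name text_encoding_report is hypothetical, and how the result gets logged is left open.

```python
import locale
import sys


def text_encoding_report():
    """Collect the settings that influence how open() decodes text by default."""
    return {
        # Encoding open() falls back to when no encoding= argument is given.
        "preferred_encoding": locale.getpreferredencoding(False),
        # Calling setlocale() without a second argument only queries the value.
        "lc_ctype": locale.setlocale(locale.LC_CTYPE),
        # True when Python runs in UTF-8 mode (-X utf8 or PYTHONUTF8=1).
        "utf8_mode": bool(sys.flags.utf8_mode),
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


report = text_encoding_report()
```

If preferred_encoding comes back as an ASCII variant under Apache but as UTF-8 in the shell, that would explain why the same config file loads in one environment and not the other.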
Marking this as draft, since either way the PR needs rebasing.
Closing this, since it's been open for >1.5y. It's also not clear to me that it is necessary, and it's unreadable in its current shape.