BLD,BUG: Add charset-normalizer to improve compatibility with non-ascii environments.
On a system whose LANG environment variable selects a non-English locale, gfortran writes localized, non-ASCII text into its preprocessor output. My working environment is Linux with LANG=zh_CN.UTF-8. In this environment,
gfortran -E ompgen.F90 -o omp.f90 -cpp
will output:
# 1 "ompgen.F90"
# 1 "<built-in>"
# 1 "<命令行>"
# 1 "ompgen.F90"
!... other code
instead of:
# 1 "ompgen.F90"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "ompgen.F90"
The Chinese characters on line 3 of this output cause the project build to fail:
Traceback (most recent call last):
  File "/home/BeiyanYunyi/.cache/uv/builds-v0/.tmpP5ioKB/lib/python3.11/site-packages/numpy/f2py/crackfortran.py", line 391, in readfortrancode
    l = fin.readline()
        ^^^^^^^^^^^^^^
  File "/home/BeiyanYunyi/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/fileinput.py", line 292, in readline
    line = self._readline()
           ^^^^^^^^^^^^^^^^
  File "/home/BeiyanYunyi/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/fileinput.py", line 372, in _readline
    return self._readline()
           ^^^^^^^^^^^^^^^^
  File "/home/BeiyanYunyi/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 71: ordinal not in range(128)
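The failure can be reproduced outside f2py with the standard library alone. The following is a minimal sketch, assuming the preprocessed file is UTF-8 encoded (as LANG=zh_CN.UTF-8 implies); the variable names are illustrative and the offset it reports applies to this single line, not to the position in the traceback above.

```python
# One line of preprocessor output as gfortran emits it under LANG=zh_CN.UTF-8
# (assumed here to be UTF-8 encoded on disk).
line = '# 1 "<命令行>"\n'.encode("utf-8")

failure = None
try:
    # This mirrors what happens when the file is opened with the ASCII codec.
    line.decode("ascii")
except UnicodeDecodeError as exc:
    failure = exc
    bad_byte = line[exc.start]
    print(f"{exc.encoding} codec rejects byte {bad_byte:#x} at offset {exc.start}")

# Decoding with the encoding the file was actually written in succeeds.
print(line.decode("utf-8"), end="")
```

The rejected byte is 0xe5, the first UTF-8 byte of the character 命, which is the same byte value the traceback reports.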
To reproduce the bug, simply run this command in the repo (POSIX environment):
LANG=zh_CN.UTF-8 pip install
As numpy.f2py suggests, installing the charset_normalizer package will likely help f2py determine the input file encoding correctly. Adding charset-normalizer to build-system.requires makes it available at build time, so the encoding is inferred correctly. With that change, I have successfully built this package.
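To illustrate why the extra build requirement matters, here is a rough sketch of encoding detection with an optional dependency. It is not f2py's actual code: guess_encoding and its detector parameter are hypothetical names, and the ASCII fallback is an assumption based on the codec shown in the traceback. The only charset_normalizer calls used are from_bytes(...) and .best().

```python
import codecs

try:
    import charset_normalizer  # optional third-party package
except ImportError:
    charset_normalizer = None


def guess_encoding(raw: bytes, detector=charset_normalizer) -> str:
    """Hypothetical helper: choose an encoding for the raw bytes of a file."""
    if detector is not None:
        # Content-based detection; best() returns None if nothing matches.
        best = detector.from_bytes(raw).best()
        if best is not None:
            return best.encoding
    # Without a detector, only a UTF-8 byte order mark can be recognised.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # Assumed fallback: plain ASCII, which fails on localized output.
    return "ascii"


sample = '# 1 "<命令行>"\n'.encode("utf-8")
print(guess_encoding(sample, detector=None))
print(guess_encoding(codecs.BOM_UTF8 + sample, detector=None))
```

With detector=None the first call falls back to ASCII, which is the situation the build hits when charset-normalizer is not installed in the isolated build environment.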
Thanks for the PR!
I'll take a look at this tomorrow.