wrf-python

BLD,BUG: Add charset-normalizer to improve compatibility with non-ascii environments.

Open BeiyanYunyi opened this issue 1 year ago • 1 comment

On a system whose LANG environment variable selects a non-English locale, gfortran will produce non-ASCII output. My working environment is Linux with LANG=zh_CN.UTF-8. In this environment,

gfortran -E ompgen.F90 -o omp.f90 -cpp

will output:

# 1 "ompgen.F90"
# 1 "<built-in>"
# 1 "<命令行>"
# 1 "ompgen.F90"
!... other code

instead of:

# 1 "ompgen.F90"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "ompgen.F90"

The Chinese characters on line 3 cause the project build to fail:

      Traceback (most recent call last):
        File "/home/BeiyanYunyi/.cache/uv/builds-v0/.tmpP5ioKB/lib/python3.11/site-packages/numpy/f2py/crackfortran.py", line 391, in readfortrancode
          l = fin.readline()
              ^^^^^^^^^^^^^^
        File "/home/BeiyanYunyi/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/fileinput.py", line 292, in readline
          line = self._readline()
                 ^^^^^^^^^^^^^^^^
        File "/home/BeiyanYunyi/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/fileinput.py", line 372, in _readline
          return self._readline()
                 ^^^^^^^^^^^^^^^^
        File "/home/BeiyanYunyi/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/encodings/ascii.py", line 26, in decode
          return codecs.ascii_decode(input, self.errors)[0]
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 71: ordinal not in range(128)
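The failure can be illustrated without gfortran or f2py. The sketch below is a minimal reproduction using only the standard library. The string `preprocessed` is a hand-written stand-in for the preprocessor output shown above, not the real file, and `try_decode` is a hypothetical helper. The first UTF-8 byte of 命 is 0xe5, the same byte the traceback reports. The offset will differ from 71 because the real file has different contents.

```python
# Hand-written stand-in for the preprocessed file (an assumption for
# illustration, not the actual gfortran output).
preprocessed = (
    '# 1 "ompgen.F90"\n'
    '# 1 "<built-in>"\n'
    '# 1 "<命令行>"\n'
    '# 1 "ompgen.F90"\n'
)
raw = preprocessed.encode("utf-8")  # bytes as they would sit on disk


def try_decode(data: bytes, encoding: str):
    """Return (text, None) on success or (None, error) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc


# Decoding as ASCII fails on the first byte of the localized marker.
ascii_text, ascii_err = try_decode(raw, "ascii")
if ascii_err is not None:
    print(f"ascii failed on byte 0x{raw[ascii_err.start]:02x}")

# Decoding as UTF-8 succeeds and round-trips the text.
utf8_text, utf8_err = try_decode(raw, "utf-8")
if utf8_err is None:
    print(utf8_text.splitlines()[2])
```

This shows that the file itself is valid UTF-8. The build fails only because the reader assumes ASCII.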

To reproduce the bug, simply run this command in the repo (POSIX environment):

LANG=zh_CN.UTF-8 pip install

As numpy.f2py suggests, it is likely that installing the charset_normalizer package will help f2py determine the input file encoding correctly. Adding charset-normalizer to build-system.requires lets f2py infer the encoding during the build. With that change, I was able to build this package successfully.
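To show why the extra dependency matters, here is a rough sketch of encoding selection with and without charset_normalizer. It assumes, based on my reading of f2py and not confirmed in this thread, that without the package only byte-order marks are recognised and everything else is treated as ASCII. The names `bom_or_ascii` and `guess_encoding` are hypothetical and are not f2py's API.

```python
import codecs

try:
    # Optional third-party package; this issue proposes adding it to
    # build-system.requires.
    import charset_normalizer
except ImportError:
    charset_normalizer = None


def bom_or_ascii(raw: bytes) -> str:
    """Fallback guess: recognise UTF byte-order marks, else assume ASCII."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # Check UTF-32 before UTF-16: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
    if raw.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return "ascii"


def guess_encoding(raw: bytes) -> str:
    """Prefer content-based detection when charset_normalizer is importable."""
    if charset_normalizer is not None:
        best = charset_normalizer.from_bytes(raw).best()
        if best is not None:
            return best.encoding
    return bom_or_ascii(raw)


# Stand-in for the localized line marker; it carries no BOM.
sample = '# 1 "<命令行>"\n'.encode("utf-8")
print(bom_or_ascii(sample))    # the BOM-only fallback cannot see the UTF-8 content
print(guess_encoding(sample))  # result depends on whether the package is installed
```

Because gfortran writes the localized marker without a byte-order mark, a BOM-only check has nothing to go on. That is why content-based detection is needed here.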

BeiyanYunyi • Apr 16 '25 16:04

Thanks for the PR!

I'll take a look at this tomorrow.

kafitzgerald • Apr 17 '25 21:04