bug: Failure to parse files with UTF8 byte-order mark
Griffe fails to parse Python files that begin with a UTF-8 byte-order mark (a.k.a. BOM, code point U+FEFF).
Minimal reproducer with an otherwise empty Python module:
from griffe import GriffeLoader
from pathlib import Path
loader = GriffeLoader(search_paths=[Path('.')])
file = Path('empty_except_bom.py')
file.write_text('', encoding='utf-8-sig')
module = loader.load(file.stem)
Raises:
SyntaxError: invalid non-printable character U+FEFF
Full traceback
Could not load package Package(name='empty_except_bom', path=WindowsPath('C:/home/projects/MPh/docs/Griffe_bug_UTF8_BOM/empty_except_bom.py'), stubs=None)
Traceback (most recent call last):
File "C:\scratch\repos\other\Griffe\src\_griffe\loader.py", line 531, in _load_module
return self._load_module_path(module_name, module_path, submodules=submodules, parent=parent)
~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\scratch\repos\other\Griffe\src\_griffe\loader.py", line 555, in _load_module_path
module = self._visit_module(module_name, module_path, parent)
File "C:\scratch\repos\other\Griffe\src\_griffe\loader.py", line 634, in _visit_module
module = visit(
module_name,
...<7 lines>...
modules_collection=self.modules_collection,
)
File "C:\scratch\repos\other\Griffe\src\_griffe\agents\visitor.py", line 113, in visit
).get_module()
~~~~~~~~~~^^
File "C:\scratch\repos\other\Griffe\src\_griffe\agents\visitor.py", line 204, in get_module
top_node = compile(self.code, mode="exec", filename=str(self.filepath), flags=ast.PyCF_ONLY_AST, optimize=1)
File "C:\home\projects\MPh\docs\Griffe_bug_UTF8_BOM\empty_except_bom.py", line 1
^
SyntaxError: invalid non-printable character U+FEFF
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\scratch\repos\other\Griffe\src\_griffe\loader.py", line 179, in load
top_module = self._load_package(package, submodules=submodules)
File "C:\scratch\repos\other\Griffe\src\_griffe\loader.py", line 508, in _load_package
top_module = self._load_module(package.name, package.path, submodules=submodules)
File "C:\scratch\repos\other\Griffe\src\_griffe\loader.py", line 533, in _load_module
raise LoadingError(f"Syntax error: {error}") from error
_griffe.exceptions.LoadingError: Syntax error: invalid non-printable character U+FEFF (empty_except_bom.py, line 1)
Traceback (most recent call last):
File "C:\scratch\repos\other\Griffe\src\_griffe\loader.py", line 531, in _load_module
return self._load_module_path(module_name, module_path, submodules=submodules, parent=parent)
~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\scratch\repos\other\Griffe\src\_griffe\loader.py", line 555, in _load_module_path
module = self._visit_module(module_name, module_path, parent)
File "C:\scratch\repos\other\Griffe\src\_griffe\loader.py", line 634, in _visit_module
module = visit(
module_name,
...<7 lines>...
modules_collection=self.modules_collection,
)
File "C:\scratch\repos\other\Griffe\src\_griffe\agents\visitor.py", line 113, in visit
).get_module()
~~~~~~~~~~^^
File "C:\scratch\repos\other\Griffe\src\_griffe\agents\visitor.py", line 204, in get_module
top_node = compile(self.code, mode="exec", filename=str(self.filepath), flags=ast.PyCF_ONLY_AST, optimize=1)
File "C:\home\projects\MPh\docs\Griffe_bug_UTF8_BOM\empty_except_bom.py", line 1
^
SyntaxError: invalid non-printable character U+FEFF
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\home\projects\MPh\docs\Griffe_bug_UTF8_BOM\demo_bug.py", line 20, in <module>
module = loader.load(file.stem)
File "C:\scratch\repos\other\Griffe\src\_griffe\loader.py", line 179, in load
top_module = self._load_package(package, submodules=submodules)
File "C:\scratch\repos\other\Griffe\src\_griffe\loader.py", line 508, in _load_package
top_module = self._load_module(package.name, package.path, submodules=submodules)
File "C:\scratch\repos\other\Griffe\src\_griffe\loader.py", line 533, in _load_module
raise LoadingError(f"Syntax error: {error}") from error
_griffe.exceptions.LoadingError: Syntax error: invalid non-printable character U+FEFF (empty_except_bom.py, line 1)
Environment information
❯ griffe --debug-info
- __System__: Windows-11-10.0.22621-SP0
- __Python__: cpython 3.13.4 (C:\scratch\venvs\Griffe\Scripts\python.exe)
- __Environment variables__:
- `PYTHONPATH`: `C:\home\tools;C:\polybox\work\tools`
- __Installed packages__:
- `griffe` v1.7.4.dev1172+g441b3b7
UTF-8 BOMs aren't used a lot, but are supported by the Python interpreter. I, for one, use them routinely, as they prevent editing mishaps on Windows, where some editors default to ANSI encoding as long as the file contains no non-ASCII characters, which causes trouble once such a character is added. (Though the situation has improved in recent years, and most Windows editors/IDEs now default to UTF-8 too, just like on other platforms.)
PR to follow. Related issue: #99.
Thanks for the report and PR!
Sounds like yet another Windows quirk being offloaded to developers/maintainers. Does this mean I should always use the utf-8-sig encoding when reading files, everywhere, in all my Python programs?
Apparently using a BOM isn't recommended: https://www.unicode.org/versions/Unicode5.0.0/ch02.pdf#G19273.
Use of a BOM is neither required nor recommended for UTF-8, but may be encountered in contexts where UTF-8 data is converted from other encoding forms that use a BOM or where the BOM is used as a UTF-8 signature.
Strangely, in the REPL, if I instantiate the loader before creating the file, I'm able to load the module the first time 😕 But I can consistently reproduce the issue out of the REPL 👍
Does this mean I should always use the utf-8-sig encoding when reading files, everywhere, in all my Python programs?
Yes, that's what it generally means. Though I am flummoxed by the CI failing on the PR. It fails on Python 3.14 across the board (not just Windows), but also on Python 3.13, though only on Ubuntu(!). I don't know what that means right now.
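As a rough illustration of why reading with utf-8-sig is generally safe: the codec drops a leading BOM when there is one and otherwise behaves like plain utf-8. The sample source string and variable names below are made up for this sketch and are not taken from Griffe.

```python
import codecs

# Hypothetical one-line module source, used only for illustration.
source = "x = 1\n"

# The same content as bytes, once with a leading UTF-8 BOM and once without.
with_bom = codecs.BOM_UTF8 + source.encode("utf-8")
without_bom = source.encode("utf-8")

# "utf-8-sig" strips a leading BOM if present; without a BOM it decodes like "utf-8".
decoded_with_bom = with_bom.decode("utf-8-sig")
decoded_without_bom = without_bom.decode("utf-8-sig")

# Plain "utf-8" keeps the BOM as a U+FEFF character at the start of the string.
decoded_plain = with_bom.decode("utf-8")

print(decoded_with_bom == decoded_without_bom == source)
print(decoded_plain.startswith("\ufeff"))
```

Note that this only applies to reading: writing with utf-8-sig always adds a BOM, which may not be wanted.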
Apparently using a BOM isn't recommended: https://www.unicode.org/versions/Unicode5.0.0/ch02.pdf#G19273.
Maybe it not being recommended isn't all that relevant. Python/CPython supports it, recommended or not. So it's valid Python code.
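A small sketch of what "supports it" looks like in practice, assuming a throwaway module written to a temporary directory. The module name bom_module, the constant ANSWER, and the helper import_from_path are invented for this example.

```python
import importlib.util
import tempfile
from pathlib import Path


def import_from_path(name: str, path: Path):
    """Import a module from an explicit file path via the standard import machinery."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical module file, written with a UTF-8 BOM in front.
    path = Path(tmp) / "bom_module.py"
    path.write_text("ANSWER = 42\n", encoding="utf-8-sig")

    # Confirm that the file really starts with the three BOM bytes.
    has_bom = path.read_bytes().startswith(b"\xef\xbb\xbf")

    # The regular import system loads the file without complaint.
    module = import_from_path("bom_module", path)

print(has_bom, module.ANSWER)
```

The import machinery reads the file as bytes and handles the BOM itself, which is why such files work as modules even though a decoded string with a leading U+FEFF does not compile.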
Strangely, in the REPL, if I instantiate the loader before creating the file, I'm able to load the module the first time 😕 But I can consistently reproduce the issue out of the REPL 👍
I'm not 100% sure I understand what you did there. I certainly can reproduce the error with the main branch, and then see no error with the PR branch, when I copy and paste the reproducer script above line by line into the REPL.
To be clear, this is what I see locally on Ubuntu (via WSL on Windows, though I don't think the Windows part is relevant):
❯ git branch
* main
support-utf8-bom
❯ git rev-parse HEAD
97b12bda02a0e3682bb43c07dfbd23c2ee01db46
❯ uv venv
Using CPython 3.13.4
Creating virtual environment at: .venv
Activate with: source .venv/bin/activate
❯ uv sync
Resolved 120 packages in 22ms
░░░░░░░░░░░░░░░░░░░░ [0/112] Installing wheels... warning: Failed to hardlink files; falling back to full copy. This may lead to degraded performance.
If the cache and target directories are on different filesystems, hardlinking may not be supported.
If this is intentional, set `export UV_LINK_MODE=copy` or use `--link-mode=copy` to suppress this warning.
Installed 112 packages in 3m 20s
+ ansimarkup==1.5.0
+ appdirs==1.4.4
+ attrs==25.3.0
+ babel==2.17.0
+ backrefs==5.8
+ beautifulsoup4==4.13.4
+ build==1.2.2.post1
+ cappa==0.28.0
+ certifi==2025.4.26
+ cffi==1.17.1
+ charset-normalizer==3.4.2
+ click==8.2.1
+ code2flow==2.5.1
+ colorama==0.4.6
+ coverage==7.8.2
+ cryptography==45.0.3
+ csscompressor==0.9.5
+ docutils==0.21.2
+ duty==1.6.0
+ execnet==2.1.1
+ failprint==1.0.3
+ ghp-import==2.1.0
+ git-changelog==2.5.3
+ gitdb==4.0.12
+ gitpython==3.1.44
+ griffe==1.7.4.dev1174+g603088f (from file:///mnt/c/scratch/repos/other/Griffe)
+ griffe-inherited-docstrings==1.1.1
+ htmlmin2==0.1.13
+ humanize==4.12.3
+ id==1.5.0
+ idna==3.10
+ iniconfig==2.1.0
+ jaraco-classes==3.4.0
+ jaraco-context==6.0.1
+ jaraco-functools==4.1.0
+ jeepney==0.9.0
+ jinja2==3.1.6
+ jsmin==3.0.1
+ jsonschema==4.24.0
+ jsonschema-specifications==2025.4.1
+ keyring==25.6.0
+ markdown==3.8
+ markdown-callouts==0.4.0
+ markdown-exec==1.10.3
+ markdown-it-py==3.0.0
+ markdownify==1.1.0
+ markupsafe==3.0.2
+ mdformat==0.7.22
+ mdurl==0.1.2
+ mergedeep==1.3.4
+ mkdocs==1.6.1
+ mkdocs-autorefs==1.4.2
+ mkdocs-coverage==1.1.0
+ mkdocs-gen-files==0.5.0
+ mkdocs-get-deps==0.2.0
+ mkdocs-git-revision-date-localized-plugin==1.4.7
+ mkdocs-llmstxt==0.2.0
+ mkdocs-material==9.6.14
+ mkdocs-material-extensions==1.3.1
+ mkdocs-minify-plugin==0.8.0
+ mkdocs-redirects==1.2.2
+ mkdocs-section-index==0.3.10
+ mkdocstrings==0.29.1
+ mkdocstrings-python==1.16.12
+ more-itertools==10.7.0
+ mypy==1.16.0
+ mypy-extensions==1.1.0
+ nh3==0.2.21
+ packaging==25.0
+ paginate==0.5.7
+ pathspec==0.12.1
+ platformdirs==4.3.8
+ pluggy==1.6.0
+ ptyprocess==0.7.0
+ pycparser==2.22
+ pydeps==3.0.1
+ pygments==2.19.1
+ pygments-ansi-color==0.3.0
+ pymdown-extensions==10.15
+ pyproject-hooks==1.2.0
+ pysource-codegen==0.6.0
+ pysource-minimize==0.8.0
+ pytest==8.4.0
+ pytest-cov==6.1.1
+ pytest-randomly==3.16.0
+ pytest-xdist==3.7.0
+ python-dateutil==2.9.0.post0
+ pytz==2025.2
+ pyyaml==6.0.2
+ pyyaml-env-tag==1.1
+ readme-renderer==44.0
+ referencing==0.36.2
+ requests==2.32.3
+ requests-toolbelt==1.0.0
+ rfc3986==2.0.0
+ rich==14.0.0
+ rpds-py==0.25.1
+ ruff==0.11.13
+ secretstorage==3.3.3
+ semver==3.0.4
+ six==1.17.0
+ smmap==5.0.2
+ soupsieve==2.7
+ stdlib-list==0.11.1
+ twine==6.1.0
+ type-lens==0.2.3
+ types-markdown==3.8.0.20250415
+ types-pyyaml==6.0.12.20250516
+ typing-extensions==4.14.0
+ urllib3==2.4.0
+ watchdog==6.0.0
+ yore==0.4.3
❯ uv run pytest
========================================================= test session starts ==========================================================
platform linux -- Python 3.13.4, pytest-8.4.0, pluggy-1.6.0
Using --randomly-seed=3122520084
rootdir: /mnt/c/scratch/repos/other/Griffe
configfile: pyproject.toml
plugins: cov-6.1.1, randomly-3.16.0, xdist-3.7.0
collected 814 items
tests/test_nodes.py .................................................................................................... [ 12%]
tests/test_docstrings/test_google.py ................................................................................... [ 22%]
tests/test_public_api.py .. [ 22%]
tests/test_encoders.py .. [ 22%]
tests/test_cli.py .... [ 23%]
tests/test_functions.py .......... [ 24%]
tests/test_extensions.py ............. [ 26%]
tests/test_merger.py ..... [ 26%]
tests/test_stdlib.py ........................................................................................................... [ 40%]
..................................................................................... [ 50%]
tests/test_loader.py ............................. [ 54%]
tests/test_docstrings/test_warnings.py . [ 54%]
tests/test_docstrings/test_numpy.py ...................................................................... [ 62%]
tests/test_diff.py ............................ [ 66%]
tests/test_visitor.py .................................... [ 70%]
tests/test_inheritance.py ................. [ 72%]
tests/test_finder.py ................................. [ 76%]
tests/test_mixins.py . [ 76%]
tests/test_inspector.py .................. [ 79%]
tests/test_docstrings/test_sphinx.py ................................................................. [ 87%]
tests/test_models.py .............................. [ 90%]
tests/test_expressions.py ................................................................. [ 98%]
tests/test_api.py .s...s. [ 99%]
tests/test_git.py ... [100%]
=================================================== 812 passed, 2 skipped in 19.87s ====================================================
❯ uv run python demo_bug.py
Could not load package Package(name='empty_except_bom', path=PosixPath('/mnt/c/scratch/repos/other/Griffe/empty_except_bom.py'), stubs=None)
Traceback (most recent call last):
File "/mnt/c/scratch/repos/other/Griffe/src/_griffe/loader.py", line 531, in _load_module
return self._load_module_path(module_name, module_path, submodules=submodules, parent=parent)
~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/c/scratch/repos/other/Griffe/src/_griffe/loader.py", line 555, in _load_module_path
module = self._visit_module(module_name, module_path, parent)
File "/mnt/c/scratch/repos/other/Griffe/src/_griffe/loader.py", line 634, in _visit_module
module = visit(
module_name,
...<7 lines>...
modules_collection=self.modules_collection,
)
File "/mnt/c/scratch/repos/other/Griffe/src/_griffe/agents/visitor.py", line 113, in visit
).get_module()
~~~~~~~~~~^^
File "/mnt/c/scratch/repos/other/Griffe/src/_griffe/agents/visitor.py", line 204, in get_module
top_node = compile(self.code, mode="exec", filename=str(self.filepath), flags=ast.PyCF_ONLY_AST, optimize=1)
File "/mnt/c/scratch/repos/other/Griffe/empty_except_bom.py", line 1
^
SyntaxError: invalid non-printable character U+FEFF
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/mnt/c/scratch/repos/other/Griffe/src/_griffe/loader.py", line 179, in load
top_module = self._load_package(package, submodules=submodules)
File "/mnt/c/scratch/repos/other/Griffe/src/_griffe/loader.py", line 508, in _load_package
top_module = self._load_module(package.name, package.path, submodules=submodules)
File "/mnt/c/scratch/repos/other/Griffe/src/_griffe/loader.py", line 533, in _load_module
raise LoadingError(f"Syntax error: {error}") from error
_griffe.exceptions.LoadingError: Syntax error: invalid non-printable character U+FEFF (empty_except_bom.py, line 1)
Traceback (most recent call last):
File "/mnt/c/scratch/repos/other/Griffe/src/_griffe/loader.py", line 531, in _load_module
return self._load_module_path(module_name, module_path, submodules=submodules, parent=parent)
~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/c/scratch/repos/other/Griffe/src/_griffe/loader.py", line 555, in _load_module_path
module = self._visit_module(module_name, module_path, parent)
File "/mnt/c/scratch/repos/other/Griffe/src/_griffe/loader.py", line 634, in _visit_module
module = visit(
module_name,
...<7 lines>...
modules_collection=self.modules_collection,
)
File "/mnt/c/scratch/repos/other/Griffe/src/_griffe/agents/visitor.py", line 113, in visit
).get_module()
~~~~~~~~~~^^
File "/mnt/c/scratch/repos/other/Griffe/src/_griffe/agents/visitor.py", line 204, in get_module
top_node = compile(self.code, mode="exec", filename=str(self.filepath), flags=ast.PyCF_ONLY_AST, optimize=1)
File "/mnt/c/scratch/repos/other/Griffe/empty_except_bom.py", line 1
^
SyntaxError: invalid non-printable character U+FEFF
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/mnt/c/scratch/repos/other/Griffe/demo_bug.py", line 20, in <module>
module = loader.load(file.stem)
File "/mnt/c/scratch/repos/other/Griffe/src/_griffe/loader.py", line 179, in load
top_module = self._load_package(package, submodules=submodules)
File "/mnt/c/scratch/repos/other/Griffe/src/_griffe/loader.py", line 508, in _load_package
top_module = self._load_module(package.name, package.path, submodules=submodules)
File "/mnt/c/scratch/repos/other/Griffe/src/_griffe/loader.py", line 533, in _load_module
raise LoadingError(f"Syntax error: {error}") from error
_griffe.exceptions.LoadingError: Syntax error: invalid non-printable character U+FEFF (empty_except_bom.py, line 1)
❯ git switch support-utf8-bom
Switched to branch 'support-utf8-bom'
Your branch is up to date with 'origin/support-utf8-bom'.
❯ uv run pytest
========================================================= test session starts ==========================================================
platform linux -- Python 3.13.4, pytest-8.4.0, pluggy-1.6.0
Using --randomly-seed=3242253200
rootdir: /mnt/c/scratch/repos/other/Griffe
configfile: pyproject.toml
plugins: cov-6.1.1, randomly-3.16.0, xdist-3.7.0
collected 814 items
tests/test_encoders.py .. [ 0%]
tests/test_extensions.py ............. [ 1%]
tests/test_diff.py ............................ [ 5%]
tests/test_docstrings/test_sphinx.py ................................................................. [ 13%]
tests/test_inheritance.py ................. [ 15%]
tests/test_docstrings/test_numpy.py ...................................................................... [ 23%]
tests/test_public_api.py .. [ 24%]
tests/test_nodes.py .................................................................................................... [ 36%]
tests/test_api.py ..s..s. [ 37%]
tests/test_inspector.py .................. [ 39%]
tests/test_merger.py ..... [ 40%]
tests/test_docstrings/test_google.py ................................................................................... [ 50%]
tests/test_stdlib.py ........................................................................................................... [ 63%]
..................................................................................... [ 73%]
tests/test_visitor.py .................................... [ 78%]
tests/test_git.py ... [ 78%]
tests/test_cli.py .... [ 79%]
tests/test_functions.py .......... [ 80%]
tests/test_docstrings/test_warnings.py . [ 80%]
tests/test_expressions.py ................................................................. [ 88%]
tests/test_loader.py ............................. [ 92%]
tests/test_mixins.py . [ 92%]
tests/test_models.py .............................. [ 95%]
tests/test_finder.py ................................. [100%]
=================================================== 812 passed, 2 skipped in 17.32s ====================================================
❯ uv run python demo_bug.py
So I really don't know why it would fail on Ubuntu with Python 3.13, as that is what I'm testing.
Though I am flummoxed by the CI failing on the PR.
Sorry, don't be, it's failing when dependencies are resolved with the "lowest-direct" strategy. This job, as well as the ones on 3.14, are allowed to fail 🙂 (GitHub just doesn't visually indicate that well).
Python/CPython supports it
I was going to ask what this means exactly, but I suppose it's simply: Python can import modules or run scripts that are UTF-8 encoded with a BOM. Annoying that the built-in compile doesn't ignore the mark directly.
About the syntax error not triggering in the REPL under certain conditions, don't bother, I myself don't have the energy to investigate that. The issue is perfectly reproducible in the common case.
Sorry for the late reply, but I caught the flu or something.
I was going to ask what this means exactly, but I suppose it's simply: Python can import modules or run scripts that are UTF-8 encoded with a BOM. Annoying that the built-in compile doesn't ignore the mark directly.
Yes, that's exactly what this means. And yes, compile doesn't deal with that. I think it's because it's expecting code snippets, and thus Python strings, i.e. Unicode strings. Whereas the BOM is a file-level marker, so it needs to be stripped when reading an entire file. It wouldn't be valid anywhere else in the file. And that's exactly what encoding="utf-8-sig" does for us.
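To make the difference concrete, here is a minimal sketch (not Griffe's actual code) that reads the same BOM-prefixed file with both encodings and passes the resulting string to compile. The helper parse_source_file and the file name with_bom.py are hypothetical.

```python
import ast
import tempfile
from pathlib import Path


def parse_source_file(path: Path, encoding: str) -> ast.Module:
    """Read a source file with the given encoding and parse it into an AST."""
    code = path.read_text(encoding=encoding)
    return compile(code, str(path), "exec", flags=ast.PyCF_ONLY_AST)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical source file that starts with a UTF-8 BOM.
    path = Path(tmp) / "with_bom.py"
    path.write_text("x = 1\n", encoding="utf-8-sig")

    # Decoding as plain UTF-8 leaves U+FEFF in the string, which compile rejects.
    try:
        parse_source_file(path, "utf-8")
        plain_utf8_failed = False
    except SyntaxError:
        plain_utf8_failed = True

    # Decoding as utf-8-sig strips the mark, so the same file parses fine.
    tree = parse_source_file(path, "utf-8-sig")

print(plain_utf8_failed)
print(type(tree.body[0]).__name__)
```

The only thing that changes between the two calls is the encoding passed to read_text, which matches the point above that the BOM has to be removed at the file-reading step.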
No worries, I hope you're healing nicely 🙂