yapf
yapf copied to clipboard
Grammar caching causes EOFError and race condition when used as pre-commit hook
Thanks to pre-commit#851, hooks can now be executed in parallel for each file. This feature is enabled by default.
When a large number of files need to be formatted at once, this concurrency introduces a race condition in wherein one process can conclude that the pickled grammar file does not exist and start to create it. Meanwhile, another process sees the newly created pickle file before it is done being written, concludes that it is newer than the raw unpickled grammar, and then attempts to load it. The result is the following stack trace:
Traceback (most recent call last):
File "~/.cache/pre-commit/repo73fphdkg/py_env-python3.11/bin/yapf", line 5, in <module>
from yapf import run_main
File "~/.cache/pre-commit/repo73fphdkg/py_env-python3.11/lib/python3.11/site-packages/yapf/__init__.py", line 41, in <module>
from yapf.yapflib import yapf_api
File "~/.cache/pre-commit/repo73fphdkg/py_env-python3.11/lib/python3.11/site-packages/yapf/yapflib/yapf_api.py", line 38, in <module>
from yapf.pyparser import pyparser
File "~/.cache/pre-commit/repo73fphdkg/py_env-python3.11/lib/python3.11/site-packages/yapf/pyparser/pyparser.py", line 44, in <module>
from yapf.yapflib import format_token
File "~/.cache/pre-commit/repo73fphdkg/py_env-python3.11/lib/python3.11/site-packages/yapf/yapflib/format_token.py", line 23, in <module>
from yapf.pytree import pytree_utils
File "~/.cache/pre-commit/repo73fphdkg/py_env-python3.11/lib/python3.11/site-packages/yapf/pytree/pytree_utils.py", line 30, in <module>
from yapf_third_party._ylib2to3 import pygram
File "~/.cache/pre-commit/repo73fphdkg/py_env-python3.11/lib/python3.11/site-packages/yapf_third_party/_ylib2to3/pygram.py", line 29, in <module>
python_grammar = driver.load_grammar(_GRAMMAR_FILE)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/.cache/pre-commit/repo73fphdkg/py_env-python3.11/lib/python3.11/site-packages/yapf_third_party/_ylib2to3/pgen2/driver.py", line 252, in load_grammar
g.load(gp)
File "~/.cache/pre-commit/repo73fphdkg/py_env-python3.11/lib/python3.11/site-packages/yapf_third_party/_ylib2to3/pgen2/grammar.py", line 95, in load
d = pickle.load(f)
^^^^^^^^^^^^^^
EOFError: Ran out of input
Since the process writing the file is not impacted by this error, the pickled grammar will get cached. Thus, subsequent runs will succeed. However, if one is using yapf
in a continuous integration context, this would cause a failed pipeline with potentially high probability (depending on the number of files) with no obvious recourse.
A workaround is to add require_serial
to the yapf
hook in your project's .pre-commit-config.yaml
, but this comes at the cost of losing the advantages of pre-commit#851. I believe the logic in _load_grammar
could be refactored to avoid the race condition with some more careful checks.
The problem seems to be in Grammar.dump()
in file yapf/third_party/yapf_third_party/_ylib2to3/pgen2/grammar.py
- it opens the file directly instead of creating a temporary name, writing to the file, then renaming it (this works on Unix, where rename is an atomic operation (as are open, close, linke, etc.), so it's safe if two processes do the same thing at the same time). I don't know if this technique works on Windows.
Something like this:
def dump(self, filename):
"""Dump the grammar tables to a pickle file."""
with tempfile.NamedTemporaryFile(mode='wb') as f:
pickle.dump(self.__dict__, f, pickle.HIGHEST_PROTOCOL)
os.link(f.name, filename)
Agreed, I think that would work on Unix systems. To my knowledge, there is no documented, guaranteed atomic write operation on Windows. The python-atomicwrites
library provides a best-effort atomic write operation for Windows; there may or may not be something better out there.
There's an alternative method that runs a small risk of leaving a temporary file lying around, but might work better on Windows [this code is untested; and it probably should have a try/except to deal with cleanup on error]:
with tempfile.NamedTemporaryFile(mode='wb', delete=False, delete_on_close=False) as f:
tmp_file_name = f.name
pickle.dump((self.__dict__, f, pickle.HIGHEST_PROTOCOL)
os.rename(tmp_file_name, filename)
Same problem, how to solve it?
Same problem, how to solve it?
similar issue and the way to fix it: https://github.com/google/yapf/issues/1204