proselint icon indicating copy to clipboard operation
proselint copied to clipboard

ci(pre-commit): Prevent cache time-of-use/time-of-check problems

Open Holzhaus opened this issue 1 year ago • 2 comments

Interally, proselint uses caching to speed up analysis, and the corresponding code is written in a way that makes it obvious that it is not intended to run multiple instances of proselint simulatenously.

For example, the memoize function in tools.py suffers from time-of-use vs. time-of-check problems:

if not os.path.isdir(cache_dirname):
    # ...
    # While this is running, another process may create the
    # `cache_dirname` directory.
    # ...
    os.makedirs(cache_dirname)  # <-- Crashes if `cache_dirname` exists

As the code between the check and the directory creating is quite short and does not take a lot of time to execute, it's unlikely to encounter this issue on a reasonably modern desktop computer, but I ran into this issue multiple times when running proselint via pre-commit on a relatively slow low-powered ARM device in a fresh docker container.

It would be preferable to rewrite the caching code with multiprocessing in mind and by incorporating concepts like "Easier To Ask for Forgiveness, not Permission" [1], but for now, it should suffice to disable parallel execution in pre-commit.

Holzhaus avatar Sep 03 '22 10:09 Holzhaus