mypyc incremental failures on master
We've started seeing non-deterministic failures of mypyc incremental tests in CI.
The first test failure on master was on https://github.com/python/mypy/commit/840a310f0b6c4a4bd7edfb141c8d1731ba4dd027 (succeeded on retry).
The earliest test failure I've observed was on https://github.com/python/mypy/pull/13557 when it was based on d9bdd6d96.
No known changes to the Python version or dependencies. The underlying GitHub runner image did change, but nothing stands out in the changelog and it doesn't seem likely to cause failures.
To my knowledge, no one has been able to reproduce failures locally.
When running locally on Linux, I get deterministic failures on:
FAILED mypyc/test/test_run.py::TestRun::run-loops.test::testForIterable
FAILED mypyc/test/test_run.py::TestRun::run-generators.test::testYieldThrow
But these failures are deterministic, unrelated to incremental mode, and also reproduce on master from hundreds of commits ago, so they seem unrelated.
In https://github.com/python/mypy/pull/13573, we see that we now encounter errors even on https://github.com/python/mypy/commit/da56c974a57688f5c7989ef3abd86cd1d1608793, which is several days and many commits before the first CI failure we saw.
I wonder if this might have been caused by an update to the Ubuntu image GitHub Actions uses. This is the current version:
Image: ubuntu-20.04
Version: 20220828.1
The date is suspiciously close to the first day this started happening. The first known failure was using this image. Unfortunately, it doesn't look like there's a way to downgrade the runner to validate this hypothesis.
An earlier successful build uses a different image:
Image: ubuntu-20.04
Version: 20220821.1
I looked at pip dependencies and it doesn't look like any point release of a dependency could have caused the regression.
Looking at the readmes of the different GitHub image versions, none of the changes seem particularly likely to have caused this.
Since the failures are somewhat random, it's possible that there's a race condition that has been around for some time and is now being triggered more frequently. We could try running the tests sequentially to test this hypothesis.
It looks like fudge_dir_mtimes, used in mypyc/test/test_run.py, doesn't work reliably any more in GitHub Actions. We could replace it with sleep(1), but that would slow down tests. One option would be to only use sleep(1) when running in GitHub Actions, but that would be ugly. Maybe we can figure out why fudge_dir_mtimes stopped working. If I remove the call to fudge_dir_mtimes and run tests locally, they start failing with familiar-looking errors.
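For context, fudge_dir_mtimes is presumably something along these lines (a sketch, not the actual implementation): walk a directory and shift every file's mtime into the past with os.utime(), so that files rewritten in the next incremental step end up with strictly newer timestamps than everything else.

```python
import os

def fudge_dir_mtimes(dir: str, delta: int) -> None:
    # Sketch: shift the atime/mtime of every file under `dir` by `delta`
    # seconds (the tests apparently call this with delta=-1, i.e. one
    # second into the past).
    for dirpath, _, filenames in os.walk(dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            new_time = os.stat(path).st_mtime + delta
            os.utime(path, times=(new_time, new_time))
```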
As a wild guess: maybe GitHub Actions now somehow magically turns (some) files into symlinks? The docs for os.utime() say it doesn't follow symlinks.

Actually, I think I misread the docs: the default for follow_symlinks seems to be True, i.e. to follow symlinks. So we can rule this out.
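For the record, this is the os.utime() behaviour in question; the file names below are just placeholders:

```python
import os

# os.utime() follows symlinks by default (follow_symlinks=True), so it
# updates the timestamps of the target file, not the link.
os.utime("example.py", times=(0, 0))

# Only an explicit follow_symlinks=False (supported on Linux) would touch
# the timestamps of the symlink itself.
os.utime("example_link.py", times=(0, 0), follow_symlinks=False)
```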
It looks like fudge_dir_mtimes is always used to set the timestamps back by one second - perhaps it should go further back? A note in the docs says that st_mtime has 2-second resolution on FAT32 - it'd be a bit surprising if the image is using a FAT32 disk, but not impossible...

Or, perhaps a call to os.sync() is needed?
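One quick way to check whether the filesystem actually honours a one-second shift, and whether a sync makes any difference (a throwaway diagnostic, not something from the test suite):

```python
import os
import tempfile

# Create a throwaway file, push its mtime back by one second, and see how
# much of the shift the filesystem actually records.
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
before = os.stat(path).st_mtime_ns
os.utime(path, ns=(before - 1_000_000_000, before - 1_000_000_000))
os.sync()  # flush metadata, in case that is what's missing
after = os.stat(path).st_mtime_ns
print("observed shift (ns):", before - after)  # ~1_000_000_000 unless rounded
os.remove(path)
```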
I tried moving mtime back 10 seconds but it didn't help.
Now I'm seeing the failure locally using master, on Ubuntu 20.04. I wonder if an Ubuntu upgrade broke the tests. At least this makes it easier to debug the failures.
I merged a workaround that adds a sleep(1) on Linux, but it would be nice to figure out the root cause and fix the tests without making them slower.
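The workaround is presumably along these lines (a sketch under my assumptions; advance_file_clock is a made-up name, and the import path for fudge_dir_mtimes is a guess):

```python
import sys
import time

from mypy.test.helpers import fudge_dir_mtimes  # assumed location of the helper

def advance_file_clock(workdir: str) -> None:
    # On Linux, where fudging mtimes back has become unreliable, sleep for a
    # second so files written in the next incremental step get strictly newer
    # mtimes; elsewhere keep the fast mtime-fudging path.
    if sys.platform == "linux":
        time.sleep(1.0)
    else:
        fudge_dir_mtimes(workdir, -1)
```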
Oh interesting. I'd tested on 20.04 without being able to repro. Let me play around more...