
mypyc incremental failures on master

Open · hauntsaninja opened this issue 2 years ago • 11 comments

We've started seeing non-deterministic failures of mypyc incremental tests in CI.

The first test failure on master was on https://github.com/python/mypy/commit/840a310f0b6c4a4bd7edfb141c8d1731ba4dd027 (succeeded on retry).

The earliest test failure I've observed was on https://github.com/python/mypy/pull/13557 when it was based on d9bdd6d96.

There are no known changes to the Python version or dependencies. The underlying GitHub runner image did change, but nothing stands out in its changelog and it doesn't seem likely to be the cause.

To my knowledge, no one has been able to reproduce failures locally.

When running locally on Linux, I get deterministic failures on:

FAILED mypyc/test/test_run.py::TestRun::run-loops.test::testForIterable
FAILED mypyc/test/test_run.py::TestRun::run-generators.test::testYieldThrow

These two failures are deterministic, are unrelated to incremental mode, and also reproduce on master from hundreds of commits ago, so they seem unrelated to this issue.

hauntsaninja · Aug 31 '22 23:08

In https://github.com/python/mypy/pull/13573, we see that we now encounter errors even on https://github.com/python/mypy/commit/da56c974a57688f5c7989ef3abd86cd1d1608793, which is several days and many commits before the first CI failure we saw.

hauntsaninja · Sep 01 '22 06:09

I wonder if this might have been caused by an update to the Ubuntu image GitHub Actions uses. This is the current version:

  Image: ubuntu-20.04
  Version: 20220828.1

The date is suspiciously close to the first day this started happening. The first known failure was using this image. Unfortunately, it doesn't look like there's a way to downgrade the runner to validate this hypothesis.

An earlier successful build uses a different image:

  Image: ubuntu-20.04
  Version: 20220821.1

I looked at the pip dependencies, and it doesn't look like any point release of a dependency could have caused the regression.

JukkaL · Sep 01 '22 13:09

Looking at the readmes of the different GitHub image versions, none of the changes seem particularly likely to have caused this.

Since the failures are somewhat random, it's possible that there is a race condition that has been around for some time and now gets triggered more frequently. We could try running the tests sequentially to test this hypothesis (see the command sketched below).
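If the suite is normally parallelized with pytest-xdist (an assumption about how these tests are driven), the quickest way to try that would be to disable distribution entirely, e.g.:

  python -m pytest -n 0 mypyc/test/test_run.py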

JukkaL · Sep 01 '22 13:09

It looks like fudge_dir_mtimes, used in mypyc/test/test_run.py, doesn't work reliably anymore in GitHub Actions. We could replace it with sleep(1), but that would slow down the tests. One option would be to only use sleep(1) when running in GitHub Actions, but that would be ugly. Maybe we can figure out why fudge_dir_mtimes stopped working. If I remove the call to fudge_dir_mtimes and run the tests locally, they start failing with familiar-looking errors.
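For readers unfamiliar with that helper: the idea behind a fudge_dir_mtimes-style function is to walk a directory and push every file's mtime into the past, so that files written by the next incremental build step look strictly newer. A minimal sketch of that idea (the real helper in the test utilities may be implemented differently):

  import os

  def fudge_dir_mtimes_sketch(dir_path: str, delta: float) -> None:
      # Shift the mtime of every file under dir_path by delta seconds.
      # A negative delta pushes timestamps into the past, which is what the
      # incremental tests rely on to make previously built files look older
      # than files touched in the next build step.
      for dirpath, _dirnames, filenames in os.walk(dir_path):
          for name in filenames:
              path = os.path.join(dirpath, name)
              st = os.stat(path)
              # follow_symlinks defaults to True, so symlink targets are
              # updated as well.
              os.utime(path, times=(st.st_atime, st.st_mtime + delta))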

JukkaL · Sep 01 '22 15:09

As a wild guess: maybe GitHub Actions now somehow magically turns (some) files into symlinks? The docs for os.utime() say it doesn't follow symlinks.

ilevkivskyi · Sep 01 '22 15:09

Actually, I think I misread the docs: the default for follow_symlinks seems to be True, i.e. symlinks are followed. So we can rule this out.
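For reference, symlink handling in os.utime() is controlled by the keyword-only follow_symlinks argument, which defaults to True; a minimal illustration (example.py is just a placeholder path, not anything from the test suite):

  import os

  path = "example.py"  # hypothetical placeholder path
  st = os.stat(path)

  # Default: follow_symlinks=True, so if path is a symlink, the timestamps
  # of its target are updated.
  os.utime(path, times=(st.st_atime, st.st_mtime - 1))

  # Only an explicit follow_symlinks=False touches the link itself, and only
  # on platforms that support it (otherwise NotImplementedError is raised).
  os.utime(path, times=(st.st_atime, st.st_mtime - 1), follow_symlinks=False)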

ilevkivskyi · Sep 01 '22 15:09

It looks like fudge_dir_mtimes is always used to set the timestamps back by one second - perhaps it should go further back? A note in the docs says that st_mtime has 2-second resolution on FAT32 - it'd be a bit surprising if the image is using a FAT32 disk, but not impossible...

Or, perhaps a call to os.sync() is needed?
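If timestamp resolution or write-back caching is suspected, a small standalone check along these lines could show what the filesystem actually stores (a rough diagnostic sketch, not part of the test suite):

  import os
  import tempfile

  # Create a file, push its mtime back by one second, force a sync, and see
  # how much of the change the filesystem actually preserves.
  fd, path = tempfile.mkstemp()
  os.close(fd)

  st = os.stat(path)
  os.utime(path, times=(st.st_atime, st.st_mtime - 1.0))
  os.sync()  # flush filesystem buffers (POSIX only)
  print("before:", st.st_mtime, "after:", os.stat(path).st_mtime)
  os.remove(path)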

godlygeek · Sep 01 '22 18:09

I tried moving the mtimes back by 10 seconds, but it didn't help.

JukkaL · Sep 01 '22 19:09

Now I'm seeing the failure locally using master, on Ubuntu 20.04. I wonder if an Ubuntu upgrade broke the tests. At least this makes it easier to debug the failures.

JukkaL · Sep 02 '22 11:09

I merged a workaround that adds a sleep(1) on Linux, but it would be nice to figure out the root cause and fix the tests without making them slower.
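For reference, the shape of such a workaround is roughly the following; the actual merged change may place the delay differently:

  import sys
  import time

  # Give the filesystem a real one-second gap between incremental build
  # steps, so modified files end up with strictly newer mtimes than the
  # cached build outputs.
  if sys.platform == "linux":
      time.sleep(1)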

JukkaL · Sep 02 '22 12:09

Oh interesting. I'd tested on 20.04 without being able to repro. Let me play around more...

hauntsaninja · Sep 02 '22 18:09