PackageTests/NewUpdate: fix skipping flaky tests

Open bacchanalia opened this issue 1 year ago • 9 comments

RejectFutureIndexStates and UpdateIndexState are marked "skip", but the skip sits under withRemoteRepo, so the flaky remote-repo setup still runs (and can fail) before skip is called.
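
The ordering problem can be illustrated with a toy model. This is only a sketch: `skipTest`, `withFakeRemoteRepo`, and `Skipped` are hypothetical stand-ins invented for illustration, not cabal-testsuite's actual API, and the log merely records whether the "remote repo" setup step ran before the skip took effect.

```haskell
import Control.Exception (Exception, throwIO, try)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- Hypothetical skip signal; NOT cabal-testsuite's real type.
data Skipped = Skipped String deriving Show
instance Exception Skipped

-- Hypothetical skip helper: aborts the test by throwing.
skipTest :: String -> IO a
skipTest = throwIO . Skipped

-- Hypothetical stand-in for withRemoteRepo: records that the
-- (potentially flaky) setup ran, then runs the test body.
withFakeRemoteRepo :: IORef [String] -> IO a -> IO a
withFakeRemoteRepo logRef body = do
  modifyIORef logRef (++ ["remote repo setup"])
  body

-- Skip inside the wrapper: setup has already run when the skip fires.
skipInside :: IORef [String] -> IO ()
skipInside logRef = withFakeRemoteRepo logRef (skipTest "flaky")

-- Skip outside the wrapper: setup never runs.
skipOutside :: IORef [String] -> IO ()
skipOutside logRef = skipTest "flaky" >> withFakeRemoteRepo logRef (pure ())

-- Run a test, swallow the skip signal, and return what was logged.
runAndLog :: (IORef [String] -> IO ()) -> IO [String]
runAndLog test = do
  logRef <- newIORef []
  _ <- try (test logRef) :: IO (Either Skipped ())
  readIORef logRef

main :: IO ()
main = do
  runAndLog skipInside >>= print   -- setup step is logged
  runAndLog skipOutside >>= print  -- nothing is logged
```

In this model, the fix amounts to moving the skip so it is evaluated before the wrapper's setup, which is the shape of change described above.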

bacchanalia avatar May 19 '24 22:05 bacchanalia

It was a problem with the older macOS GitHub image, which they appear to have fixed last night (my time). That doesn't prove it won't come back at some point in the future.

geekosaur avatar May 19 '24 23:05 geekosaur

@ulysses4ever I split the body out to make the diff smaller.

bacchanalia avatar May 19 '24 23:05 bacchanalia

@geekosaur I saw another failure when I tried to rerun today, but it might be reusing the old image because I was rerunning on an existing PR? https://github.com/haskell/cabal/actions/runs/9146300664/job/25155311727?pr=10032

bacchanalia avatar May 19 '24 23:05 bacchanalia

That's the old image (macos-13), which is used for older GHCs that don't have Apple ARM support (anything before 8.10.5). macos-latest/macos-14 is the new ARM image.

geekosaur avatar May 19 '24 23:05 geekosaur

This test has always had sporadic network failures on all platforms. For around a day and a half, though, it was very consistently failing for all jobs using the macos-13 image.

geekosaur avatar May 19 '24 23:05 geekosaur

@bacchanalia: thanks a lot for spotting this and the fix.

It seemed to me that it's only failing on MacOS lately. Do you agree? In that case, can we skip it only on that platform, perhaps?

How about this? It's true we haven't seen this fail on non-macos for a long time, so maybe we can skip a bit less?

Mikolaj avatar May 20 '24 09:05 Mikolaj

I agree it'd be much better to have it tested at least on one platform. The failure doesn't have to do with the Cabal operation but with the cabal-testsuite setup (it was investigated in the past). So, having it on more than zero platforms gives us some confidence that the tested behavior isn't regressing.

ulysses4ever avatar May 20 '24 13:05 ulysses4ever

From what @geekosaur said I was under the impression that it was still flaky on all platforms, just less than MacOS. If that's not the case, I can change the skip.

bacchanalia avatar May 20 '24 16:05 bacchanalia

> From what @geekosaur said I was under the impression that it was still flaky on all platforms, just less than MacOS. If that's not the case, I can change the skip.

That's correct, but it hasn't been flaky anywhere except macOS for several months now, so maybe we are lucky and it's been fixed [edit: except on macOS, where I had this failure this very morning]. :)

Mikolaj avatar May 20 '24 19:05 Mikolaj

@geekosaur: I think @bacchanalia may be waiting for your take on that. Would you mind offering a recommendation?

Mikolaj avatar May 22 '24 13:05 Mikolaj

Well, if it's actually an issue with the image then the right action is to report it to GitHub, but adding a skip for now would avoid a lot of CI pain.

geekosaur avatar May 22 '24 14:05 geekosaur

I got a Linux test failure, so I'm going to revert to skipping on all platforms.

bacchanalia avatar May 23 '24 03:05 bacchanalia

If nobody objects, I'm going to expedite the merge to shorten the period during which we get spurious CI errors.

Mikolaj avatar May 23 '24 11:05 Mikolaj

Let's do it!

ulysses4ever avatar May 23 '24 11:05 ulysses4ever

@mergify backport 3.12

ulysses4ever avatar Jun 07 '24 22:06 ulysses4ever

backport 3.12

✅ Backports have been created

  • Backport to branch 3.12 not needed, change already in branch 3.12

mergify[bot] avatar Jun 07 '24 22:06 mergify[bot]