Improve reliability of acceptance tests

teddyknox opened this issue 7 months ago • 4 comments

Acceptance criterion: acceptance tests should no longer reach a 100% pass rate only after K attempts (K-shot). Instead, they should all consistently pass 100% on the first attempt (1-shot).
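The criterion can be phrased as a small check over each test's attempt history. This is a minimal sketch, assuming we have per-attempt pass/fail results for every test. The helpers `attemptsToPass` and `isOneShot` and the test names in `main` are hypothetical, not part of the repo.

```go
package main

import "fmt"

// attemptsToPass returns how many attempts a test needed before its first
// pass (the "K" in K-shot) and whether it passed at all. attempts[i] is true
// if the i-th attempt passed. Hypothetical helper for illustration.
func attemptsToPass(attempts []bool) (k int, passed bool) {
	for i, ok := range attempts {
		if ok {
			return i + 1, true
		}
	}
	return len(attempts), false
}

// isOneShot reports whether every test passed on its first attempt,
// i.e. whether the whole suite meets the 1-shot criterion.
func isOneShot(history map[string][]bool) bool {
	for _, attempts := range history {
		if k, ok := attemptsToPass(attempts); !ok || k != 1 {
			return false
		}
	}
	return true
}

func main() {
	// Made-up attempt histories for two hypothetical tests.
	history := map[string][]bool{
		"TestA": {false, false, true}, // 3-shot
		"TestB": {true},               // 1-shot
	}
	for name, attempts := range history {
		k, ok := attemptsToPass(attempts)
		fmt.Printf("%s: passed=%v after %d attempt(s)\n", name, ok, k)
	}
	fmt.Println("all 1-shot:", isOneShot(history))
}
```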


FINDINGS:

  • The op-acceptance-tests/tests/interop/reorgs package passes with -count=5 (the run took 2005.531s), so it is not actionable at the moment.

TODO:

  • [ ] fix flaky pkg-level tests at op-acceptance-tests/tests/interop/sync/multisupervisor_interop
  • [ ] fix flaky TestL2CLAheadOfSupervisor
  • [x] fix flaky TestUnsafeChainUnknownToL2CL -- https://github.com/ethereum-optimism/optimism/pull/16394
  • [x] fix flaky pkg-level tests at op-acceptance-tests/tests/interop/seqwindow -- TestSequencingWindowExpiry -- fixed at https://github.com/ethereum-optimism/optimism/pull/16393
  • [x] fix flaky pkg-level tests at op-acceptance-tests/tests/interop/reorgs -- https://github.com/ethereum-optimism/optimism/pull/16415

09/06/2025, 05:28
Top 10 flakiest acceptance tests (by number of flakes):

TestSequencingWindowExpiry (github.com/ethereum-optimism/optimism/op-acceptance-tests/tests/interop/seqwindow) [104 flakes]
TestL2CLAheadOfSupervisor (github.com/ethereum-optimism/optimism/op-acceptance-tests/tests/interop/sync/multisupervisor_interop) [97 flakes]
TestUnsafeChainUnknownToL2CL (github.com/ethereum-optimism/optimism/op-acceptance-tests/tests/interop/sync/multisupervisor_interop) [29 flakes]
TestReorgInvalidExecMsgs/invalid_chain_id (github.com/ethereum-optimism/optimism/op-acceptance-tests/tests/interop/reorgs) [21 flakes]
TestReorgInvalidExecMsgs/invalid_block_number (github.com/ethereum-optimism/optimism/op-acceptance-tests/tests/interop/reorgs) [19 flakes]
TestUnsafeChainUnknownToL2CL (github.com/ethereum-optimism/optimism/op-acceptance-tests/tests/interop/sync/redundant_interop) [19 flakes]
TestL2CLSyncP2P (github.com/ethereum-optimism/optimism/op-acceptance-tests/tests/interop/sync/multisupervisor_interop) [15 flakes]
TestReorgUnsafeHead (github.com/ethereum-optimism/optimism/op-acceptance-tests/tests/interop/reorgs) [14 flakes]
TestReorgInvalidExecMsgs/invalid_log_index (github.com/ethereum-optimism/optimism/op-acceptance-tests/tests/interop/reorgs) [10 flakes]
TestLoad (github.com/ethereum-optimism/optimism/op-acceptance-tests/tests/interop/loadtest) [9 flakes]

teddyknox · Jun 11 '25 13:06

Note that even if we measure the flakiness of single tests using

go test -v -count=1337 -run ^TestName$

the results may not match CI, because some tests share the same environment, initialized by

func TestMain(m *testing.M) {
...
}

For example, the sync tests in multisupervisor_interop may interfere with each other, which increases flakiness.
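A toy simulation of why a per-test run can pass while the package run flakes. This is not the real acceptance-test harness: `sharedEnv`, `runPackage`, and the two test functions are hypothetical stand-ins. The environment is built once, as a `TestMain` would do, and tests run against it in order, so state left by one test leaks into the next.

```go
package main

import "fmt"

// sharedEnv stands in for the environment a TestMain sets up once per
// package. The unsafeHead field is a hypothetical piece of mutable state.
type sharedEnv struct {
	unsafeHead int
}

type testFunc func(env *sharedEnv) error

// testAdvances mutates shared state, as a test that drives the chain might.
func testAdvances(env *sharedEnv) error {
	env.unsafeHead += 10
	return nil
}

// testExpectsFreshChain implicitly assumes nothing else touched the chain.
func testExpectsFreshChain(env *sharedEnv) error {
	if env.unsafeHead > 5 {
		return fmt.Errorf("unsafe head %d too far ahead", env.unsafeHead)
	}
	return nil
}

// runPackage builds the environment once and runs every test against it
// in order, returning one error (or nil) per test.
func runPackage(tests []testFunc) []error {
	env := &sharedEnv{}
	var errs []error
	for _, t := range tests {
		errs = append(errs, t(env))
	}
	return errs
}

func main() {
	// In isolation (like -run ^TestName$) the second test passes...
	fmt.Println("isolated:", runPackage([]testFunc{testExpectsFreshChain}))
	// ...but alongside another test it fails because state leaked.
	fmt.Println("package: ", runPackage([]testFunc{testAdvances, testExpectsFreshChain}))
}
```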

pcw109550 · Jun 11 '25 14:06

Good point, @pcw109550. We should measure flakiness per package rather than per test.

teddyknox · Jun 11 '25 14:06

TestL2CLAheadOfSupervisor passes with -count=5.

The reorgs package also passes with -count=5.

I will continue reviewing the tests; hopefully we will also catch some useful logs from CircleCI.

In any case, I think running the same test with -count=5 is also a useful indicator, since it reuses the environment across runs, although I agree that per-package runs increase interference.
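As far as we understand Go's `testing` package, `TestMain` runs once per test-binary invocation and `m.Run` repeats the selected tests `-count` times inside it, so `-count=5` reuses one environment across repetitions, while a package-level run additionally lets different tests interfere. The sketch below only builds (does not execute) the two kinds of invocation. `goTestCmd` is a hypothetical helper, and the package path assumes the command is issued from the monorepo root; actually running the acceptance tests needs whatever devnet setup the repo expects, which is not shown here.

```go
package main

import (
	"fmt"
	"os/exec"
)

// goTestCmd builds a `go test` command without running it. With run == ""
// every test in pkg runs (sharing the TestMain environment); otherwise only
// tests matching ^run$ run. Hypothetical helper for illustration.
func goTestCmd(pkg, run string, count int) *exec.Cmd {
	args := []string{"test", "-v", fmt.Sprintf("-count=%d", count)}
	if run != "" {
		args = append(args, "-run", "^"+run+"$")
	}
	args = append(args, pkg)
	return exec.Command("go", args...)
}

func main() {
	pkg := "./op-acceptance-tests/tests/interop/sync/multisupervisor_interop"
	// Per-test: one test, repeated 5 times within a single test binary.
	fmt.Println(goTestCmd(pkg, "TestL2CLAheadOfSupervisor", 5).Args)
	// Per-package: all tests together, closer to what CI runs.
	fmt.Println(goTestCmd(pkg, "", 5).Args)
}
```

Invoking `go test` N separate times with `-count=1` would instead give each repetition a fresh `TestMain` environment, at the cost of paying the setup time on every run.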


The op-acceptance-tests/tests/interop/sync/multisupervisor_interop package seems to be flaky when all of its tests run together, so I will look into it (TestUnsafeChainUnknownToL2CL and TestL2CLAheadOfSupervisor).

nonsense · Jun 11 '25 15:06

FYI, also see the new "Flakiness Report": https://github.com/ethereum-optimism/optimism/pull/16411

Note that it also includes the "Job Name", which tells us which backend was used and may give us more clues as to where the flakiness arises.

scharissis · Jun 13 '25 06:06

Closing this, as we have improved the tests considerably. We should keep monitoring CI, continue improving both CI and the tests, and make sure we don't introduce too many flaky tests.

nonsense · Jul 09 '25 13:07