magma icon indicating copy to clipboard operation
magma copied to clipboard

fix(orc8r): Stabilize cloud tests

Open MoritzThomasHuebner opened this issue 2 years ago • 5 comments

First PR addressing #13998

Signed-off-by: Moritz Huebner [email protected]

Summary

Upon inspecting the recent cloud-workflow failures, we found the following flaky tests:

  • TestSqlConfiguratorStorage_LoadEntities: 2 failures
  • TestGetSubscriberState: 2 failures
  • TestSingletonRunSuccess: 6 failures
  • TestSingletonRunFail: 4 failures
  • TestRunBrokenIndexer: 1 failures

This PR focuses on TestSingletonRunSuccess and TestSingletonRunFail.

These changes focus on test isolation:

  • Introduced a select statement at the end of both tests so that we assure the context is cancelled before moving on to the next.
  • Added manual calls to indexer.DeregisterAllForTest(t) at the end of the tests. This is not strictly necessary to make the tests green, but it can cause issues with dangling indexers if the order of the tests is ever reversed.

These changes made TestSingletonRunFail stable:

  • Moved the creation of the reindexer thread after the registration of the indexers. This way, it is impossible that the reindexer is triggered when only one or two out of the three indexers are registered.

These changes made TestSingletonRunSuccess stable:

  • The issue is that the TestHookReindexDone is called even if there is an error. Unexpected errors can be triggered in the remote indexer getIndexerClient.
  • Added a TestHookReindexFailure, that we use to keep track of reindexing job errors.
  • Safely pull all elements out of the channel ch, if there was a failure.

Test Plan

The failures for TestSingletonRunSuccess I was able to reproduce locally by manually adding an error that would trigger a fixed number of times in the remote indexer getIndexerClient. It was not possible to recreate this issue on the CI, since copy/pasting the test did not appear to increase the likelihood of failure over a single run.

The test TestHookReindexFailure can be reproduced by re-running many times on the CI. In this run, we reiterate over the both tests 51 times and they were always green. https://github.com/magma/magma/actions/runs/3187834050

Additional Information

  • [ ] This change is backwards-breaking

MoritzThomasHuebner avatar Oct 05 '22 01:10 MoritzThomasHuebner

Thanks for opening a PR! :100:

A couple initial guidelines

Howto

  • Reviews. The "Reviewers" listed for this PR are the Magma maintainers who will shepherd it.
  • Checks. All required CI checks must pass before merge.
  • Merge. Once approved and passing CI checks, use the ready2merge label to indicate the maintainers can merge your PR.

More info

Please take a moment to read through the Magma project's

If this is your first Magma PR, also consider reading

github-actions[bot] avatar Oct 05 '22 01:10 github-actions[bot]

cloud-workflow

1 135 tests   1 135 :heavy_check_mark:  2m 25s :stopwatch:    365 suites         0 :zzz:        7 files           0 :x:

Results for commit 55750d9f.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Oct 05 '22 02:10 github-actions[bot]

feg-workflow

    2 files  203 suites   40s :stopwatch: 374 tests 374 :heavy_check_mark: 0 :zzz: 0 :x: 388 runs  388 :heavy_check_mark: 0 :zzz: 0 :x:

Results for commit 55750d9f.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Oct 05 '22 07:10 github-actions[bot]

dp-workflow

14 tests   14 :heavy_check_mark:  2m 16s :stopwatch:   1 suites    0 :zzz:   1 files      0 :x:

Results for commit 55750d9f.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Oct 05 '22 07:10 github-actions[bot]

agw-workflow

615 tests   611 :heavy_check_mark:  3m 48s :stopwatch:     2 suites      4 :zzz:     2 files        0 :x:

Results for commit 55750d9f.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Oct 05 '22 08:10 github-actions[bot]