
fix(e2e): flakiness when upgrading containers for `chainB`

Open p0mvn opened this issue 3 years ago • 7 comments

Background

e2e has recently had several refactors. The latest change #1999 has likely caused test flakiness.

Info from @czarcas7ic :

  • Failing run: https://github.com/osmosis-labs/osmosis/actions/runs/2645267194/attempts/1
  • Failing line: https://github.com/osmosis-labs/osmosis/blob/938f9bdb4ce05e178340b63c25452505ff7c6a3d/tests/e2e/configurer/upgrade.go#L250
  • It always fails when upgrading containers for chainB

Info from @p0mvn :

  • Could not reproduce locally after re-running 10 times
  • #1999 is the most likely cause

Acceptance Criteria

  • Investigate and fix e2e test flakiness
  • Re-running CI 10 times does not produce a failure (try pushing redundant changes to trigger each run)

p0mvn avatar Jul 12 '22 19:07 p0mvn

Added more logs here: https://github.com/p0mvn/osmosis/pull/14

Trying to manually trigger e2e in CI multiple times to repro this

p0mvn avatar Jul 12 '22 20:07 p0mvn

I have 2 updates on this:

  1. I was not able to reproduce in CI with extra logs. Tried running 10 times on my fork: https://github.com/p0mvn/osmosis/runs/7311008352?check_suite_focus=true

  2. The failure to reproduce should not be a big problem, because #2040 refactors the logic that waits for a certain height. Instead of using the CLI, it now uses Tendermint RPC, which should be more reliable and easier to debug. We need this refactor for the next step in state-sync anyway, so it might address two problems at once.

p0mvn avatar Jul 12 '22 22:07 p0mvn

Recent e2e flakiness: https://github.com/osmosis-labs/osmosis/runs/7351687619?check_suite_focus=true

https://github.com/osmosis-labs/osmosis/pull/2078

p0mvn avatar Jul 15 '22 03:07 p0mvn

Another recent instance: https://github.com/osmosis-labs/osmosis/runs/7373637913?check_suite_focus=true

This was at initialization though

czarcas7ic avatar Jul 17 '22 00:07 czarcas7ic

Another instance: https://github.com/osmosis-labs/osmosis/runs/7458951335?check_suite_focus=true

czarcas7ic avatar Jul 21 '22 22:07 czarcas7ic

Had a chat with @nikever about this. The plan is to get something going with the self-hosted runners and retain container logs to improve the debugging experience

p0mvn avatar Jul 21 '22 22:07 p0mvn

Awesome, self-hosted runners are for sure a game changer

czarcas7ic avatar Jul 21 '22 22:07 czarcas7ic

Probably fixed by #2556. I am going to close this for now; if we see another instance of this happening, we can reopen and look into it more deeply.

czarcas7ic avatar Aug 31 '22 15:08 czarcas7ic