oasis-core
The e2e test cases are close to unmaintainable
tl;dr: our e2e tests are bad and we should feel bad.
Every time I need to add functionality to, or debug anything that involves, go/oasis-test-runner or the byzantine node, things end up taking way longer than they should. As far as I can tell, this comes down to a few reasons:
- oasis-test-runner and our test harness code have organically grown a mountain of overcomplicated/duplicated functionality and kludges that make maintenance a total nightmare.
- Most of our test cases are written with a lot of assumptions about how the system operates, and take shortcuts that make them exceedingly fragile to change (e.g. assumptions about how timekeeping works, which I'm trying to fix).
- The byzantine node is a gigantic kludge that also makes a lot of assumptions about how the system operates. It contains numerous nasty hacks that should never have been merged in the first place (in particular, the old method of ensuring that the node is elected in the right spot is awful), and from a high-level abstraction/code quality standpoint it leaves much to be desired.
Admittedly, I am partly to blame for writing oasis-test-runner and some of the test cases to begin with, but from what I remember (in my biased view), my initial import was nowhere near this nightmarish.
A note so I hopefully remember the next time this happens, before I spend a few hours trying to figure out why an entirely unrelated change suddenly makes e2e/runtime/txsource-multi-short fail: if the failures appear to be gRPC related, it is the test's fault, not mine. Next time I will just sit there smashing retry repeatedly.