itest: fix previously known test flakes

Open yyforyongyu opened this issue 3 years ago • 0 comments

This PR replaces #5940 and fixes all the itest failures we’ve seen in the past. The goal is to get our itest builds to a state where a simple typo-fix PR won’t fail them. Because the focus is on the tests, certain workarounds such as time.Sleep have been used, and several TODOs have been noted so that we can later investigate what’s going on in lnd itself.

This final PR builds on:

  • #6759
  • #6776
  • #6822
  • #6823
  • #6824

Please check the readme to gain an overview of the structure of the new test framework.

It also depends on:

  • #6868
  • #7066
  • #7095

Issues Identified

Multiple issues have been found and they need to be fixed in follow-up PRs.

Within itest, the most common issue is that a subsystem cannot catch up because we mine blocks too fast. This is mitigated by mining blocks more slowly and asserting that standby nodes are synced. It can also be fixed on the lnd side by forcing our subsystems to stay at the same block height (more on this later in a new issue).

Within lnd, the most common issue is that peer connections are randomly dropped, and I have a PR to address it.

Follow-up issues:

  • #6788
  • Can’t close channel due to active HTLCs; possibly fixed by #6760?
  • Invoices are settled while the commitment dance is still in progress
  • Peer not online when closing channel, which gives the error "unable to gracefully close channel while peer is offline (try force closing it instead): channel link not found"
  • Different speeds when syncing blocks in lnd’s subsystems
  • Assess the wait.NoError usage in itest to uncover potential issues
  • Add a new CI job to run itest with a slow miner to simulate a real-life block production rate

TODO:

  • [ ] Create new issues for the above

yyforyongyu, Aug 12 '22 10:08