
Split invalidly processed exit from unchallenged exit events

pdobacz opened this issue • 2 comments

Coming from #1275, which in the end opted for a different approach to solve the problem.

Separate the conflated unchallenged_exit from invalidly_processed_exit and call things by their name. Conflating those events is too much of a stop-gap solution to continue with.

Then things get easier: when an exit processes invalidly (which was Gerhard's case), we'll get a dedicated invalidly_processed_exit event.

As a consequence, we'll have more freedom in setting exit_processor_sla_margin, which drives unchallenged_exit: we'll be able to set it to something larger than min_exit_period, which matters in cases where min_exit_period is really small.
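To make the decoupling concrete, here is a minimal sketch (in Python, with invented names — this is not the elixir-omg API) of how the split model would assign events to a single exit. The point is that `unchallenged_exit` only means "invalid and older than the SLA margin", while "processed while invalid" gets its own event:

```python
# Hypothetical sketch: event assignment under the split model.
# All names (exit_events, was_processed, ...) are illustrative assumptions.

def exit_events(exit_eth_height, now_eth_height, is_invalid, was_processed,
                sla_margin):
    """Return the events a single exit should raise under the split model."""
    events = []
    if is_invalid:
        events.append("invalid_exit")
        # unchallenged_exit now *only* means "invalid and older than the
        # SLA margin"; it no longer doubles as "processed invalidly", so
        # sla_margin is free to exceed min_exit_period.
        if now_eth_height - exit_eth_height > sla_margin:
            events.append("unchallenged_exit")
        if was_processed:
            # dedicated event instead of reusing unchallenged_exit
            events.append("invalidly_processed_exit")
    return events
```

Under this sketch, an invalid exit that finalizes quickly raises `invalidly_processed_exit` without ever having been reported as `unchallenged_exit`.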

This has its own merit, since conflating the notions of an unchallenged exit and of an exit that processed invalidly under a single report (unchallenged_exit) and a single piece of logic is messy and hard to understand for anyone reading the code.

Here is a proposed solution:

  1. [x] Cleanup and fix old invalid_exit tests, #1344
  2. [x] Make ExitProcessor take the min(synced_heights_of_services) from RootChainCoordinator instead of "Eth.current_height" to determine "now" and lateness (draft in branch pdobacz/1318_invalidly_processed)
  • WHY?: otherwise one would get :unchallenged_exit events for exits that were challenged but whose challenges are not yet recognized. We'd see the exit as very late, because "Eth.current_height" is most likely past the challenge height. min(synced_heights_of_services) is smaller while ExitProcessor is still syncing to recognize the challenge, so the exit is not perceived as late. Previously "inactive on recognition" handled that; now it's going away.
  • WHY 2?: it is also simpler, since we only look at the synced height of the services instead of "jumping the gun" by looking at "Eth.current_height".
  3. [x] Cabbagize invalid_exit tests, #1344
  4. [ ] Add Standard exit cabbage tests, where a Watcher is resynced (from clear state).
  • this is to catch up with testing of those edge cases, where a test passes when the Watcher is always "up-to-date", but fails when it resyncs. This will give us confidence required to meddle with the behavior of unchallenged_exit (and coverage in general)
  • some 1st draft pitch of how could this look like is in the branch: pdobacz/1st_draft_of_Watcher_resync_test
  5. [ ] Make ExitProcessor interpret :root_chain_height and min(synced_heights_of_services) from RootChainCoordinator to detect being mid-sync and halt late-exit reports
  • WHY?: otherwise there's the situation where an exit was invalid, then late, and then challenged. The expected behavior is to allow the Watcher to sync, which is what "inactive on recognition" (see below) handled before. If "inactive on recognition" is gone, we need to handle this situation.
  • However, this might be puntable. Due to the way that the challenge events are processed (eagerly, i.e. not waiting for child blocks to catch up), those "missing" challenges will be eventually consumed, and the unchallenged_exit lifted. Worst case is it will require Watcher restarts.
  6. [ ] Tidy and unit test the min(synced_heights_of_services) behavior from (2.) above
  • right now the code that does it in the branch pdobacz/1318_invalidly_processed just lives in the GenServer code. It could migrate to RootChainCoordinator.Core or ExitProcessor.Core, since it is pure logic.
  7. [ ] Throw away the "inactive on recognition" detection and the fetching of exit structs for SEs
  • WHY?: (same applies to IFEs) if we're going to accurately track the status of SEs/IFEs, then we must not have the superfluous "meta-status" of "inactive on recognition".
  • the "inactive on recognition" is all that happens after an exit is recognized inactive on new_exits: https://github.com/omisego/elixir-omg/blob/master/apps/omg_watcher/lib/omg_watcher/exit_processor/exit_info.ex#L135
  • as a consequence it will make us not require getting the exit status by an RPC call in https://github.com/omisego/elixir-omg/blob/master/apps/omg_watcher/lib/omg_watcher/exit_processor.ex#L246
  8. [ ] Implement proper exit statuses for SEs, taking the invalidly_processed_exit approach
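The core of step (2.) — using the slowest synced service as "now" — can be sketched in a few lines. This is illustrative Python with assumed names, not the RootChainCoordinator API:

```python
# Hypothetical sketch: lateness determined by the slowest synced service,
# not by Eth.current_height. Names are assumptions for illustration.

def effective_height(synced_heights_of_services):
    """Use the slowest synced service as "now" for lateness checks."""
    return min(synced_heights_of_services)

def is_late(exit_eth_height, synced_heights_of_services, sla_margin):
    # If ExitProcessor is still syncing towards a not-yet-recognized
    # challenge, the min stays small, so an already-challenged exit is
    # not misreported as late.
    return effective_height(synced_heights_of_services) - exit_eth_height > sla_margin
```

Since this is pure logic, it could indeed live in RootChainCoordinator.Core or ExitProcessor.Core rather than in the GenServer code, as step (6.) suggests.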

Proper exit statuses

The following could be the states with which an exit's lifecycle is modeled:

Standard exits

  1. active
  2. challenged
  3. processed
  4. invalidly_processed

So the lifecycle can be 1 => 2, 1 => 3 or 1 => 4.

You'll notice it lacks invalid, since validity is discovered by confronting the exit with other data. An alternative model with an explicit invalid state is possible, but would require a larger rewrite/rethink.

Consequently, unchallenged isn't there either. unchallenged is a superposition of invalid and being older than sla_margin, both of which are discovered by confronting the exit with other data.
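The SE model above can be sketched as a tiny state machine, with unchallenged as a derived check rather than a state. This is an illustrative Python sketch; the helper names are made up:

```python
# Sketch of the proposed SE status model. Transitions come from the list
# above (1 => 2, 1 => 3, 1 => 4); implementation details are hypothetical.

SE_TRANSITIONS = {
    "active": {"challenged", "processed", "invalidly_processed"},
    # terminal states
    "challenged": set(),
    "processed": set(),
    "invalidly_processed": set(),
}

def transition(state, new_state):
    if new_state not in SE_TRANSITIONS[state]:
        raise ValueError(f"illegal SE transition {state} -> {new_state}")
    return new_state

def is_unchallenged(state, is_invalid, age, sla_margin):
    # "unchallenged" is not a state: it is derived by confronting an
    # active, invalid exit with its age against sla_margin.
    return state == "active" and is_invalid and age > sla_margin
```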

IFE

(notice that piggybacks are kept separate)

  1. canonical
  2. non-canonical
  3. processed
  4. invalidly_processed_false_canonical
  5. invalidly_processed_false_non_canonical

The state can first alternate between 1 and 2 (it always starts at 1). It then ends up in 3 if valid, and goes 1 => 4 or 2 => 5 if invalid.

Similar comments as for SEs apply.
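The IFE lifecycle differs from the SE one in that canonicity can flip back and forth before finalization. A sketch in the same illustrative style (hypothetical names, not the real code):

```python
# Sketch of the proposed IFE canonicity lifecycle: the state may alternate
# between canonical and non-canonical, then finalize. Names are made up.

IFE_TRANSITIONS = {
    "canonical": {
        "non-canonical",                        # canonicity game flips
        "processed",                            # finalized validly
        "invalidly_processed_false_canonical",  # finalized, canonicity was wrong
    },
    "non-canonical": {
        "canonical",
        "processed",
        "invalidly_processed_false_non_canonical",
    },
    # the invalidly_processed_* and processed states are terminal
}

def ife_transition(state, new_state):
    if new_state not in IFE_TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal IFE transition {state} -> {new_state}")
    return new_state
```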

Piggybacks

  1. missing (could be also not represented, but currently we kinda have an explicit missing, since we operate on an IFE-level)
  2. active
  3. challenged
  4. processed
  5. invalidly_processed

Note that save for the extra missing state (1.), this behaves just like an SE.

Similar comments as with SEs apply.

How to figure out when an IFE finalizes with invalid canonicity (to invalidly_processed_false_canonical or invalidly_processed_false_non_canonical)?

Figuring that out is necessary if we want to put an IFE in an invalidly_processed state. It is more difficult than for SEs and piggybacks, because you might be exiting UTXOs that exist while the processing is still bad.

Todo:

  • [ ] optimize stuff in ExitProcessor.Core by only looking at known txs that spend an interesting input, instead of group_by-ing all txs from all blocks
  • WHY?: right now, checking the canonicity of an IFE tx is not optimal (just enough for the rare calls to endpoints that fetch the validity and the proof data). If the Watcher needed to figure that out quickly and for many IFEs when consuming finalization events, it would time out a lot. Once that is optimized, it will be reasonable to run a similar check for just one IFE when finalizing.
  • [ ] then on IFE processing we run this check for this IFE and we know if it has valid canonicity
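The optimization amounts to replacing a scan over all txs from all blocks with an index keyed by spent input, so the canonicity check for one IFE is a few lookups. A hypothetical sketch (data shapes and names are assumptions, not the ExitProcessor.Core structures):

```python
# Hypothetical sketch: index known included txs by the inputs they spend,
# then finding competitors of an in-flight tx is O(number of its inputs).

def build_spend_index(blocks):
    """Map each spent input (utxo position) to (blknum, tx) of its spend."""
    index = {}
    for blknum, txs in blocks:
        for tx in txs:
            for inp in tx["inputs"]:
                index.setdefault(inp, (blknum, tx))
    return index

def competitors(ife_tx, spend_index):
    """Included txs that double-spend any input of the in-flight tx."""
    found = {}
    for inp in ife_tx["inputs"]:
        hit = spend_index.get(inp)
        if hit is not None and hit[1] is not ife_tx:
            found[inp] = hit
    return found
```

With such an index maintained incrementally as blocks sync, running the canonicity check for a single IFE while consuming its finalization event becomes cheap enough not to time out.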

pdobacz avatar Feb 14 '20 09:02 pdobacz

Note that #671 will benefit from this too. Those two issues must be done in coordination.

Also, I'm assigning myself here, as I've done some thinking on how to solve this. If I don't get to do it, I'll post what I've got here.

pdobacz avatar Feb 14 '20 09:02 pdobacz

@pdobacz pdobacz closed this in #1372 4 days ago

no I didn't, github-fail

pdobacz avatar Mar 09 '20 16:03 pdobacz