# Split invalidly processed exit from unchallenged exit events
Coming from #1275, which in the end opted for a different approach to the solution.
The proposal: separate the conflated `unchallenged_exit` from `invalidly_processed_exit`, and call things by their name. Conflating those events is too much of a stop-gap solution to continue with.
Things then get easier: when an exit processes invalidly (which was Gerhard's case), we'll get a dedicated `invalidly_processed_exit` event.
And as a consequence, we'll have more freedom in setting `exit_processor_sla_margin`, which drives `unchallenged_exit`: we'll be able to set it to something larger than `min_exit_period`, which matters in cases where `min_exit_period` is really small.
This has its own merit, since conflating the notions of an unchallenged exit and of an exit that processed invalidly under a single report (`unchallenged_exit`) and a single piece of logic is messy and hard to understand for anyone reading the code.
Here is a proposed solution:
- [x] Cleanup and fix old `invalid_exit` tests, #1344
- [x] Make `ExitProcessor` take the `min(synced_heights_of_services)` from `RootChainCoordinator`, instead of `Eth.current_height`, to determine "now" and lateness (draft in branch `pdobacz/1318_invalidly_processed`; see the sketch after this list)
  - WHY?: otherwise one would get `unchallenged_exit` events for exits that were challenged but whose challenges are not yet recognized. We'd see the exit as very late, because we'd look at `Eth.current_height`, which is most likely past the challenge height. `min(synced_heights_of_services)` is smaller if `ExitProcessor` is still syncing to recognize the challenge, so the exit is not perceived as late. Previously "inactive on recognition" handled that; now it's going away.
  - WHY 2?: it is also simpler, since we're only looking at the synced heights of the services, instead of "jumping the gun" by looking at `Eth.current_height`
- [x] Cabbagize `invalid_exit` tests, #1344
- [ ] Add standard exit cabbage tests where a Watcher is resynced (from a clear state)
  - this is to catch up with testing of the edge cases where a test passes when the Watcher is always "up-to-date" but fails when it resyncs. This will give us the confidence required to meddle with the behavior of `unchallenged_exit` (and with coverage in general). A first draft pitch of how this could look is in the branch `pdobacz/1st_draft_of_Watcher_resync_test`
- [ ] Make `ExitProcessor` interpret `:root_chain_height` and `min(synced_heights_of_services)` from `RootChainCoordinator` to detect being mid-sync and halt late-exit reports
  - WHY?: otherwise there's the situation where an exit was invalid, then late, and then challenged. The expected behavior is to allow the Watcher to sync, which is what "inactive on recognition" (see below) handled before. If "inactive on recognition" is gone, we need to handle this situation.
  - However, this might be puntable. Due to the way challenge events are processed (eagerly, i.e. not waiting for child blocks to catch up), those "missing" challenges will eventually be consumed and the `unchallenged_exit` lifted. Worst case, it will require Watcher restarts.
- [ ] Tidy and unit test the `min(synced_heights_of_services)` behavior from (2.) above
  - right now the code that does this in the branch `pdobacz/1318_invalidly_processed` just lives in the `GenServer` code. It could migrate to `RootChainCoordinator.Core` or `ExitProcessor.Core`, since it is pure logic.
- [ ] Throw away the "inactive on recognition" detection and the fetching of exit structs for SEs
  - WHY?: (the same applies to IFEs) if we're going to accurately track the status of SEs/IFEs, then we must not have the superfluous "meta-status" of "inactive on recognition"
  - "inactive on recognition" is everything that happens after an exit is recognized as inactive in `new_exits`: https://github.com/omisego/elixir-omg/blob/master/apps/omg_watcher/lib/omg_watcher/exit_processor/exit_info.ex#L135
  - as a consequence, it will spare us fetching the exit status via an RPC call in https://github.com/omisego/elixir-omg/blob/master/apps/omg_watcher/lib/omg_watcher/exit_processor.ex#L246
- [ ] Implement proper exit statuses for SEs, taking the `invalidly_processed_exit` approach
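Below is a minimal sketch, in Elixir, of the height bookkeeping the second and fifth checkboxes describe. The `SyncedHeight` module, its function names, and the `syncing_threshold` knob are hypothetical illustrations, not the codebase's API; the point is just that "now", lateness, and "are we still syncing?" become a few lines of pure logic once `min(synced_heights_of_services)` and `:root_chain_height` are handed over by `RootChainCoordinator`.

```elixir
# NOTE: hypothetical sketch, not the elixir-omg API
defmodule SyncedHeight do
  # "Now" is the lowest height that *every* service has synced to,
  # not the tip of the root chain (`Eth.current_height`).
  def now(synced_heights_of_services), do: Enum.min(synced_heights_of_services)

  # Report an exit as late only when we're not conspicuously mid-sync.
  # `syncing_threshold` (an assumed tunable) bounds how far behind the
  # root chain we may lag before lateness judgments are suppressed.
  def late?(exit_eth_height, synced_heights, root_chain_height, sla_margin, syncing_threshold) do
    now = now(synced_heights)
    syncing? = root_chain_height - now > syncing_threshold
    not syncing? and now - exit_eth_height > sla_margin
  end
end
```

For example, with services synced to `[150, 160, 155]` and `sla_margin` of 40, an exit opened at height 100 is late (`150 - 100 > 40`), but the same exit would not be reported while `ExitProcessor` lags the root chain by more than the threshold.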
## Proper exit statuses

The following could be the states with which an exit's lifecycle is modeled:

### Standard exits
1. `active`
2. `challenged`
3. `processed`
4. `invalidly_processed`
So the lifecycle can be `1 => 2`, `1 => 3` or `1 => 4`.
You'll notice it's lacking `invalid`, since validity is discovered by confronting the exit with other data. An alternative model with an explicit `invalid` state is possible, but would require a larger rewrite/rethinking.
By the same token, `unchallenged` isn't there: `unchallenged` is a superposition of being `invalid` and being older than `sla_margin`, both of which are discovered by confronting the exit with other data.
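To make the lifecycle concrete, here is a hypothetical Elixir transition function over these four states (module and event names are illustrative only, not the actual code). Whether a finalization was valid is discovered elsewhere, by confronting the exit with other data, and arrives as part of the event:

```elixir
# NOTE: hypothetical sketch, not the elixir-omg API
defmodule StandardExitStatus do
  @type t :: :active | :challenged | :processed | :invalidly_processed

  # 1 => 2: a successful challenge
  def transition(:active, :challenge), do: {:ok, :challenged}
  # 1 => 3: finalized, and the finalization was valid
  def transition(:active, {:finalize, :valid}), do: {:ok, :processed}
  # 1 => 4: finalized, but the finalization was invalid
  def transition(:active, {:finalize, :invalid}), do: {:ok, :invalidly_processed}
  # everything else is a no-go; the three non-active states are terminal
  def transition(status, event), do: {:error, {:invalid_transition, status, event}}
end
```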
### IFE

(notice that piggybacks are kept separate)
1. `canonical`
2. `non-canonical`
3. `processed`
4. `invalidly_processed_false_canonical`
5. `invalidly_processed_false_non_canonical`
State can first alternate between `1` and `2` (it always starts at `1`). Then it ends up in `3` if valid, and goes `1 => 4` or `2 => 5` if invalid.

Similar comments as with SEs apply.
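The same kind of hypothetical sketch for IFEs (`:non_canonical` standing in for `non-canonical`). Canonicity flips are driven by competitors and challenge responses discovered elsewhere, and the finalization event carries the canonicity the finalization actually assumed:

```elixir
# NOTE: hypothetical sketch, not the elixir-omg API
defmodule InFlightExitStatus do
  @type t ::
          :canonical
          | :non_canonical
          | :processed
          | :invalidly_processed_false_canonical
          | :invalidly_processed_false_non_canonical

  # alternate between 1 and 2 any number of times before finalization
  def transition(:canonical, :canonicity_challenged), do: {:ok, :non_canonical}
  def transition(:non_canonical, :challenge_responded_to), do: {:ok, :canonical}

  # 1 => 3 / 2 => 3: finalization agrees with the canonicity we determined
  def transition(:canonical, {:finalize, :canonical}), do: {:ok, :processed}
  def transition(:non_canonical, {:finalize, :non_canonical}), do: {:ok, :processed}

  # 1 => 4 / 2 => 5: finalization contradicts the canonicity we determined
  def transition(:canonical, {:finalize, :non_canonical}),
    do: {:ok, :invalidly_processed_false_canonical}

  def transition(:non_canonical, {:finalize, :canonical}),
    do: {:ok, :invalidly_processed_false_non_canonical}

  def transition(status, event), do: {:error, {:invalid_transition, status, event}}
end
```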
### Piggybacks
0. `missing` (could also be left unrepresented, but currently we kind of have an explicit `missing`, since we operate on the IFE level)
1. `active`
2. `challenged`
3. `processed`
4. `invalidly_processed`
Note that, save for the `0` (`missing`), this behaves just like an SE.
Similar comments as with SEs apply.
## How to figure out when an IFE finalizes with invalid canonicity (into `invalidly_processed_false_canonical` or `invalidly_processed_false_non_canonical`)?
Figuring that out is necessary if we want to put an IFE in an `invalidly_processed` state. It is more difficult than for SEs and PBs, because you might be exiting UTXOs that do exist, while the processing is still bad.
Todo:
- [ ] optimize stuff in `ExitProcessor.Core` by only looking at known txs that spend an interesting input; don't `group_by` all txs from all blocks (see the sketch below)
  - WHY?: right now, the process of checking the canonicity of an IFE tx is not optimal (just enough for the rare calls to the endpoints which fetch the validity and the proof data). If the Watcher needed to figure that out quickly, and for many IFEs, when consuming finalization events, it would time out a lot. After that is optimized, it will be reasonable to do a similar check for just one IFE when finalizing.
- [ ] then, on IFE processing, we run this check for that IFE and we know whether it has valid canonicity
Note that #671 will benefit from this too. Those two issues must be done in coordination.
Also, I'm assigning myself here, as I've done some thinking on how to solve this. If I don't get to do it, I'll post what I've got here.
@pdobacz closed this in #1372

> no I didn't, github-fail