Show all byzantine events in `/status.get`

Open kasima opened this issue 5 years ago • 2 comments

As a good actor in Plasma, I can see all byzantine exit events at once, so that I can challenge them easily (and make that bond money! 🤑)

From the Samrong incident, it seems that the byzantine_events array is returning a limited number of events. Once those events are challenged, the watcher requires a restart to pick up additional events.

During the incident, there were 5 unchallenged_exits. The cycle of [status.get unchallenged exits, challenge, restart watcher] happened 3 times to clear all the events.

It would be helpful to get a full list of invalid_exits and unchallenged_exits from a single call to status.get, without having to restart.
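For illustration, this is roughly how a challenger could pick the relevant entries out of one `status.get` response. It is a minimal sketch, assuming the response is a JSON object with a `data.byzantine_events` list whose items carry an `event` name. That shape, the singular event names, the helper name `challengeable_exits` and the sample `details` fields are assumptions made for this example, not something confirmed in this thread.

```python
# Event names assumed from the issue text; the real watcher may name them differently.
CHALLENGEABLE = {"invalid_exit", "unchallenged_exit"}


def challengeable_exits(status_response):
    """Return the byzantine events that describe exits someone should challenge."""
    # Assumed shape: {"data": {"byzantine_events": [{"event": ..., "details": ...}]}}
    events = status_response.get("data", {}).get("byzantine_events", [])
    return [e for e in events if e.get("event") in CHALLENGEABLE]


# Hypothetical payload, hand-written for this example (not a captured response).
sample = {
    "success": True,
    "data": {
        "byzantine_events": [
            {"event": "invalid_exit", "details": {"utxo_pos": 1000000000}},
            {"event": "unchallenged_exit", "details": {"utxo_pos": 2000000000}},
            {"event": "some_other_event", "details": {}},
        ]
    },
}

for event in challengeable_exits(sample):
    print(event["event"], event["details"]["utxo_pos"])
```

The filter only sees what the watcher already knows, so it cannot surface exits whose invalidating blocks have not been pulled yet.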

kasima • Jun 17 '19 09:06

Expanding on the comment left in https://github.com/omisego/devops/issues/100#issuecomment-504457243.

> it seems that the byzantine_events array is returning a limited number of events

For the sake of clarity - it is not limiting the number of events in any way. It always returns all the events it knows about. The problem is in the cycle described:

> During the incident, there were 5 unchallenged_exits. The cycle of [status.get unchallenged exits, challenge, restart watcher] happened 3 times to clear all the events.

The cycle resulted because block syncing halted due to the unchallenged_exit condition. At every "moment" of the cycle, the full list of events was returned, according to the current "state of knowledge" of the watcher.

It is impossible to discern the invalidity of the exits without pulling the invalidating blocks, and the latter is stalled because the watcher is in the unchallenged_exit state and not pulling new blocks. The "not pulling blocks" part is the basic measure implemented to protect against corrupt ledger state and to prevent users from sending/receiving money via plasma.
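A toy model may make the cycle easier to see. This is a sketch under stated assumptions, not the watcher's actual logic: `ToyWatcher`, its methods, the block numbers and the exit ids are all hypothetical, and every invalid exit is treated as already late (i.e. it immediately counts as an unchallenged exit).

```python
from dataclasses import dataclass, field


@dataclass
class ToyWatcher:
    # Block number -> exits that this block proves invalid (hypothetical data).
    invalidating_blocks: dict
    synced_up_to: int = 0
    halted: bool = False
    unchallenged: set = field(default_factory=set)

    def sync(self, chain_tip):
        """Pull blocks in order; stop pulling once an unchallenged exit is known."""
        if self.halted:
            return
        for blknum in sorted(self.invalidating_blocks):
            if not (self.synced_up_to < blknum <= chain_tip):
                continue
            self.synced_up_to = blknum
            self.unchallenged |= set(self.invalidating_blocks[blknum])
            if self.unchallenged:
                self.halted = True  # later blocks stay unseen
                return
        self.synced_up_to = chain_tip

    def status_get(self):
        """Everything known right now: the full list for the current state."""
        return sorted(self.unchallenged)

    def challenge(self, exit_ids):
        self.unchallenged -= set(exit_ids)

    def restart(self):
        # Only a restart re-evaluates the halt; there is no automatic recovery.
        self.halted = bool(self.unchallenged)


def clear_all(watcher, chain_tip):
    """Repeat [status.get, challenge, restart] until the watcher reaches the tip."""
    reported = []
    watcher.sync(chain_tip)
    while watcher.halted:
        events = watcher.status_get()
        reported.append(events)
        watcher.challenge(events)
        watcher.restart()
        watcher.sync(chain_tip)
    return reported


# Hypothetical spread of 5 late invalid exits over 3 invalidating blocks.
watcher = ToyWatcher({1000: ["e1", "e2"], 2000: ["e3", "e4"], 3000: ["e5"]})
cycles = clear_all(watcher, chain_tip=4000)
print(len(cycles), cycles)
```

With this made-up spread, the five exits take three rounds to clear: each `status_get` returns everything known at that point, but the exits proven invalid by later blocks only appear once syncing resumes.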

Formally, the cycle of status.get, challenge, restart comes from not following the protocol - the chain should be exited at the first instance of an unchallenged exit.

Another thing to keep in mind is the conscious decision not to let the Watcher "get out" of a "call to mass exit" condition automatically (e.g. when the late invalid exit gets challenged after all), in order to avoid false positives (discussed ~2 months ago).

So, having said that, we have the following options:

1/ Live with it - "this should never happen" - and rather focus on being vigilant about invalid exits and on implementing the auto-challenger quickly to minimize impact. Also note that this is testnet-specific: outside of testnet we will follow the protocol (exit instead of trying to rescue the chain), and the exit periods are much longer.

2/ Loosen up the logic behind unchallenged_exits - potentially dangerous - e.g. allow the watcher to "pop back" to validity and continue.

3/ Try to implement pro-modes to have (2/), but opt-in only, just for our convenience.

I'm inclined towards (1/) as the clean solution. I don't like (2/) too much. (3/) would be some form of compromise, but one I fear would end up being abused and would also decrease the pressure to get the challenges right and on time, which is a needed pressure.

Oh, and I'd also love to keep the logic behind the unchallenged_exit condition as simple as possible, and have it only as a last-resort safety switch that never gets pulled but will work if needed.

pdobacz • Jun 24 '19 10:06

Shall we stick with (1/) on this one and close it with the wontfix label?

pdobacz • Dec 16 '19 12:12