raiden
raiden copied to clipboard
Raiden event handling needs reviewing
Consider the following blockchain transactions:
- Channel is closed
- Channels is settled
If both of the blockchain events for the above transactions are fetched in the same batch, then:
- Because of
channel close
theupdate transfer
event may be emitted. - The events are processed in the order they are generated
- The handling of the
update transfer
event assumes that the channel exists when it is processed - However, because of the settle the channel could have been cleared, invalidating the assumption.
The above would result raise an unrecoverable error.
The fix for the above, validation for the ContractSend
events using the same rules as the preconditions checks is necessary. (So, for the case above, since the channel state was after close, then it should at most be a recoverable error)
Related: #6444
After some thought, there are multiple approaches that can be taken to fix this:
- Give the correct copy of the
chain_state
to the event handler. The bug describe above happens because the handler is called with the state after the whole batch is processed, while it should be given the state emitted with the event itself.- Upsides:
- This should be easy to understand, and was the original implementation. However this was unintentionally changed to improve the performance of the system.
- Downsides:
- This would remove the optimization which allows us to make a single deep copy per batch. This may or may not be a performance problem since deepcopy was "optimized".
- All events would be processed, which can result in unnecessary blockchain transactions.
- Upsides:
- As described in the initial text of the issue, the same rules used for preconditions checks should be applied to the event handler. This would mean the code has to be reviewed and the handler has to be aware that the state may have changed before the handler being called.
- Upsides:
- The single deepcopy per batch would still be allowed, so there would be no performance hit.
- Downsides:
- This is generally harder to reason about, and I believe it will be a fragile solution.
- Upsides:
- Stop handling the
ContractSend
events returned by the dispatch, instead, the code should react to changes to the state itself. This is roughly the behavior thatready_to_process_events
tries to achieve, but there it only works during startup. The idea would be to stop handling the events emitted by the dispatch, but instead to look at the node's queues to check if there are new transactions to be sent, only processing theContractSend
andSend
events that are in the queues.- Upsides:
- The flag
ready_to_process_events
would be removed, since processing events would not happen automatically anymore. This has the benefit of having only one way of doing event processing. - Transactions that are added and removed from the queue in the same batch would not be sent anymore, since the state machine already has code to cleanup the transaction queue when necessary. This would not only be more optimal, but it would also fix the bug mentioned above, since the transaction would be sent.
- The flag
- Upsides:
I think the third option is the best, and it should be done for both protocol messages and transactions.
Further discussion:
- Don't emit
ContractSend
andSendMessage
events from the state transition, only store them in theState
- Make the queues (
pending_transactions
andqueueids_to_queues
) ordered and versioned, so that changes can be queried easily - Trigger effects based on those queues
- Remaining events are informational only