swarm icon indicating copy to clipboard operation
swarm copied to clipboard

swap, swap/chain, contracts/swap: add transaction queue

Open ralph-pichler opened this issue 4 years ago • 1 comments

This PR introduces a central component, the TxScheduler, in charge of sending transactions, waiting for their result afterwards and ensuring the result was successfully processed. In the future this component is supposed to take care of more chain-related tasks.

The TxScheduler is an interface currently only implemented by the TxQueue which executes transactions in sequence (see #2006, the next one is only sent after the previous confirmed) in order to avoid most nonce-related issues (#1929 ).

The general idea behind the TxScheduler is as follows:

  • A component which wishes to send a transaction does not do so directly, instead it creates a chain.TxRequest and schedules it with the TxScheduler which returns an assigned id and takes care of the rest.
  • Every request has an associated handlerID which specifies which handler to notify of events for this request. A component should register itself as the handler for the handlerIDs it uses on startup.
  • The TxScheduler will execute the requests at some point and notify the appropriate handler. If the handler function failed, the notification does not count as delivered and will be tried again in the future. This guarantee is also preserved across restarts of the swarm client. The idea behind that is that other places in the code which need to send transaction no longer need to be concerned with issues like io errors, network problems, client restarts, etc.. They just queue the request and are guaranteed to be notified of its result at some point.
  • When scheduling the request the component can also attach extra data which is stored alongside the request data. The purpose is to provide meta-information about the request that is stored within the same atomic write.

For the TxQueue transactions are processed the following way:

  • Scheduled requests go into a queue which is persisted to disk to ensure nothing is lost on node shutdown or crashes. For this, the PersistentQueue was introduced as a helper structure.
  • The queue then processes those requests in a loop:
    • It takes a request from the queue and makes it the active request
    • It sets gas limit, gas price (if necessary) and the nonce and signs the transaction
    • It sends the transaction to the backend
    • It waits for a receipt to be available
  • If the node shuts down it will continue processing the active request the next time.
  • If anything in the process fails prior to sending, the transaction counts as cancelled, otherwise as "status unknown".
  • On IO / decode errors the queue terminates. Some of those might be recoverable after a while and future PRs might attempt to restart the queue at some point.

On the cashing out side the CashoutProcessor now accepts a CashoutResultHandler which it will notify of the cashout result. This is usually Swap. During tests this handler is overridden to keep track of cashed out cheques. This mechanism replaces the cashDone channel on the backend and therefore obsoletes the global cashCheque function variable and the setupContractTest function.

In this PR only the cashout transactions for the chequebook transaction go through this mechanism. This was done to keep the PR small. Smaller future PRs should

  • handle the error cases in the cashout processor (status unkown / cancelled)
  • move the deposit transactions to this mechanism
  • add confirmation monitoring
  • ensure the node is synced when generating transactions
  • add the ability for requests to expire (so that txs are not suddenly sent months later) OR the easier alternative of just marking all none pending requests as cancelled on startup, then the initiator has to take care of resending or not in the handler.

This PR is quite large so it might be useful to look at commits individually. The PR has been split into 3 commits. The first one is the PersistentQueue, the second one implements the actual queue and the third one integrates it with the cashout transactions.

closes #2006 closes #2005 closes #1929 closes #1634

ralph-pichler avatar Mar 03 '20 19:03 ralph-pichler

@mortelli

could you give an conceptual example of another struct that would implement the TxScheduler interface, other than TxQueue? (i want to make sure i understand the difference between these 2 in terms of responsibilities)

An alternative would be a scheduler which tracks nonce count locally and allows for parallel requests instead of queueing. Another one might be a simple mock for testing the rest of the code without running the entire queue mechanism.

why does persistentQueue have a prefix, if all entries have their own separate keys? is it the idea to have multiple persistentQueue structs in the same state.Store? is this the case already?

Yes, there are already multiple in the same store now. The request queue plus one notification queue per handler. Also this is the same state store as for swap in production and we want to avoid key collisions.

i understand the situation of having a transaction with an unknown status, but why is there a func to actually notify this? would this take place in the future, when we allow transactions to expire, or is it already happening?

This can already happen now if transactions don't confirm in time or backend.SendTransaction fails. The notification exists so the sending component can react accordingly. What it does then depends (e.g. we would never reattempt a deposit but trying a cashing transaction with unknown status again might be reasonable).

regarding future PRs: can you please explain what the node's actions would be in terms of confirmation monitoring? would this be basically issue #1633?

Not fully sure yet about that. There would at least be a notification once the confirmation number has been reached.

ralph-pichler avatar Mar 10 '20 13:03 ralph-pichler