Automatic Transaction Re-submission in Chain-Component

Open KtorZ opened this issue 2 years ago • 1 comments

Why

Transactions driving the head state machine may not end up in a block, even though they were submitted successfully, for a variety of reasons (local rollback, forks later in the network, intermittent network failure...). But we want (still valid) transactions to end up on-chain eventually! This is particularly important for contest transactions, which are security-sensitive.

What

Automatically re-submitting transactions until they are included in a block should alleviate part of the problem and make sure that any valid transaction ends up in the ledger. This is a simple strategy which won't handle conflicting transitions but will cope with rollbacks and intermittent propagation failures. Users only have to trigger actions once for them to (eventually) happen.

How

On submission:

  • [ ] If the transaction fails at submission, meaning that it's no longer valid from our local state standpoint, we drop it and log.
  • [ ] If it's successfully submitted, then back to square 1 and wait for next block.
    • [ ] Put the transaction in a pending set

Asynchronously, on every block tick (i.e. when observing a new block)

  • [ ] Trim transactions from the pending set when seen in the block.
  • [ ] Automatically re-submit any transaction left from the pending set
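
The checklist above can be sketched as a small pending set. This is only an illustration of the strategy as proposed, not the chain component's actual code: the `Resubmitter` class, its method names and the boolean `submit` callback are all assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Resubmitter:
    # Hypothetical callback: returns True if the local node accepted the tx.
    submit: Callable[[bytes], bool]
    # Transactions submitted successfully but not yet seen in a block.
    pending: Dict[str, bytes] = field(default_factory=dict)
    # Ids of transactions dropped because submission failed (stand-in for a log).
    dropped: List[str] = field(default_factory=list)

    def on_submit(self, tx_id: str, tx: bytes) -> None:
        """On submission: keep the tx as pending if accepted, else drop and log."""
        if self.submit(tx):
            self.pending[tx_id] = tx
        else:
            self.dropped.append(tx_id)

    def on_block(self, tx_ids_in_block: Iterable[str]) -> None:
        """On every block tick: trim what was seen, re-submit what is left."""
        for tx_id in tx_ids_in_block:
            self.pending.pop(tx_id, None)
        # Iterate over a copy because failed re-submissions are removed.
        for tx_id, tx in list(self.pending.items()):
            if not self.submit(tx):
                del self.pending[tx_id]
                self.dropped.append(tx_id)
```

A caller would invoke `on_submit` once per user action and `on_block` from whatever observes new blocks. Treating a failed re-submission as "no longer valid" follows the first checklist item; whether that is the right reading of a rejection is a separate question.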

KtorZ, May 25 '22 08:05

I'd say this is pretty well-defined. Shall we make it a 💬 feature item @KtorZ? Anything to add since we discussed this also as a solution to some of our problems in handling time (ADR in #450)?

ch1bo, Aug 05 '22 09:08

To improve the UX for the fanout transaction, we should

  • keep & retry submission of transactions which are NOT YET valid
  • trim transactions which are not valid ANYMORE
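
One way to picture the two cases is to compare the current slot with a transaction's validity interval. The sketch below is an assumption-laden illustration: `classify`, `should_keep` and the `Validity` enum are hypothetical names, and treating the interval as inclusive at the lower bound and exclusive at the upper bound is an assumption of this example.

```python
from enum import Enum
from typing import Optional


class Validity(Enum):
    NOT_YET_VALID = "not yet valid"
    VALID = "valid"
    EXPIRED = "not valid anymore"


def classify(current_slot: int,
             valid_from: Optional[int],
             valid_until: Optional[int]) -> Validity:
    """Classify a tx against the interval [valid_from, valid_until).

    A bound of None means the interval is open on that side.
    The inclusivity of the bounds is an assumption of this sketch.
    """
    if valid_from is not None and current_slot < valid_from:
        return Validity.NOT_YET_VALID
    if valid_until is not None and current_slot >= valid_until:
        return Validity.EXPIRED
    return Validity.VALID


def should_keep(current_slot: int,
                valid_from: Optional[int],
                valid_until: Optional[int]) -> bool:
    """Keep (and retry) anything that is not expired; trim the rest."""
    return classify(current_slot, valid_from, valid_until) is not Validity.EXPIRED
```

With this split, a fanout submitted slightly too early stays in the retry set, while a transaction past its upper bound is trimmed.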

ch1bo, Sep 07 '22 09:09

We recently added (#493) getUTxO to the Chain interface to avoid a race-condition which could be solved by proper re-submission as well. We should review whether the getUTxO is still needed after implementing this feature.

ch1bo, Sep 20 '22 08:09

> Asynchronously, on every block tick (i.e. when observing a new block)
>
>   • Trim transactions from the pending set when seen in the block.
>   • Automatically re-submit any transaction left from the pending set

Regarding the trimming of the transaction from the pending set, would that be enough to manage transactions lost because of forks? Would it make sense to wait for a given number of blocks before trimming the transaction for more security?

Regarding the re-submit, it's not clear to me whether there is some sort of pool of pending transactions in Cardano, as exists in Bitcoin. If so, what if our transaction is not yet in the next block but is still waiting to be inserted into it? Wouldn't we have to wait a bit to avoid submitting our transaction twice even though it would not be needed?

(pardon my, as yet, poor understanding of the Cardano protocol)

pgrange, Sep 27 '22 16:09

> Regarding the trimming of the transaction from the pending set, would that be enough to manage transactions lost because of forks? Would it make sense to wait for a given number of blocks before trimming the transaction for more security?

It probably won't, but it's not the purpose of this story to be resistant to rollbacks. It's merely handling the realistic chance that our node had unfriendly peers/relays which ended up not diffusing the transaction robustly into a block. For handling rollbacks we have a dirt-road solution (not crashing), where the cobblestone road is in #185.

> Regarding the re-submit, it's not clear to me whether there is some sort of pool of pending transactions in Cardano, as exists in Bitcoin. If so, what if our transaction is not yet in the next block but is still waiting to be inserted into it? Wouldn't we have to wait a bit to avoid submitting our transaction twice even though it would not be needed?

There is a mempool in every node, so it might be stuck in there, or anywhere between our node and the next block producing nodes. We could wait longer than trying to resubmit on every block received, but there is no danger in submitting a transaction multiple times, if we handle the error response as somewhat expected.

> (pardon my, as yet, poor understanding of the Cardano protocol)

I'd say your understanding is already middle-class rich.

ch1bo, Sep 27 '22 17:09

Summary of convo on slack (https://input-output-rnd.slack.com/archives/CR599HMFX/p1664438601525909):

  • When a tx is accepted in the mempool of a cardano-node, it will end up being included in a block except in the following cases:
    • The tx becomes invalid with respect to the current ledger state of the node -> it's dropped from the mempool (silently?)
    • The tx validity upper bound is exceeded -> dropped from the mempool
    • The tx is included in a block the node forges but this block is later dropped and superseded by another fork -> tx "disappears"
    • The node has connectivity problems -> tx stays forever in the mempool
  • If one resubmits the exact same tx to a node, then it has the same TxId and, if it's still in the mempool, ~it's silently dropped~ resubmission will return an error
  • If one resubmits a tx that's not in the mempool anymore, this means it's either:
    • been adopted in a block so the resubmitted tx is invalid and is immediately rejected
    • been discarded for one of the reasons above
  • Our pending set of tx is exactly what the mempool is so there's probably no interest in maintaining one on our own side?
  • We should not be using node2node protocol to "resubmit" tx as this could be considered adversarial behaviour and would lead the node to blacklist us

It seems the only sensible thing to do is to wait for the tx to appear on-chain (e.g. wait for the OnChainTx to be observed) for "some time". Defining the amount of time to wait seems somewhat tricky. Neil Davies suggested we could use the 𝚫Q framework to compute wait time bounds (http://www.pnsol.com/public/TP-PNS-2003-09.pdf, https://iohk.io/en/research/library/papers/mind-your-outcomes-the-dqsd-paradigm-for-quality-centric-systems-development-and-its-application-to-a-blockchain-case-study/).
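
As a rough illustration of "wait for the tx to appear on-chain for some time", the sketch below tracks submitted transactions and reports those not observed within a bound. Everything here is hypothetical: `ObservationTracker`, its methods, and `max_wait_seconds` are invented for the example, and the bound's value is exactly the open question raised above.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class ObservationTracker:
    # Hypothetical bound on how long to wait; choosing it is left open here.
    max_wait_seconds: float
    # tx id -> time (in seconds) at which the tx was submitted.
    awaiting: Dict[str, float] = field(default_factory=dict)

    def submitted(self, tx_id: str, now: float) -> None:
        """Start waiting for a tx that was just submitted without error."""
        self.awaiting[tx_id] = now

    def observed(self, tx_ids: Iterable[str]) -> None:
        """Stop waiting for txs that were observed on-chain."""
        for tx_id in tx_ids:
            self.awaiting.pop(tx_id, None)

    def overdue(self, now: float) -> List[str]:
        """Ids of txs still not observed after max_wait_seconds."""
        return sorted(
            tx_id
            for tx_id, since in self.awaiting.items()
            if now - since > self.max_wait_seconds
        )
```

The tracker never re-submits anything; it only produces the list of overdue ids so that the user can be notified.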

abailly-iohk, Sep 29 '22 17:09

> If one resubmits the exact same tx to a node, then it has the same id and if it's still in the mempool it's silently dropped

We tried this yesterday by just submitting every transaction twice. It turns out that the second submission fails with a ledger error indicating that the TxIn is unknown (already spent by the first tx). So our observation says otherwise: node-to-client tx submission is not idempotent, and we would need to handle these errors as expected.
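
A minimal sketch of "handling these errors as expected" follows, assuming a wrapper that remembers which transactions the node already accepted. The names `make_submitter`, `Outcome` and the boolean `raw_submit` callback are hypothetical; the real client reports a ledger error rather than a boolean, and this example does not try to model that error.

```python
from enum import Enum
from typing import Callable, Set


class Outcome(Enum):
    ACCEPTED = "accepted"
    EXPECTED_REJECTION = "rejected, but previously accepted"
    FAILED = "rejected on first submission"


def make_submitter(raw_submit: Callable[[str], bool]) -> Callable[[str], Outcome]:
    """Wrap a non-idempotent submit function.

    raw_submit is a hypothetical stand-in for the node client: it takes a
    tx id and returns True when the node accepts the transaction.
    """
    accepted: Set[str] = set()

    def submit(tx_id: str) -> Outcome:
        if raw_submit(tx_id):
            accepted.add(tx_id)
            return Outcome.ACCEPTED
        # A rejection after an earlier acceptance is what the experiment
        # above observed, so it is not reported as a failure.
        if tx_id in accepted:
            return Outcome.EXPECTED_REJECTION
        return Outcome.FAILED

    return submit
```

The wrapper only changes how a rejection is interpreted; it cannot tell whether the earlier submission actually made it into a block.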

Anyhow, I concur that we might only want to observe & notify clients instead of actively re-submitting, as in "normal circumstances" a locally submitted transaction (without error) will end up in a block.

ch1bo, Sep 30 '22 05:09

We are closing this ticket, as it does not make sense given our current understanding of cardano-node. We should instead improve the feedback given to the user on chain events and errors. See #531

pgrange, Oct 04 '22 12:10

Just to clarify, this ticket was originally created specifically in response to this case:

> The tx is included in a block the node forges but this block is later dropped and superseded by another fork -> tx "disappears"

A fork could happen because of benign slot battles, or because of an actual attack conducted by one of the head participants (trying, for example, to drop another participant's contestation by creating a fork of the chain where this contestation doesn't exist). This is a plausible attack vector which has to be acknowledged.

Discussing with @ch1bo, it's true however that in such a scenario, a local node would see this as a "rollback" event and could thus react accordingly to that rollback (seeing that the contestation is absent in the new fork). Happy to see this ticket closed, provided that at least something is done with regard to re-submission of contestations in the event of rollbacks.
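
Reacting to a rollback could look roughly like the sketch below: keep a local view of which transactions each block contained, truncate it on rollback, and check which of our own transactions are absent once the new fork has been followed. `ChainView`, its fields and its methods are assumptions made up for this example, not an existing interface.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class ChainView:
    # One set of tx ids per observed block, oldest first.
    blocks: List[Set[str]] = field(default_factory=list)
    # Ids of our own transactions that must be on-chain (e.g. a contestation).
    ours: Set[str] = field(default_factory=set)

    def roll_forward(self, tx_ids: Iterable[str]) -> None:
        """Record a newly observed block."""
        self.blocks.append(set(tx_ids))

    def roll_backward(self, keep: int) -> None:
        """Handle a rollback by keeping only the first `keep` blocks."""
        self.blocks = self.blocks[:keep]

    def missing(self) -> List[str]:
        """Our transactions that are absent from the chain as currently seen."""
        on_chain: Set[str] = set().union(*self.blocks)
        return sorted(self.ours - on_chain)
```

After a rollback and the blocks of the new fork, anything returned by `missing()` is a candidate for re-submission or, at the least, for a notification to the user.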

KtorZ, Oct 12 '22 12:10

We realise we don't strictly need this to be secure (check https://hydra.family/head-protocol/core-concepts/behavior for a high-level overview and the paper for formal stuff) BUT from a user experience perspective, not seeing a submitted tx after a while is problematic and should be reported.

abailly-iohk, Oct 18 '22 12:10