hydra
Close transaction dropped from cardano-node
Context & versions
At least 0.12.0
Steps to reproduce
- Open a head on preprod
- Submit the `Close` websocket command
- See a transaction added to the `cardano-node` mempool
- Sometimes the transaction gets removed (upon seeing the next block) without it being included.
Actual behavior
The head does not get closed, because the Cardano network dropped our Close transaction. No feedback is given to the user.
Expected behavior
The Cardano network should not drop our transaction, or at least the Hydra client should be made aware of this (after some time).
Hypothesis
The transaction is dropped because the upper bound of the validity range (`invalidAfter`) on the `closeTx` has been exceeded.
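To make the hypothesis concrete: under the Cardano ledger rules, a transaction is valid in slot `s` only if `s` is at or after its lower bound (`invalidBefore`) and strictly before its upper bound (`invalidHereafter`, the `invalidAfter` above). When a node adopts a new block it revalidates its mempool, so a Close whose upper bound has passed would be discarded silently at that point, which would presumably match the observed behaviour. The types below are simplified stand-ins written for this sketch, not the `cardano-ledger` or Hydra definitions, and `wouldBeDropped` is a hypothetical helper.

```haskell
-- Simplified stand-ins for illustration; not the cardano-ledger types.
newtype SlotNo = SlotNo Integer
  deriving (Eq, Ord, Show)

-- | Validity interval of a transaction. The lower bound is inclusive,
-- the upper bound ('invalidHereafter') is exclusive; 'Nothing' = unbounded.
data ValidityInterval = ValidityInterval
  { invalidBefore :: Maybe SlotNo
  , invalidHereafter :: Maybe SlotNo
  }
  deriving (Show)

-- | Is a transaction with this interval valid in the given slot?
inInterval :: SlotNo -> ValidityInterval -> Bool
inInterval slot (ValidityInterval lo hi) =
  maybe True (<= slot) lo && maybe True (slot <) hi

-- | Hypothetical helper: a mempool entry fails revalidation (and would be
-- discarded) once the slot it is checked against leaves its interval.
wouldBeDropped :: SlotNo -> ValidityInterval -> Bool
wouldBeDropped slot = not . inInterval slot
```

For example, with an upper bound at slot 160 the transaction is still valid in slot 159, but not in slot 160 or any later slot.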
Some grooming notes:
- Explore whether there is a way to detect a tx being dropped from the mempool (or to query its current state)
- When txs get dropped, is there any log output?
- Is there any endpoint that we can query to know the status of a tx?
- Is there any configuration for cardano-node that controls when items get dropped from a mempool? Mempool size? Number of elements?
  Note: there is `cardano-cli query tx-mempool --mainnet info/next-tx/tx-exists myTxId`, i.e. the `info`, `next-tx` or `tx-exists myTxId` subcommand.
- Potentially we could query this to detect when a tx is in the mempool and when it gets dropped, so we can re-send it.
- How would we test this?
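One way to read the notes above: sample the mempool (for instance with the `tx-exists` query) and the chain at regular intervals, then classify the history of samples. A tx that was seen earlier but is now in neither place has been dropped and can be re-sent. `Observation`, `TxStatus` and `classify` are hypothetical names for this sketch; how each sample is collected is deliberately left out. Because it is a pure function, it is also easy to test with hand-written sample histories.

```haskell
-- Hypothetical model of one periodic sample for a single transaction.
data Observation = Observation
  { inMempool :: Bool -- e.g. result of a tx-exists query
  , onChain :: Bool -- e.g. the tx was seen in an adopted block
  }
  deriving (Eq, Show)

data TxStatus
  = NotSeenYet -- never observed anywhere
  | Pending -- currently in the mempool
  | Included -- currently observed on chain
  | Dropped -- seen before, now in neither place: candidate for re-sending
  deriving (Eq, Show)

-- | Classify the current status from all samples, oldest first.
classify :: [Observation] -> TxStatus
classify obs = case reverse obs of
  [] -> NotSeenYet
  latest : _
    | onChain latest -> Included
    | inMempool latest -> Pending
    | any inMempool obs || any onChain obs -> Dropped
    | otherwise -> NotSeenYet
```

A tx that was seen on chain and is later in neither place also counts as `Dropped`, which would cover the rollback case raised later in this thread.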
We could observe the mempool using a specialised mini-protocol for which we would implement a client, but this is somewhat involved. We could do something simpler using timeouts in the HeadLogic: when you request a Close, use a Wait to put an upper bound on how long you wait to observe the OnCloseTx from the chain?
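The timeout idea can be sketched as a pure state machine in the spirit of the HeadLogic: record a deadline when the Close is requested, and on each clock tick either keep waiting or give up and notify the client. All names below (`CloseState`, `Event`, `Effect`, `step`) are invented for illustration and do not correspond to Hydra's actual types; time is simplified to whole seconds.

```haskell
-- Hypothetical sketch; these names do not match Hydra's HeadLogic.
type Time = Integer -- seconds, for simplicity

data CloseState
  = Open
  | Closing Time -- deadline by which the close should have been observed
  | Closed
  deriving (Eq, Show)

data Event
  = CloseRequested Time -- client sent Close at this time
  | ObservedCloseTx -- stands in for observing OnCloseTx on chain
  | Tick Time -- periodic clock event
  deriving (Eq, Show)

data Effect
  = PostCloseTx
  | NotifyClient String
  deriving (Eq, Show)

-- | One step of the state machine; 'timeout' is how long to wait (seconds).
step :: Time -> CloseState -> Event -> (CloseState, [Effect])
step timeout st ev = case (st, ev) of
  (Open, CloseRequested now) -> (Closing (now + timeout), [PostCloseTx])
  (Closing _, ObservedCloseTx) -> (Closed, [NotifyClient "close observed"])
  (Closing deadline, Tick now)
    | now >= deadline ->
        -- Stop waiting and tell the client, which can then decide to retry.
        (Open, [NotifyClient "close timed out"])
  _ -> (st, [])
```

Going back to `Open` after the timeout is only one possible choice; it lets the client simply send Close again, which is close to the retrying logic asked about in the reply.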
> When you request a Close, use a Wait to have an upper bound on how long you're waiting for observing the OnCloseTx from the chain?
You mean some retrying logic in the HeadLogic?
I think we should just adjust the upper validity bound to something more compatible with the network. This is the code which determines that upper bound: https://github.com/input-output-hk/hydra/blob/11dc6aeb68b909e070fc6eb366b4734e090b62c1/hydra-node/src/Hydra/Chain/Direct/Handlers.hs#L354-L358

Looking at it, I wonder whether this is really the issue we are encountering: 200 seconds should be long enough on mainnet. But then again, the contestation period is configurable, and its default of 60 seconds might not be long enough (if no block is produced within 1 minute?): https://github.com/input-output-hk/hydra/blob/11dc6aeb68b909e070fc6eb366b4734e090b62c1/hydra-node/src/Hydra/Options.hs#L605-L606
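A rough estimate helps gauge whether a 60-second window alone could explain the drops. Assuming 1-second slots and an active slot coefficient `f = 0.05` (the mainnet values; we assume preprod uses the same), and assuming all stake is online, each slot has a block with probability `f`, so the chance of no block at all in `n` consecutive slots is `(1 - f)^n`. The sketch also encodes, as a hypothetical paraphrase of the discussion rather than a copy of the linked code, the idea that the window is the contestation period capped at 200 seconds.

```haskell
-- Assumptions: 1-second slots, active slot coefficient f = 0.05 and all
-- stake online, so each slot has a block with probability f.
activeSlotCoeff :: Double
activeSlotCoeff = 0.05

-- | Probability that no block at all is produced in n consecutive slots.
probNoBlock :: Int -> Double
probNoBlock n = (1 - activeSlotCoeff) ^ n

-- | Hypothetical paraphrase of the rule discussed above: the validity
-- window is the contestation period (seconds), capped at 200 seconds.
closeWindowSeconds :: Int -> Int
closeWindowSeconds contestationPeriod = min 200 contestationPeriod
```

Under these assumptions `probNoBlock 60` is about 0.046 (roughly 1 in 22 closes with the 60-second default), while `probNoBlock 200` is about 3.5e-5. An empty window guarantees the tx cannot be included, so these figures are only a lower bound on how often a Close could expire unincluded; propagation delays or full blocks would add to it.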
We should validate whether it's due to upper validity bounds first.
It would be great to keep track of the block hash/height when this happens, in order to validate the invalid upper bound hypothesis. In the meantime, I would like to suggest we address this issue by documenting this possible behaviour and letting client applications (e.g. hydra-tui or any frontend app interacting with a hydra-node) decide what to do based on their own perception of time.
I think this appeared in this smoke test: https://github.com/input-output-hk/hydra/actions/runs/7088575057
This happened to me today while closing a head with the team: the tx was successfully posted to the cardano-node but was later removed from the mempool. There is no trace in the node explaining why.
The Cardano After Dark team encountered this issue this week, FYI
I think I have understood that slot battles can lead to mempool transactions being included in a block that is later rolled back, after which they are not automatically re-added to the mempool. I'm curious whether the same condition could occur with other head transactions.
It makes me think the node could benefit from a generational mempool, where transactions it believes are in a block are moved to the 'tentatively confirmed' pool {tx, blockId} or similar, and then if the block gets undone, any of those txs could be moved back to the regular mempool. With a tidying task to keep the old generations size-constrained. Of course that would be a node thing, not a Hydra thing...
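The generational mempool idea can be sketched as a data structure with two maps: pending transactions, and transactions believed to be in a block, tagged with that block's id and number. When a block is rolled back its transactions move back to pending, and a tidying step forgets entries buried deep enough. This is a hypothetical illustration of the proposal, not how the cardano-node mempool works; `TxId`, `BlockId` and depth-based pruning are assumptions.

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

type TxId = String
type BlockId = String
type BlockNo = Int

data Mempool tx = Mempool
  { pending :: Map TxId tx
  -- ^ Transactions waiting to be included.
  , tentative :: Map TxId (tx, BlockId, BlockNo)
  -- ^ Transactions believed to be in the given block ('tentatively confirmed').
  }
  deriving (Show)

emptyMempool :: Mempool tx
emptyMempool = Mempool Map.empty Map.empty

addTx :: TxId -> tx -> Mempool tx -> Mempool tx
addTx txId tx mp = mp {pending = Map.insert txId tx (pending mp)}

-- | A block was adopted: move the transactions it contains to 'tentative'.
blockAdopted :: BlockId -> BlockNo -> [TxId] -> Mempool tx -> Mempool tx
blockAdopted bid bno txIds (Mempool pend tent) =
  let (included, rest) = Map.partitionWithKey (\k _ -> k `elem` txIds) pend
      tagged = Map.map (\tx -> (tx, bid, bno)) included
   in Mempool rest (Map.union tagged tent)

-- | A block was rolled back: its transactions return to the pending pool.
blockRolledBack :: BlockId -> Mempool tx -> Mempool tx
blockRolledBack bid (Mempool pend tent) =
  let (undone, kept) = Map.partition (\(_, b, _) -> b == bid) tent
      restored = Map.map (\(tx, _, _) -> tx) undone
   in Mempool (Map.union pend restored) kept

-- | Tidying: forget tentative entries at least 'depth' blocks below the tip.
prune :: BlockNo -> BlockNo -> Mempool tx -> Mempool tx
prune tipNo depth mp =
  mp {tentative = Map.filter (\(_, _, bno) -> tipNo - bno < depth) (tentative mp)}
```

Pruning by depth is one way to keep old generations size-constrained; a real implementation might instead tie it to the security parameter or to a size limit.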
As for the Hydra agent, it could benefit from keeping track of the expected Closing UTxO and continuing to poll for its presence every ~10-15 s for perhaps 2-3 minutes (even if it gets an early indication of presence).
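A sketch of that polling strategy: check for the expected UTxO at a fixed interval for a fixed number of rounds, keep polling after the first positive result, and report the last observation, so that a rollback which removes the UTxO again is noticed. The check and sleep actions are parameters; `pollWindow`, `pollClosingUtxo` and `scriptedCheck` are hypothetical helpers, and the 15 s / 12 rounds values are just one reading of the suggested ~10-15 s and 2-3 minutes.

```haskell
import Control.Concurrent (threadDelay)
import Control.Monad (when)
import Data.IORef (newIORef, readIORef, writeIORef)

data PollResult
  = NeverSeen
  | SeenAndStillPresent
  | SeenThenLost -- present at some point but gone at the end (e.g. rollback)
  deriving (Eq, Show)

-- | Run 'check' 'attempts' times, sleeping 'intervalMicros' between runs.
-- Polling deliberately continues after the first positive result.
pollWindow :: (Int -> IO ()) -> Int -> Int -> IO Bool -> IO PollResult
pollWindow sleep intervalMicros attempts check = go attempts False False
  where
    go n seen lastPresent
      | n <= 0 = pure (summarise seen lastPresent)
      | otherwise = do
          present <- check
          when (n > 1) (sleep intervalMicros)
          go (n - 1) (seen || present) present

    summarise seen lastPresent
      | not seen = NeverSeen
      | lastPresent = SeenAndStillPresent
      | otherwise = SeenThenLost

-- | Assumed values: poll every 15 s, 12 times (about 2.75 minutes in total).
pollClosingUtxo :: IO Bool -> IO PollResult
pollClosingUtxo = pollWindow threadDelay (15 * 1000000) 12

-- | Replays fixed answers (then False), to try the loop without a chain.
scriptedCheck :: [Bool] -> IO (IO Bool)
scriptedCheck answers = do
  ref <- newIORef answers
  pure $ do
    xs <- readIORef ref
    case xs of
      [] -> pure False
      x : rest -> writeIORef ref rest >> pure x
```

In a real agent the `IO Bool` passed to `pollClosingUtxo` would query a chain indexer or node for the expected Closing UTxO; that query is not shown here.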
Would there be a mechanism available for a client to get the Close tx CBOR from Hydra server? This way a third-party agent with its own poll/retry capabilities ✋ could take on the retries.
> Would there be a mechanism available for a client to get the Close tx CBOR from Hydra server?
Not right now. While that would not be too hard to add, a third-party agent could already poll/retry with the current API: just send Close commands until the head is closed. Nothing bad can happen from it; you would just see errors if a Close has already been submitted and is pending inclusion on the blockchain.
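A client-side sketch of that advice: keep sending `Close` and checking the head status until the head is closed or a maximum number of attempts is reached. `sendClose` and `headIsClosed` stand for whatever the client uses to talk to the hydra-node websocket API; they, the fixed wait between attempts and `closeUntilClosed` itself are hypothetical. In production the sleep argument would typically be `threadDelay` from `Control.Concurrent`.

```haskell
-- | Resend Close until the head is observed closed. Returns the number of
-- Close commands sent, or Nothing if not closed after 'maxAttempts'.
closeUntilClosed
  :: (Int -> IO ()) -- ^ sleep, in microseconds (e.g. threadDelay)
  -> Int -- ^ wait between attempts, in microseconds
  -> Int -- ^ maximum number of Close commands to send
  -> IO () -- ^ hypothetical: send the Close command
  -> IO Bool -- ^ hypothetical: has the head been observed closed?
  -> IO (Maybe Int)
closeUntilClosed sleep waitMicros maxAttempts sendClose headIsClosed = go 1
  where
    go n
      | n > maxAttempts = pure Nothing
      | otherwise = do
          -- Re-sending is harmless per the discussion: at worst the node
          -- reports an error because a Close is already pending.
          sendClose
          sleep waitMicros
          closed <- headIsClosed
          if closed then pure (Just n) else go (n + 1)
```

Bounding the attempts keeps the client from retrying forever and gives it a point at which to surface the problem to the user, which is the missing feedback this issue describes.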