Forks can cause transactions to get stuck
The transaction pool sometimes rejects transactions when building on a fork.
Example CI run with the issue: https://github.com/LibertyDSNP/frequency/actions/runs/8527529557/job/23359157130
Details of this are in the references, but here is a quote from https://hackmd.io/@_FY3-hvwQZ6cX_4n8zYUNA/HJqUWj4_s
The ready_at method operates on block height. It is not synced with maintenance process in terms of forks. This results in two major issues:
when building new blocks on top of block which are not best or finalized, the invalid::stale error will occur. This is because the transaction pruning was not executed on block import,
    A -- B0[u0] -- C0[..]
    // If the maintenance was *not* triggered for B0,
    // ready_at will provide u0 when building C0,
    // (which is stale from B0 perspective)
when constructing new blocks on an alternative fork, the invalid::future error might arise. This occurs when blocks on the alternative fork lack transactions that serve as prerequisites for transactions present in the ready pool. As the maintained fork contains these prerequisite transactions, the ready set would comprise transactions that are considered future on the alternative fork. See the figure below:

        B1[u0,u1]--C1[u2]   //u3 is ready, after maintenance was triggered for C1
       /
      A
       \
        B0[t0,t1]--C0       //when building C0 ready_at will provide u3,
                            //which is future from B0 perspective
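
The two failure modes in the quote can be illustrated with a toy model. This is only a sketch under simplifying assumptions: validity is reduced to a nonce comparison for a single sender, and `Validity`, `ForkState`, and `validate` are hypothetical names, not part of `sc_transaction_pool`.

```rust
use std::cmp::Ordering;

/// Outcome of validating a transaction against one fork's state.
#[derive(Debug, PartialEq, Eq)]
enum Validity {
    Ready,
    Stale,
    Future,
}

/// Hypothetical per-fork view: the next nonce the chain expects from the
/// sender, i.e. how many of the sender's transactions this fork has included.
struct ForkState {
    next_nonce: u32,
}

/// Validate a transaction nonce against a specific fork.
/// A ready set computed for a different fork can fail either way.
fn validate(tx_nonce: u32, fork: &ForkState) -> Validity {
    match tx_nonce.cmp(&fork.next_nonce) {
        // Already included on this fork (pruning was not run for it).
        Ordering::Less => Validity::Stale,
        Ordering::Equal => Validity::Ready,
        // Prerequisite transactions are missing on this fork.
        Ordering::Greater => Validity::Future,
    }
}
```

In this model, offering u0 on top of B0[u0] is stale because the fork already expects nonce 1, while offering u3 on top of B0[t0,t1] is future because that fork still expects nonce 0.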
How to reproduce on Testnet Paseo
- Run the e2e tests
- See that tests fail
- See that the rpc node has pending extrinsics
(Note: I have cleared out all pending extrinsics from prior test runs.)
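
One way to check the last step is the standard Substrate `author_pendingExtrinsics` RPC method, which returns the extrinsics currently held in the node's pool. Below is a minimal sketch that only builds the JSON-RPC request body; the helper name `pending_extrinsics_request` is hypothetical, and the node endpoint to send it to is left out because it depends on the setup.

```rust
/// Build a JSON-RPC 2.0 request body for `author_pendingExtrinsics`.
/// The method takes no parameters; `id` is an arbitrary request identifier.
fn pending_extrinsics_request(id: u32) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"author_pendingExtrinsics","params":[]}}"#,
        id
    )
}
```

Sending this body as an HTTP POST with `Content-Type: application/json` to the node's RPC endpoint should return a `result` array; a non-empty array after the test run means extrinsics are still stuck in the pool.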
How to reproduce locally
Use branch: ci-update/e2e-paseo
Might not be possible without a lot of work, but this appears to work:
- Download/Build (mac) Polkadot 1.8+
- Start up Paseo Local Alice and Bob using the Paseo Local spec in resources
- Register per normal make command
- Start Frequency Alice and Bob
- Onboard per normal make command
- Run e2e tests
- See failures
Suggested Solution
Wrap the transaction pool in a custom implementation that handles the issue in the short term.
- Code Location: https://github.com/LibertyDSNP/frequency/blob/fb4b10d2865f55a8c636f04f3165d07b17048ba2/node/service/src/service.rs#L168-L174
- Basic Pool Docs: https://paritytech.github.io/polkadot-sdk/master/sc_transaction_pool/struct.BasicPool.html
- Ideas
  - Stupid simple: Don't do anything inside remove_invalid so that the transaction will remain in the tx pool.
    - Issue: This means all invalid transactions are left in the mempool
  - Reason Lookup: Inside of remove_invalid somehow do a check to see why this tx was marked invalid. Take that and see if it is something that could be a fork issue.
  - Time Lookup: Inside of remove_invalid somehow do a check to see what the mortality of this bad tx is. Only mark it invalid if it is outside of some expected fork bound.
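
The three ideas can be compared as a decision function that a wrapper's remove_invalid might consult before forwarding to the inner pool. This is a rough sketch, not the real `TransactionPool` trait: `InvalidReason`, `InvalidTx`, `Policy`, and `should_remove` are hypothetical, and the Time Lookup rule (age since the start of the mortality era compared with a fork bound) is only one possible reading of the idea.

```rust
/// Simplified reason a transaction was reported invalid (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InvalidReason {
    Stale,
    Future,
    Other, // e.g. bad signature or failed payment
}

/// Hypothetical record the wrapper would need for each reported transaction.
#[derive(Debug, Clone, Copy)]
struct InvalidTx {
    reason: InvalidReason,
    /// Block number at which the transaction's mortality era starts (assumed known).
    birth_block: u64,
}

/// The three short-term ideas from the list above.
#[derive(Debug, Clone, Copy)]
enum Policy {
    /// "Stupid simple": never remove anything.
    KeepAll,
    /// "Reason Lookup": keep transactions whose error could be a fork artifact.
    ReasonLookup,
    /// "Time Lookup": keep transactions younger than the expected fork bound.
    TimeLookup { fork_bound: u64 },
}

/// Decide whether the wrapper should really forward the removal to the inner pool.
fn should_remove(policy: Policy, tx: &InvalidTx, current_block: u64) -> bool {
    match policy {
        Policy::KeepAll => false,
        Policy::ReasonLookup => {
            !matches!(tx.reason, InvalidReason::Stale | InvalidReason::Future)
        }
        Policy::TimeLookup { fork_bound } => {
            // Old enough that a short-lived fork no longer explains the error.
            current_block.saturating_sub(tx.birth_block) > fork_bound
        }
    }
}
```

The sketch makes the trade-off visible: KeepAll never cleans the mempool, ReasonLookup needs the invalidity reason to be available at removal time, and TimeLookup needs a fork bound that is an assumption about the network.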
Long-term Solution
Upgrade to a version of Polkadot-SDK with the issue resolved and remove any short-term solutions.
References
- Best detailed summary of the issue and substrate level fixes: https://hackmd.io/@_FY3-hvwQZ6cX_4n8zYUNA/HJqUWj4_s
- Polkadot-SDK issue: https://github.com/paritytech/polkadot-sdk/issues/1202
- StackExchange post of someone dealing with the same issue: https://substrate.stackexchange.com/questions/11225/pending-extrinsics-jammed-how-to-ensure-re-broadcast