fork-aware transaction pool added

Open michalkucharczyk opened this issue 1 year ago • 14 comments

Fork-Aware Transaction Pool Implementation

This PR introduces a fork-aware transaction pool (fatxpool), enhancing transaction management by maintaining a valid txpool state for different forks.

High-level overview

A high-level overview was added to the sc_transaction_pool::fork_aware_txpool module. Use:

cargo doc --document-private-items -p sc-transaction-pool --open

to build the docs. They should give a good overview and a nice entry point into the new pool's mechanics.

Quick overview (documentation excerpt)

View

For every fork, a view is created. The view is a persisted state of the transaction pool computed and updated at the tip of the fork. The view is built around the existing ValidatedPool structure.

A view is created on every new best block notification. To create a view, one of the existing views is chosen and cloned.

When the chain progresses, the view is kept in the cache (retracted_views) to allow building blocks upon intermediary blocks in the fork.

The views are deleted on finalization: views lower than the finalized block are removed.

The views are updated with the transactions from the mempool—all transactions are sent to the newly created views.
A maintain process is also executed for the newly created views—basically resubmitting and pruning transactions from the appropriate tree route.
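
To make the lifecycle above concrete, here is a minimal sketch in plain Rust. All names (View, ViewStore, on_new_best_block, on_finalized) and the use of u64 block numbers as stand-ins for hashes are illustrative assumptions, not the actual polkadot-sdk types:

use std::collections::HashMap;

type BlockNumber = u64; // assumption: block numbers stand in for hashes

#[derive(Clone)]
struct View {
    at: BlockNumber,    // block this view is anchored at
    ready: Vec<String>, // simplified: transactions ready at this block
}

struct ViewStore {
    active: HashMap<BlockNumber, View>,    // views at the tips of known forks
    retracted: HashMap<BlockNumber, View>, // cache for intermediary blocks
}

impl ViewStore {
    // On every new best block: clone an existing view, re-anchor it at the
    // new tip, and keep the old one cached so blocks can still be built on
    // intermediary blocks of the fork.
    fn on_new_best_block(&mut self, parent: BlockNumber, new_tip: BlockNumber) {
        let new_view = match self.active.remove(&parent) {
            Some(parent_view) => {
                let mut v = parent_view.clone();
                self.retracted.insert(parent, parent_view);
                v.at = new_tip;
                v
            }
            None => View { at: new_tip, ready: Vec::new() },
        };
        // The real pool would now run the maintain process: resubmitting
        // mempool transactions and pruning along the tree route.
        self.active.insert(new_tip, new_view);
    }

    // On finalization: drop views below the finalized block.
    fn on_finalized(&mut self, finalized: BlockNumber) {
        self.active.retain(|at, _| *at >= finalized);
        self.retracted.retain(|at, _| *at >= finalized);
    }
}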

View store

The view store is a helper structure that acts as a container for all the views and provides some convenience methods.

Submitting transactions

Every transaction is submitted to every view at the tips of the forks. Retracted views are not updated. Every transaction also goes into the mempool.
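
In sketch form, submission fans out to every active view and always lands in the mempool. This continues the toy ViewStore from the sketch above; the Pool type and submit method are illustrative, and the real pool validates each transaction against every view's state instead of pushing it blindly:

struct Pool {
    view_store: ViewStore,
    mempool: Vec<String>,
}

impl Pool {
    // Submit a transaction: it goes to every view at a fork tip
    // (retracted views are deliberately skipped) and into the mempool.
    fn submit(&mut self, tx: String) {
        for view in self.view_store.active.values_mut() {
            view.ready.push(tx.clone()); // real code: validate, then insert
        }
        self.mempool.push(tx);
    }
}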

Internal mempool

In short, the main purpose of the internal mempool is to prevent a transaction from being lost. That could happen when a transaction is invalid on one fork but valid on another. The mempool also allows the txpool to accept transactions before any blocks have been reported.

The mempool removes transactions once they are finalized. On every finalized event, the remaining transactions are also revalidated and removed from the mempool if no longer valid.
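
A sketch of the finalization-driven cleanup, continuing the toy Pool from above (is_still_valid is a hypothetical placeholder for the runtime validation call):

impl Pool {
    // On a finalized event: drop the finalized transactions from the
    // mempool, then revalidate the rest and drop anything no longer valid.
    fn on_finalized_event(&mut self, finalized_txs: &[String]) {
        self.mempool.retain(|tx| !finalized_txs.contains(tx));
        self.mempool.retain(|tx| is_still_valid(tx));
    }
}

fn is_still_valid(_tx: &str) -> bool {
    true // placeholder: the real pool asks the runtime
}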

Events

Transaction events from multiple views are merged and filtered to avoid duplicated events.
Ready / Future / InBlock events originate in the views and are de-duplicated before being forwarded to external listeners.
Finalized events originate in the fork-aware txpool logic. Invalid events require special care and can originate in both the view and the fork-aware txpool logic.
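
A crude sketch of the de-duplication idea (the real listener logic is stateful and per-transaction; event kinds are plain strings here for brevity, and dedup_events is a hypothetical helper):

use std::collections::HashSet;

// Merge event streams from many views so that external listeners see each
// (transaction, event) pair only once.
fn dedup_events(
    per_view_events: Vec<Vec<(String, &'static str)>>,
) -> Vec<(String, &'static str)> {
    let mut seen = HashSet::new();
    let mut merged = Vec::new();
    for events in per_view_events {
        for ev in events {
            if seen.insert(ev.clone()) {
                merged.push(ev);
            }
        }
    }
    merged
}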

Light maintain

Sometimes the transaction pool does not have enough time to prepare a fully maintained view with all retracted transactions revalidated. To avoid providing an empty ready-transaction set to the block builder (which would result in an empty block), light maintain was implemented. It simply removes the imported transactions from the ready iterator.
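
The idea in sketch form (light_maintain is a hypothetical free function, not the actual API): instead of a full maintain, just filter the already-imported transactions out of the ready iterator:

use std::collections::HashSet;

fn light_maintain<'a>(
    ready: impl Iterator<Item = &'a String> + 'a,
    imported: &'a HashSet<String>,
) -> impl Iterator<Item = &'a String> + 'a {
    // Cheap enough to run when there is no time for a full maintain, and
    // guarantees the block builder never gets an artificially empty set.
    ready.filter(move |tx| !imported.contains(*tx))
}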

Revalidation

Revalidation is performed for every view. The revalidation process is started after a trigger is executed. The revalidation work is terminated as soon as a new best block / finalized event is notified to the transaction pool.
The revalidation result is applied to the newly created view, which is built upon the revalidated view.

Additionally, parts of the mempool are also revalidated to make sure that no transactions are stuck in the mempool.
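
A sketch of the trigger/termination behaviour, with a stop flag standing in for the real task plumbing (revalidate_batch is hypothetical, and is_still_valid is the placeholder from the mempool sketch above):

use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Revalidate until a new best-block/finalized notification sets the stop
// flag; the partial result is applied to the view built on top of the
// revalidated one.
fn revalidate_batch(txs: &[String], stop: Arc<AtomicBool>) -> Vec<String> {
    let mut still_valid = Vec::new();
    for tx in txs {
        if stop.load(Ordering::Relaxed) {
            break; // terminated by a new pool event
        }
        if is_still_valid(tx) {
            still_valid.push(tx.clone());
        }
    }
    still_valid
}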

Logs

The most important log line for understanding the state of the txpool is:

              maintain: txs:(0, 92) views:[2;[(327, 76, 0), (326, 68, 0)]] event:Finalized { hash: 0x8...f, tree_route: [] }  took:3.463522ms
                             ^   ^         ^     ^   ^  ^      ^   ^  ^        ^                                                   ^
unwatched txs in mempool ────┘   │         │     │   │  │      │   │  │        │                                                   │
   watched txs in mempool ───────┘         │     │   │  │      │   │  │        │                                                   │
                     views  ───────────────┘     │   │  │      │   │  │        │                                                   │
                      1st view block # ──────────┘   │  │      │   │  │        │                                                   │
                           number of ready tx ───────┘  │      │   │  │        │                                                   │
                                number of future tx ─────┘      │   │  │        │                                                   │
                                        2nd view block # ──────┘   │  │        │                                                   │
                                      number of ready tx ──────────┘  │        │                                                   │
                                           number of future tx ───────┘        │                                                   │
                                                                 event ────────┘                                                   │
                                                                       duration  ──────────────────────────────────────────────────┘

It is logged after the maintenance is done.

The debug level enables per-transaction logging, making it possible to track all transaction-related actions that happen in the txpool.

Integration notes

For teams with a custom node, the new txpool needs to be instantiated, typically in the service.rs file; here is an example: https://github.com/paritytech/polkadot-sdk/blob/8be61eceac28a88e364ef1ada4be03aa13560692/cumulus/polkadot-parachain/polkadot-parachain-lib/src/common/spec.rs#L159-L168

To enable the new transaction pool, the following CLI arg must be specified: --pool-type=fork-aware. If it works, the following information will be printed in the log:

2024-09-20 21:28:17.528  INFO main txpool: [Parachain]  creating ForkAware txpool.

For debugging, the following log targets should be enabled:

      "-lbasic-authorship=debug",
      "-ltxpool=debug",

note: trace for txpool enables per-transaction logging.

Future work

The current implementation seems to be stable; however, further improvements are required. Here is the umbrella issue for future work:

  • https://github.com/paritytech/polkadot-sdk/issues/5472

Partially fixes: #1202

michalkucharczyk avatar May 29 '24 21:05 michalkucharczyk

I can offer field testing service on Integritee-Paseo. We suffer from #1202 and need to restart one Paseo collator authority every 5 min to get all extrinsics in, which originate from a single wallet. https://github.com/integritee-network/worker/issues/1594

brenzi avatar Jun 12 '24 13:06 brenzi

I can offer field testing service on Integritee-Paseo. We suffer from #1202 and need to restart one Paseo collator authority every 5 min to get all extrinsics in, which originate from a single wallet. integritee-network/worker#1594

@brenzi: The code still needs polishing, but it is ready to be tested on a testnet. So if you can give it a try, that would be great; I can assist with debugging.

The following option must be provided to enable the new txpool: --pool-type=fork-aware. To better understand how the new implementation behaves and to help me with debugging, please enable the following logs:

      "-lbasic-authorship=debug",
      "-ltxpool=debug",
      "-lsync=debug",

michalkucharczyk avatar Jun 12 '24 13:06 michalkucharczyk

@michalkucharczyk would just cherry-picking this PR suffice to test it? I can also try to test it in our testnet

girazoki avatar Jun 19 '24 08:06 girazoki

@michalkucharczyk would just cherry-picking this PR suffice to test it? I can also try to test it in our testnet

Yes. But if you have a custom node, you will also need to instantiate the new txpool; here is an example: https://github.com/paritytech/polkadot-sdk/blob/62b08b01f32e65649bc5dc7027dcf3f0404ead62/cumulus/polkadot-parachain/src/service.rs#L157-L164

michalkucharczyk avatar Jun 19 '24 09:06 michalkucharczyk

@michalkucharczyk would just cherry-picking this PR suffice to test it? I can also try to test it in our testnet

Yes. But if you have a custom node, you will also need to instantiate the new txpool; here is an example:

https://github.com/paritytech/polkadot-sdk/blob/62b08b01f32e65649bc5dc7027dcf3f0404ead62/cumulus/polkadot-parachain/src/service.rs#L157-L164

thank you, we will try it

girazoki avatar Jun 19 '24 12:06 girazoki

thank you, we will try it

Please take a look here for the CLI args: https://github.com/paritytech/polkadot-sdk/pull/4639#issuecomment-2162999436

michalkucharczyk avatar Jun 21 '24 14:06 michalkucharczyk

@michalkucharczyk we have tested it and it seems to work well. One more request: would it be possible to convert (with an error) the generic txpool abstraction to one of the inner types (Basic/ForkAware?)? Something like as_basic_pool -> Option<BasicPool>

girazoki avatar Jul 01 '24 10:07 girazoki

Also, we are quite satisfied with the results. What's the ETA on this getting reviewed?

girazoki avatar Jul 02 '24 11:07 girazoki

I'd be very happy if this PR got merged by the end of July.

michalkucharczyk avatar Jul 02 '24 18:07 michalkucharczyk

Across the codebase, the following pattern is repeated over and over:

struct SomeData<P> {
  p: Arc<P>
}

fn use_some_data<P>(s: SomeData<P>) 
where
  P: TransactionPool
{ ... }

e.g. here: https://github.com/paritytech/polkadot-sdk/blob/d31bb8ac990286f4d4caa759ea960bb9866fc2b0/substrate/client/service/src/builder.rs#L784-L812 and https://github.com/paritytech/polkadot-sdk/blob/d31bb8ac990286f4d4caa759ea960bb9866fc2b0/substrate/client/service/src/builder.rs#L815-L841

Introducing dynamic dispatch for the transaction pool may require touching all of these statements, depending on the chosen approach.

I tried three approaches using a transaction pool trait object.

(1) Relaxing Sized constraint in generic params.

Basically, the type used to refer to the transaction pool implementation is

dyn FullClientTransactionPool<Block, Client>

The generic parameter in

struct SomeData<T> { t: Arc<T> }

is constrained by Sized by default; to relax it, an explicit ?Sized bound needs to be added:

struct SomeData<T: ?Sized> { t: Arc<T> }

This was done initially and rejected.

Please note that this approach is backward compatible and does not require changing the type of the field referencing the transaction pool instance (the Arc<P> type).
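
A self-contained illustration of why the relaxation matters (Pool, Holder, and Basic are toy stand-ins, not the real traits):

use std::sync::Arc;

trait Pool {
    fn status(&self) -> usize;
}

// Without `?Sized`, `T` defaults to `T: Sized` and this struct cannot hold
// `Arc<dyn Pool>`; with the relaxed bound, the existing generic code accepts
// a trait object without changing the field type.
struct Holder<T: ?Sized> {
    t: Arc<T>,
}

struct Basic;
impl Pool for Basic {
    fn status(&self) -> usize {
        0
    }
}

fn main() {
    let h: Holder<dyn Pool> = Holder { t: Arc::new(Basic) };
    assert_eq!(h.t.status(), 0);
}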

(2) Using wrapper

The type used to reference the transaction pool instance is:

pub struct TransactionPoolWrapper<Block, Client>(
	pub Box<dyn FullClientTransactionPool<Block, Client>>,
);

This approach does not require any changes to the generic parameters or constraints used across the code base.
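
A minimal sketch of the delegation pattern behind this approach (again with a toy Pool trait rather than the real FullClientTransactionPool): the wrapper is a concrete, Sized type, so every existing SomeData<P> compiles unchanged, while dynamic dispatch happens inside it:

use std::sync::Arc;

trait Pool {
    fn status(&self) -> usize;
}

struct PoolWrapper(Box<dyn Pool>);

impl Pool for PoolWrapper {
    fn status(&self) -> usize {
        self.0.status() // delegate every trait method to the inner pool
    }
}

struct Basic;
impl Pool for Basic {
    fn status(&self) -> usize {
        0
    }
}

fn main() {
    let p: Arc<PoolWrapper> = Arc::new(PoolWrapper(Box::new(Basic)));
    assert_eq!(p.status(), 0);
}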

(3) Using Arc<dyn FullClientTransactionPool>

The type used to reference the transaction pool instance is:

Arc<dyn FullClientTransactionPool<Block, Client>>

As we don't want to have Arc<Arc<P>> objects, which would be needless, possibly confusing, and would introduce an unnecessary indirection, we need to change the type of the transaction pool in the aforementioned structures:

struct SomeData<P> {
  p: Arc<P>
}

fn use_some_data<P>(s: SomeData<P>) 
where
  P: TransactionPool
{ ... }

needs to be converted into:

struct SomeData<P> {
  p: P
}

fn use_some_data<P>(s: SomeData<P>) 
where
  P: TransactionPool + 'static + Clone
{ ... }

Additionally, as dynamic upcasting is not yet supported in stable Rust, we need to add methods that allow it: a way of converting Arc<dyn FullClientTransactionPool> into Arc<dyn LocalTransactionPool> or Arc<dyn TransactionPool>. This was done by adding LocalTransactionPool::as_local_transaction_pool_arc(self: Arc<Self>) and TransactionPool::as_transaction_pool_arc(self: Arc<Self>) to the appropriate traits.
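
The manual upcast pattern looks roughly like this (Base, Full, and ForkAware are toy stand-ins for the real traits and pool types):

use std::sync::Arc;

trait Base {
    fn name(&self) -> &'static str;
}

trait Full: Base {
    // Manual upcast: each implementor hands itself back as the supertrait
    // object, working around the lack of stable dyn-to-dyn upcasting.
    fn as_base_arc(self: Arc<Self>) -> Arc<dyn Base>;
}

struct ForkAware;

impl Base for ForkAware {
    fn name(&self) -> &'static str {
        "fork-aware"
    }
}

impl Full for ForkAware {
    fn as_base_arc(self: Arc<Self>) -> Arc<dyn Base> {
        self
    }
}

fn main() {
    let full: Arc<ForkAware> = Arc::new(ForkAware);
    let base: Arc<dyn Base> = full.as_base_arc();
    assert_eq!(base.name(), "fork-aware");
}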

(4) Getting rid of pool generic parameter

There is one more option: We could migrate from:

struct SomeData<P> {
  p: Arc<P>
}

to

struct SomeData<Block,Client> {
  p: Arc<dyn FullClientTransactionPool<Block,Client>>
}

but the problem is that the TransactionPool-related traits are either generic or have associated types, which means we would need to extend those structures, making this quite a heavy refactor across many modules. I think this work is definitely out of scope for the implementation of the fork-aware transaction pool.

(3) seems more elegant to me, but requires many changes (rebasing also becomes more annoying as many files are touched). (2) is less elegant but also the least invasive.

michalkucharczyk avatar Sep 19 '24 15:09 michalkucharczyk

IMO we should pick the most pragmatic solution that gets this merged. That seems to be (2) in my book. The wrapper might not be beautiful, but the changes are minimal, and the PR is quite big already; we should not add another refactoring on top. All the other proposed options are still possible later.

skunert avatar Sep 19 '24 15:09 skunert

Getting rid of pool generic parameter

I really don't see the need for the generic parameter here. It just makes things more complicated, and without the generic parameter the code should be simpler. I would favor this.

But if you say 2 is faster for now, pick this one and then later do 4.

bkchr avatar Sep 19 '24 20:09 bkchr

But if you say 2 is faster for now, pick this one and then later do 4.

Filed here: https://github.com/paritytech/polkadot-sdk/issues/5489

michalkucharczyk avatar Sep 20 '24 19:09 michalkucharczyk

The CI pipeline was cancelled due to the failure of one of the required jobs. Job name: test-linux-stable 2/3 Logs: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/7484100

paritytech-cicd-pr avatar Oct 02 '24 14:10 paritytech-cicd-pr

This pull request has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/the-fork-aware-transaction-pool/10468/1

Polkadot-Forum avatar Oct 15 '24 15:10 Polkadot-Forum