Instant-seal finalization event not always firing
I am using the setup from #10239, but not using Docker. That is, we have an instant-seal setup with instant finalization. However, about 50% of the time, no finalization event is received by the client. The finalization itself is happening: I can see in the output that the latest finalized block equals the latest block, so it seems that only the event is missing.
I looked some more into this. It turns out that `author_extrinsicUpdate` is always fired with `"result":"ready"` and `"result":{"inBlock"`, but `"result":{"finalized"` is fired only sometimes. I ruled out a client-side issue by testing with both subxt and Polkadot.js, and I verified using tcpdump that the message is not sent.
Steps to reproduce:
- Build any runtime with instant-seal, with `finalize: true`
- Run tcpdump, e.g. `tcpdump -l -i lo -A port 9944 | grep author_extrinsicUpdate`
- Dispatch any extrinsic and listen for the result
- Observe that `{"jsonrpc":"2.0","method":"author_extrinsicUpdate","params":{"result":{"finalized"` only sometimes shows up
I have done some re-testing and found that this is still not fixed in 0.9.24.
I believe the root cause of this issue is notification ordering. The `inBlock` JSON-RPC event gets sent in the `pruned` function, while the `finalized` events get sent in the `finalized` function. However, sometimes `finalized` gets called before `pruned` (I have observed this in the tracing output). When this happens, `finalized` becomes a no-op, since no items have been added to `finality_watchers` yet; that has to happen in `pruned` beforehand.
As for the reason why `finalized` gets called before `pruned`, I think these events get queued here. Part of the reason is that the finalized notification is sent before the imported notification. However, swapping these would not completely fix the problem, because the notifications are sent through two different channels. They get listened to here in `notification_future`, where the two streams are merged through the `select` function. I think this is why the finalized events only sometimes get sent: the behavior of `select` is likely not deterministic. Depending on the timing of the polls, either the finalized or the imported notification could be handled first, and if finalized is processed before inBlock, no finalization event is sent over JSON-RPC. A minimal sketch of the problem is below.
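To make this concrete, here is a minimal, self-contained sketch using plain `futures` (not the actual Substrate code; the channel and event names are made up). `stream::select` preserves the order within each channel, but makes no ordering guarantee between the two channels:

```rust
use futures::{channel::mpsc, executor::block_on, stream, StreamExt};

#[derive(Debug)]
enum Event {
    Imported(u64),
    Finalized(u64),
}

fn main() {
    let (import_tx, import_rx) = mpsc::unbounded();
    let (finality_tx, finality_rx) = mpsc::unbounded();

    // Conceptually `Imported(1)` must be handled before `Finalized(1)`,
    // but the two events travel on separate channels.
    import_tx.unbounded_send(Event::Imported(1)).unwrap();
    finality_tx.unbounded_send(Event::Finalized(1)).unwrap();
    drop(import_tx);
    drop(finality_tx);

    // Which event comes out of the merged stream first depends on how
    // `select` happens to poll the two halves, not on any cross-channel
    // send order.
    let mut merged = stream::select(import_rx, finality_rx);
    block_on(async {
        while let Some(event) = merged.next().await {
            println!("handling {:?}", event);
        }
    });
}
```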
As for the solution, it seems to me that an ordering of the processing of these notifications needs to be enforced. Rather than using two separate channels, a single channel could be used to send a new `enum Notification { Imported(...), Finalized(...) }`. Then `notification_future` could consume that channel and be sure that the ordering is as intended. Other notification listeners could then just `filter_map` this stream to get only the type of events they are interested in.
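A minimal sketch of what I mean, again with made-up names rather than the actual Substrate types: a single FIFO channel preserves the `Imported`-before-`Finalized` order, and a finality-only listener simply filters the shared stream:

```rust
use futures::{channel::mpsc, executor::block_on, StreamExt};

#[derive(Debug)]
enum Notification {
    Imported(u64),
    Finalized(u64),
}

fn main() {
    let (tx, rx) = mpsc::unbounded();

    // One FIFO channel: `Imported(n)` is always observed before the
    // matching `Finalized(n)`.
    tx.unbounded_send(Notification::Imported(1)).unwrap();
    tx.unbounded_send(Notification::Finalized(1)).unwrap();
    drop(tx);

    // A listener that only cares about finality filters the stream.
    let mut finality_only = rx
        .filter_map(|n| async move {
            match n {
                Notification::Finalized(block) => Some(block),
                Notification::Imported(_) => None,
            }
        })
        .boxed();

    block_on(async {
        while let Some(block) = finality_only.next().await {
            println!("finalized block {}", block);
        }
    });
}
```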
I just looked through the code and what you are saying is correct. However, your proposed solution is not the right fix. I think we need to fix the following code: https://github.com/paritytech/substrate/blob/c172d0f683fab3792b90d876fd6ca27056af9fe9/client/transaction-pool/src/lib.rs#L713-L723
The code needs to be written in a way that when we receive a finalized event first, we first enact the blocks that are finalized. One tricky thing is that if the import block event still comes later and there was a re-org, we would need to ensure that we reinsert the transactions of the old fork. However, all of that is already supported; we just need to wire it differently.
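For illustration only, a rough sketch of that wiring with hypothetical stand-in types (the real logic lives in the linked `client/transaction-pool` code and is considerably more involved): on `Finalized`, enact any blocks the pool has not seen yet, then run the finality handling, so it is never a no-op:

```rust
enum ChainEvent {
    NewBestBlock(u64),
    Finalized(u64),
}

struct Pool {
    /// Highest block number the pool has already enacted.
    enacted_up_to: u64,
}

impl Pool {
    fn maintain(&mut self, event: ChainEvent) {
        match event {
            ChainEvent::NewBestBlock(n) => self.enact(n),
            ChainEvent::Finalized(n) => {
                // If the finality notification won the race, enact the
                // finalized blocks first so the watchers exist...
                if n > self.enacted_up_to {
                    self.enact(n);
                }
                // ...then run the finality handling. (On a later import
                // that reveals a re-org, transactions from the retracted
                // fork would still need to be reinserted.)
                self.finalize(n);
            }
        }
    }

    fn enact(&mut self, n: u64) {
        // Stand-in for `pruned`: prunes included transactions and
        // registers the inBlock/finality watchers.
        self.enacted_up_to = self.enacted_up_to.max(n);
    }

    fn finalize(&self, n: u64) {
        // Stand-in for `finalized`: fires the `finalized` events for
        // watchers registered in `enact`.
        println!("finalized watchers fired for block {}", n);
    }
}

fn main() {
    let mut pool = Pool { enacted_up_to: 0 };
    // Even when Finalized arrives before the import notification,
    // the finality watchers still fire.
    pool.maintain(ChainEvent::Finalized(1));
    pool.maintain(ChainEvent::NewBestBlock(1));
}
```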
Thank you @bkchr. I tried to implement your suggestion but unfortunately, being unfamiliar with the codebase, I did not succeed. I wanted to try to call `maintain(ChainEvent::NewBestBlock { .. })` in the `ChainEvent::Finalized` branch. However, the `tree_route` arguments need to be different for the two calls and I don't know how to construct them. Furthermore, I'm not sure the approach would have worked even if I had succeeded in constructing the call.
Is there any chance someone from the Parity team can try to implement the approach outlined by @bkchr above?
> Is there any chance someone from the Parity team can try to implement the approach outlined by @bkchr above?
I will assign this to someone :)