lnd
lnd copied to clipboard
mutil: introduce simple `BlockBeat` to fix itest flakes
In an attempt to fix the itest flakes found in the sweeper/multi-hop tests, I realized there's no easy way to get around the block sync issue. In fact, I think these flakes already mean block sync may cause issues outside of the test context. Thus, a minimal version of BlockBeat
is implemented.
Atm, when a new block is received, it's sent to the subsystems concurrently,
flowchart
block((block))
block ---> c[ChainArb]
block ---> ca1[ChannelArb1]
block ---> ca2[ChannelArb2]
block ---> us[UtxoSweeper]
block ---> b[Bumper]
block ---> ar[AnchorResolver]
block ---> cr[CommitResolver]
block ---> hr[HtlcResolver...]
One system may process the block faster than another, causing a block sync issue. Blockbeat
will make sure the blocks are sent in the following order,
flowchart
block((block))
block --> c[ChainArb]
c --> ca1[ChannelArb1]
c --> ca2[ChannelArb2]
c --> ca3[ChannelArb...]
ca1 ---> ar[AnchorResolver]
ca1 ---> cr[CommitResolver]
ca1 ---> hr[HtlcResolver...]
ca2 ---> rs2[Resolvers...]
ca3 ---> rs3[Resolvers...]
ar ---> us[UtxoSweeper]
cr ---> us[UtxoSweeper]
hr ---> us[UtxoSweeper]
rs2 ---> us[UtxoSweeper]
rs3 ---> us[UtxoSweeper]
us ---> b[Bumper]
which means the following new behavior in the sweeper:
- no more waiting for the next block to trigger the sweep - when an input is sent to the sweeper at height X, the sweeping won't happen until block X+1, which is no longer the case.
- no need to immediately sweep inputs during restart - this hack can be removed now.
- no more different views of block heights among these subsystems - the fee func can now accurately calculate its width.
TODOs:
- [ ] fix unit tests
- [ ] add unit tests
- [ ] fix itests -> new pr to limit the scope
[!IMPORTANT]
Review skipped
Auto reviews are limited to specific labels.
Labels to auto review (1)
- llm-review
Please check the settings in the CodeRabbit UI or the
.coderabbit.yaml
file in this repository. To trigger a single review, invoke the@coderabbitai review
command.You can disable this status message by setting the
reviews.review_status
tofalse
in the CodeRabbit configuration file.
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?
Tips
Chat
There are 3 ways to chat with CodeRabbit:
- Review comments: Directly reply to a review comment made by CodeRabbit. Example:
-
I pushed a fix in commit <commit_id>.
-
Generate unit testing code for this file.
-
Open a follow-up GitHub issue for this discussion.
-
- Files and specific lines of code (under the "Files changed" tab): Tag
@coderabbitai
in a new review comment at the desired location with your query. Examples:-
@coderabbitai generate unit testing code for this file.
-
@coderabbitai modularize this function.
-
- PR comments: Tag
@coderabbitai
in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:-
@coderabbitai generate interesting stats about this repository and render them as a table.
-
@coderabbitai show all the console.log statements in this repository.
-
@coderabbitai read src/utils.ts and generate unit testing code.
-
@coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
-
@coderabbitai help me debug CodeRabbit configuration file.
-
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.
CodeRabbit Commands (invoked as PR comments)
-
@coderabbitai pause
to pause the reviews on a PR. -
@coderabbitai resume
to resume the paused reviews. -
@coderabbitai review
to trigger an incremental review. This is useful when automatic reviews are disabled for the repository. -
@coderabbitai full review
to do a full review from scratch and review all the files again. -
@coderabbitai summary
to regenerate the summary of the PR. -
@coderabbitai resolve
resolve all the CodeRabbit review comments. -
@coderabbitai configuration
to show the current CodeRabbit configuration for the repository. -
@coderabbitai help
to get help.
Additionally, you can add @coderabbitai ignore
anywhere in the PR description to prevent this PR from being reviewed.
CodeRabbit Configration File (.coderabbit.yaml
)
- You can programmatically configure CodeRabbit by adding a
.coderabbit.yaml
file to the root of your repository. - Please see the configuration documentation for more information.
- If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation:
# yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
Documentation and Community
- Visit our Documentation for detailed information on how to use CodeRabbit.
- Join our Discord Community to get help, request features, and share feedback.
- Follow us on X/Twitter for updates and announcements.
So the basic idea is to synchronize the entire contractcourt and sweeper on each block?
Do arrows in the above diagrams represent passing around the latest block info? It seems odd to have all the resolvers separately notifying the sweeper of the latest block.
Is it guaranteed that a BlockBeat
fires on startup with the latest block info? I think this is required to eliminate the immediate sweep on startup logic.
So the basic idea is to synchronize the entire contractcourt and sweeper on each block?
Yes.
Do arrows in the above diagrams represent passing around the latest block info? It seems odd to have all the resolvers separately notifying the sweeper of the latest block.
It represents the timeline of the block, maybe I should try a different graph, but the block is always sent by the BlockBeat
.
Is it guaranteed that a
BlockBeat
fires on startup with the latest block info? I think this is required to eliminate the immediate sweep on startup logic.
Yes, when a new subscription is made via RegisterBlockEpochNtfn
, the current block should always be sent.
Concept ACK.
It represents the timeline of the block, maybe I should try a different graph, but the block is always sent by the
BlockBeat
.
Yeah, the diagram is a bit confusing. From the code it looks like ChainArbitrator
passes block info down to channel arbitrators, which pass it to resolvers. But the sweeper gets the block info directly from BlockBeat
, right?
So basically BlockBeat
(1) sends block info to contractcourt, then (2) sends block info to sweeper. There's a single entry point for each subsystem, each of which is responsible for distributing the block info to the rest of the subsystem. Something like this?
flowchart LR
block([new block]) --> bb[BlockBeat]
bb --1--> c[ChainArbitrator]
bb --2--> us[UtxoSweeper]
c ~~~ us
subgraph cc[contractcourt]
c --> ca1[ChannelArb1]
c --> ca2[ChannelArb2]
c --> ca3[ChannelArb...]
ca1 ---> ar[AnchorResolver]
ca1 ---> cr[CommitResolver]
ca1 ---> hr[HtlcResolver...]
ca2 ---> rs2[Resolvers...]
ca3 ---> rs3[Resolvers...]
end
subgraph s[sweep]
us ---> b[Bumper]
end
But the sweeper gets the block info directly from BlockBeat, right?
Yeah exactly.
So basically BlockBeat (1) sends block info to contractcourt, then (2) sends block info to sweeper. There's a single entry point for each subsystem, each of which is responsible for distributing the block info to the rest of the subsystem. Something like this?
Close enough! I'd like to view Bumper
an independent system as it can be moved out of the sweep
package. Another important new behavior is the subsystem must ACK a done signal back after processing the block - this would defintely require more attention when designing/refactoring subsystems, as it forces a designer to define what does it mean when we say the subsystem has finished processing a block. Maybe it's more clear using a sequence graph,
sequenceDiagram
autonumber
participant bb as BlockBeat
participant cc as ChainArb
participant us as UtxoSweeper
participant tp as TxPublisher
note left of bb: 0. received block x,<br>dispatching...
note over bb,cc: 1. send block x to ChainArb,<br>wait for its done signal
bb->>cc: block x
rect rgba(165, 0, 85, 0.8)
critical signal processed
cc->>bb: processed block
option Process error or timeout
bb->>bb: error and exit
end
end
note over bb,us: 2. send block x to UtxoSweeper, wait for its done signal
bb->>us: block x
rect rgba(165, 0, 85, 0.8)
critical signal processed
us->>bb: processed block
option Process error or timeout
bb->>bb: error and exit
end
end
note over bb,tp: 3. send block x to TxPublisher, wait for its done signal
bb->>tp: block x
rect rgba(165, 0, 85, 0.8)
critical signal processed
tp->>bb: processed block
option Process error or timeout
bb->>bb: error and exit
end
end
Cool. I think this could be used to implement https://github.com/lightningnetwork/lnd/issues/8166. We would just need to make sure that the initial block is distributed by BlockBeat
before the peer manager starts.
Does TxPublisher
need to process the block before UtxoSweeper
?
Suppose a conflicting transaction confirms at block X
(e.g., https://github.com/lightningnetwork/lnd/issues/8737). If UtxoSweeper
processes block X
first, it won't know about any inputs that need to be rebatched due to the conflicting transaction. So publishing a rebatched transaction won't happen until block X+1
confirms.
But if TxPublisher
processes block X
first, it will fail the conflicted inputs, thereby allowing UtxoSweeper
to publish a rebatched transaction immediately.
Good question @morehouse. I need to think about it more, whether UtxoSweeper -> TxPublisher
, or the other around, or in parallel. The issue with TxPublisher -> UtxoSweeper
means every tx will be broadcast at X+1. Meanwhile, once this TODO is in place, I think we can eliminate creating conflicting txns completely?
reconstruct the input set from the mempool tx, which is the missing part that gives the sweeper the ability to recover its state from a restart.
The issue with
TxPublisher -> UtxoSweeper
means every tx will be broadcast at X+1.
I don't think so? TxPublisher.Broadcast
will still do the initial broadcast immediately.
Meanwhile, once this TODO is in place, I think we can eliminate creating conflicting txns completely?
Conflicting transactions aren't only created by us. The counterparty can also spend HTLCs, causing conflicts.