Channel Splicing (feature 62/63)
Splicing allows spending the current funding transaction to replace it with a new one that changes the capacity of the channel, allowing both peers to add or remove funds to/from their channel balance.
Splicing takes place while a channel is quiescent, to ensure that both peers have the same view of the current commitments.
We don't want channels to be unusable while waiting for transactions to confirm, so channel operation returns to normal once the splice transaction has been signed and we're waiting for it to confirm. The channel can then be used for payments, as long as those payments are valid for every pending splice transaction. Splice transactions can be RBF-ed to speed up confirmation.
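As a rough illustration of "valid for every pending splice transaction", here is a minimal sketch in Python. The `FundingCandidate` type, the `can_send` helper, and the balances are hypothetical names and values invented for this example; fees, HTLC limits, and the exact reserve rules are deliberately left out, so this is not how any implementation actually checks payments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FundingCandidate:
    """One funding transaction the channel may end up using (hypothetical type)."""
    txid: str
    local_balance_sat: int
    remote_balance_sat: int


def can_send(amount_sat: int, candidates: list[FundingCandidate], reserve_sat: int = 0) -> bool:
    """Accept an outgoing payment only if it is affordable on every candidate.

    Assumption: affordability is reduced to "local balance minus reserve",
    ignoring fees and in-flight HTLCs.
    """
    return all(c.local_balance_sat - reserve_sat >= amount_sat for c in candidates)


# Current funding transaction plus a splice-out and its RBF replacement.
current = FundingCandidate("funding", local_balance_sat=500_000, remote_balance_sat=500_000)
splice_out = FundingCandidate("splice-1", local_balance_sat=200_000, remote_balance_sat=500_000)
splice_out_rbf = FundingCandidate("splice-1-rbf", local_balance_sat=199_000, remote_balance_sat=500_000)
pending = [current, splice_out, splice_out_rbf]

small_ok = can_send(150_000, pending)   # affordable on all three candidates
large_ok = can_send(300_000, pending)   # affordable on the current funding only
```

The point of the sketch is that the most restrictive candidate decides: a payment that only fits the current funding transaction is rejected, because any of the pending splice transactions may be the one that confirms.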
Once one of the pending splice transactions confirms and reaches acceptable depth, peers exchange splice_locked to discard the other pending splice transactions and the previous funding transaction. The confirmed splice transaction becomes the channel funding transaction.
Nodes then advertise this spliced channel to the network, so that nodes keep routing payments through it without any downtime.
This PR replaces #863 which contains a lot of legacy mechanisms for early versions of splicing, which didn't work in some edge cases (detailed in the test vectors provided in this PR). It can be very helpful to read the protocol flows described in the test vector: they give a better intuition of how splicing works, and how it deals with message concurrency and disconnections.
This PR requires the quiescence feature (#869) to start negotiating a splice.
Credits to @rustyrussell and @ddustin will be added in the commit messages once we're ready to merge this PR.
Can I suggest we do this as an extension BOLT rather than layering it in with the existing BOLT2 text? It makes it easier to implement when all of the requirements deltas are in a single document than when it is inlined into the original spec. Otherwise, the PR/branch-diff itself is the only way to see the diff and that can get very messy during the review process as people's commentary comes in. While there are other ways to get at this diff without the commentary, it would make the UX of getting at this diff rather straightforward.
Given that the change is gated behind a feature bit anyway it also makes it easier for a new implementation to bootstrap itself without the splice feature by just reading the main BOLTs as is.
At some point in the future when splicing support becomes standard across the network we can consolidate the extension BOLT into the main BOLTs if people still prefer.
Why not, if others also feel that it would be better as an extension bolt. I prefer it directly in Bolt 2, because of the following reasons:
- Most of it is self contained in its own section(s) anyway.
- It's an important part of the channel lifecycle: channels are opened, then during normal operation payments are relayed and splices happen, then the channel eventually closes. It is nicely reflected in the architecture of the Bolt 2 sections right now.
- The few additions to existing message TLVs (commit_sig, tx_add_input, tx_signatures) should not be in a separate document when merging, because otherwise different features may use the same TLV tags without realizing it, with a risk of inadvertently shipping incompatible code. I think it's important that all TLVs for a given message are listed in that message's section, this way you know you don't have to randomly search the BOLTs for another place where TLVs may be defined.
But if I'm the only one thinking this is better, I'll move it to a separate document!
One thing to note is that we already have two implementations (eclair and cln), and maybe a 3rd one (LDK), that are very close to code-complete and have had months of experience on mainnet, which means the spec is almost final and we should be able to merge it to the BOLTs in the not-so-distant future (:crossed_fingers:).
One thing I've been thinking about is that with large splices across many nodes, if some node fails to send signatures (likely because two nodes in the cluster demand to sign last), then the splice will hang on tx_signatures.
I believe we need two things to address this:
- Timeout logic where splices are aborted
- Being lax about having sent our tx_signatures but getting nothing back
Currently CLN fails the channel in this case as taking signatures and not responding is rather rude but this is bad because it could lead to clusters of splice channels being closed.
The unfortunate side effect of this is we have to be comfortable sending out signatures with no recourse for not getting any back.
I believe long term the solution is to maintain a signature-sending reputation for each peer and eventually blacklist peers from doing splices and / or fail your channels with that peer.
A reputation system may be beyond the needs of the spec but what to do with hanging tx_signatures (timeout etc) should be in the spec with a note about this problem.
- Timeout logic where splices are aborted
This is already covered at the quiescence level: quiescence will timeout if the splice doesn't complete (e.g. because we haven't received tx_signatures).
- Being lax about having sent our tx_signatures but getting nothing back
I don't think this is necessary, and I think we should really require people to send tx_signatures when it is owed, to ensure that we get to a clean state on both peers.
if some node fails to send signatures (likely because two nodes in the cluster demand to sign last)
It seems like we've discussed this many times already: this simply cannot happen because ordering based on contributed amount fixes this? Can you detail a concrete scenario where tx_signatures ordering leads to a deadlock?
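To make the ordering argument concrete, here is a small Python sketch. It assumes the rule is "lowest contributed amount sends tx_signatures first, with ties broken by lexicographically lowest node_id"; the tie-break, the `Participant` type, and the extension from two peers to a list of participants are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    """A node contributing inputs to a shared transaction (hypothetical type)."""
    node_id: bytes
    contributed_sat: int


def signing_order(participants: list[Participant]) -> list[Participant]:
    """Sort by (amount, node_id): the first entry signs first, the last signs last.

    Assumption: node_ids are unique, so the sort key is a strict total order
    and no two participants can both conclude that they sign last.
    """
    return sorted(participants, key=lambda p: (p.contributed_sat, p.node_id))


alice = Participant(node_id=b"\x02" + b"\xaa" * 32, contributed_sat=500_000)
bob = Participant(node_id=b"\x02" + b"\xbb" * 32, contributed_sat=200_000)
carol = Participant(node_id=b"\x03" + b"\x11" * 32, contributed_sat=200_000)

order = signing_order([alice, bob, carol])
```

Here bob and carol contribute the same amount, so the node_id comparison decides between them, and alice, with the largest contribution, is the only one who signs last.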
I added a PR to fix the spec for short_channel_id post-splice: https://github.com/t-bast/bolts/pull/3
@t-bast asked me to put together a summary for Richard on how to implement the short_channel_id changes for splice_locked. The process is fairly straightforward but I'll write out some contextual information around it that might be helpful:
- When receiving splice_locked, mark that it was received in a variable, and call check_splice_locked
- When the splice is seen on chain at depth, send splice_locked, mark that it was sent in a variable, save the txid of the locked transaction into splice_locked_txid, and call check_splice_locked
- When reconnecting without a pending splice but the peer expects one (rules as in spec), resend splice_locked
- When check_splice_locked is called, if both the sent and received variables are set, then:
  a) Clear the sent & received variables
  b) Find the splice inflight that matches splice_locked_txid
  c) Set the channel's short_channel_id according to the locked tx
  d) Update channel funding amounts according to the confirmed splice
  e) Save it all to disk, clear all splice inflights, and reset all of the channel's splice state variables to neutral
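The steps above can be sketched roughly as follows. This is a minimal Python model, not the CLN code: the `Channel` and `Inflight` types, the field names other than `splice_locked_txid`, and the `save` counter standing in for disk persistence are all assumptions made for the example, and message sending is reduced to a comment.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Inflight:
    """A pending splice candidate (hypothetical type)."""
    txid: str
    short_channel_id: str
    funding_sat: int


@dataclass
class Channel:
    short_channel_id: str
    funding_sat: int
    inflights: list[Inflight] = field(default_factory=list)
    locked_sent: bool = False
    locked_received: bool = False
    splice_locked_txid: Optional[str] = None
    saves: int = 0  # stand-in for "save it all to disk"

    def on_splice_locked_received(self) -> None:
        self.locked_received = True
        self.check_splice_locked()

    def on_splice_confirmed(self, txid: str) -> None:
        # A real node would send splice_locked to the peer here.
        self.locked_sent = True
        self.splice_locked_txid = txid
        self.check_splice_locked()

    def check_splice_locked(self) -> None:
        if not (self.locked_sent and self.locked_received):
            return
        # a) clear the sent & received variables
        self.locked_sent = False
        self.locked_received = False
        # b) find the inflight matching splice_locked_txid
        locked = next((i for i in self.inflights if i.txid == self.splice_locked_txid), None)
        if locked is None:
            raise ValueError("no inflight matches splice_locked_txid")
        # c) and d) adopt the locked transaction's scid and funding amount
        self.short_channel_id = locked.short_channel_id
        self.funding_sat = locked.funding_sat
        # e) drop all inflights, reset splice state, persist
        self.inflights.clear()
        self.splice_locked_txid = None
        self.saves += 1


chan = Channel(
    short_channel_id="800000x1x0",
    funding_sat=1_000_000,
    inflights=[
        Inflight("aa", "800100x5x0", 1_500_000),
        Inflight("bb", "800100x7x0", 1_500_000),
    ],
)
chan.on_splice_confirmed("aa")      # we lock first: nothing changes yet
scid_after_local_lock = chan.short_channel_id
chan.on_splice_locked_received()    # peer locks too: the channel switches over
```

Because both handlers funnel into check_splice_locked, the order in which the local confirmation and the peer's splice_locked arrive does not matter: the switch happens only once both flags are set.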
@ddustin I've added more details for the announcement/gossip part in https://github.com/lightning/bolts/pull/1160/commits/adf968cb14e1363b99c347b0dfd215587a28e742 and finished implementing it in eclair. You can grab the last version of https://github.com/ACINQ/eclair/pull/2887 for your cross-compatibility tests which should contain everything!
Rebased to fix conflicts and squashed commits. Please carefully read the reconnection requirements, especially around handling of next_commitment_number: I have applied the same logic as #1214 to use next_commitment_number to indicate when we'd like commitment_signed to be retransmitted.
@ddustin the last commits add more information to channel_reestablish to help synchronize the splice_locked state and simplify retransmission: please review!
Will do! Excited to get this all "locked" in