
`option_simple_close` (features 60/61)

Open · t-bast opened this issue 1 year ago · 1 comment

This PR is a continuation of #1096, which @rustyrussell asked me to take over. The original description was:

This is a "can't fail!" close protocol, as discussed at the NY Summit, and on @Roasbeef's wishlist.  It's about as simple as I could make it: the only complexity comes from allowing each side to indicate whether they want to omit their own output.

It's "taproot ready"(TM) in the sense that `shutdown` is always sent to trigger it, so that can contain the nonces without any persistence requirement.

I split it into three commits for cleanliness:

1. Introduce the new protocol
2. Remove the requirement that shutdown not be sent multiple times (which was already nonsensical)
3. Remove the older protocols

I recommend reviewing it as separate commits; it'll make more sense!

I believe it is still useful to review as separate commits: however, we initially allowed setting nSequence, which we removed in favor of setting nLockTime. That part can probably be skipped. I squashed the fixup commits from the previous PR, but kept the rest.

t-bast avatar Oct 11 '24 08:10 t-bast

As described in https://github.com/lightning/bolts/pull/1096#issuecomment-2406457135, the main question is whether we want to have stricter requirements on exchanging shutdown whenever one side sends a new one. This is probably required to ensure that we can correctly exchange nonces to produce partial signatures for taproot channels: we want to make sure we get this right, as the goal of this protocol is to be compatible with taproot channels!

@Roasbeef let me know what you think: I'm currently leaning towards your initial implementation where you must receive shutdown after sending one. If we decide on that, I'll clarify the spec!

t-bast avatar Oct 11 '24 08:10 t-bast

I added the requirement to strictly exchange shutdown before sending closing_complete again in https://github.com/lightning/bolts/pull/1205/commits/a8fd1ab74255af03f4a86a38890d54ee86b1dd4d

This is implemented in https://github.com/ACINQ/eclair/pull/2747, waiting for lnd for cross-compatibility tests (may need to update the feature bit either on the lnd side to use 60 or on the eclair side to use 160)!

t-bast avatar Oct 22 '24 01:10 t-bast

As pointed out by @TheBlueMatt during yesterday's spec meeting, we can greatly simplify the protocol by removing the shutdown exchange entirely. The only piece of data nodes must remember is the last script their peer sent. This can be found in the last received closing_complete, or in shutdown if closing_complete was never received. This doesn't even need to be persisted, because on reconnection nodes will exchange shutdown again with the last script they want to use for their output.

By doing that, the protocol becomes a trivial request/response protocol where nodes send closing_complete and expect closing_sig back. This creates a race condition when both nodes update their script at the same time, but that will be extremely rare, so we can simply resolve it by reconnecting.

I've made those changes in https://github.com/lightning/bolts/pull/1205/commits/aabde330b35bfe4e01f3a31cbb14c297bf8b0edd and implemented them in https://github.com/ACINQ/eclair/pull/2967; it is indeed much simpler.

It should be quite simple to update an existing implementation of the previous version of the protocol to this version. I hope this will get rid of the unclear parts of the previous versions and be easier for reviewers and implementers!
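The stateless flow described above can be sketched as follows. This is a minimal illustrative model, not spec wire format: the class, field names, and the `ConnectionResetError` used to stand in for "disconnect and reconnect" are all hypothetical; the only state kept is the last script the peer sent.

```python
# Hypothetical sketch of the stateless closing flow: each node remembers
# only the last script its peer sent, updated from each closing_complete.

class SimpleClose:
    def __init__(self, local_script: bytes, peer_script: bytes):
        self.local_script = local_script  # script our output pays to
        self.peer_script = peer_script    # last script the peer sent

    def make_closing_complete(self) -> dict:
        # Propose a closing tx paying to both current scripts; the peer
        # is expected to reply with closing_sig for exactly this proposal.
        return {"closer_script": self.local_script,
                "closee_script": self.peer_script}

    def on_closing_complete(self, msg: dict) -> dict:
        # The peer may have changed its own script: remember the new one.
        self.peer_script = msg["closer_script"]
        # Race case: the peer built its proposal against a stale copy of
        # our script (both sides changed scripts at once). The thread's
        # resolution is simply to reconnect.
        if msg["closee_script"] != self.local_script:
            raise ConnectionResetError("script mismatch, reconnect")
        # closing_sig echoes the request, making the signed tx explicit.
        return {"echo": msg}
```

Note how there is no state machine: `on_closing_complete` can be handled at any time, and an RBF attempt is just another `make_closing_complete` call.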


This protocol is compatible with taproot channels, with the following additions:

  • when sending shutdown, nodes will include two random nonces:
    • closer_nonce that will be used in their closing_complete
    • closee_nonce that will be used in their closing_sig
  • when sending closing_complete, nodes will include a new random nonce for their next closing_complete (next_closer_nonce)
  • when sending closing_sig, nodes will include a new random nonce for their next closing_sig (next_closee_nonce)

This ensures that nodes always have a pair of random nonces for their next signing round.
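The nonce bookkeeping above can be sketched like this. The helper `fresh_nonce` is a stand-in for real musig2 nonce generation (random bytes here, purely for illustration), and the dicts are not actual wire messages; the point is only the replacement rule: each message that consumes a nonce also delivers the next one.

```python
import os

def fresh_nonce() -> bytes:
    # Placeholder for a musig2 nonce; 32 random bytes for illustration.
    return os.urandom(32)

class NonceState:
    """Illustrative tracking of the (closer_nonce, closee_nonce) pair."""

    def on_send_shutdown(self) -> dict:
        # shutdown carries both nonces for the first signing round.
        self.closer_nonce = fresh_nonce()
        self.closee_nonce = fresh_nonce()
        return {"closer_nonce": self.closer_nonce,
                "closee_nonce": self.closee_nonce}

    def on_send_closing_complete(self) -> dict:
        # Sign with the current closer_nonce, hand out the next one.
        used, self.closer_nonce = self.closer_nonce, fresh_nonce()
        return {"sig_uses": used, "next_closer_nonce": self.closer_nonce}

    def on_send_closing_sig(self) -> dict:
        # Same replacement rule on the closee side.
        used, self.closee_nonce = self.closee_nonce, fresh_nonce()
        return {"sig_uses": used, "next_closee_nonce": self.closee_nonce}
```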


EDIT (July 2025): we don't need the closer nonces; they should be transmitted directly with the partial sigs (whenever we send a partial sig, we send the partial sig plus the local nonce used), so this should become:

  • when sending shutdown, nodes include a random nonce (closee_nonce) used for the remote closing transaction, which they will use when sending closing_sig
  • when sending closing_complete, nodes use:
    • the public part of the remote closee_nonce
    • a new random local nonce
    • they send the partial sig + the public part of the random local nonce
  • when sending closing_sig:
    • the partial sig is created using the local closee_nonce and the remote nonce included with the remote partial sig
    • a next_closee_nonce is included to allow the other peer to RBF this transaction, which replaces the previous closee_nonce (obtained either from shutdown or from the previous closing_sig)
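The revised scheme can be sketched in the same illustrative style (hypothetical names, random bytes standing in for musig2 nonces, no real signing): only the closee nonce is tracked across messages, while the closer's nonce travels inline with its partial signature.

```python
import os

def fresh_nonce() -> bytes:
    # Placeholder for a musig2 nonce; random bytes for illustration.
    return os.urandom(32)

class CloseeNonceState:
    """Illustrative sketch of the July 2025 revision: track closee_nonce only."""

    def on_send_shutdown(self) -> dict:
        # shutdown now carries a single nonce, used for our closing_sig.
        self.closee_nonce = fresh_nonce()
        return {"closee_nonce": self.closee_nonce}

    def send_closing_complete(self, remote_closee_nonce: bytes) -> dict:
        # The closer's nonce is fresh each time and shipped with the
        # partial sig, so nothing closer-side needs to be remembered.
        local = fresh_nonce()
        return {"partial_sig_nonces": (remote_closee_nonce, local),
                "local_nonce": local}

    def send_closing_sig(self, remote_nonce: bytes) -> dict:
        # Sign with our current closee_nonce plus the nonce the closer
        # sent inline, then replace closee_nonce so the peer can RBF.
        used, self.closee_nonce = self.closee_nonce, fresh_nonce()
        return {"sig_uses": (used, remote_nonce),
                "next_closee_nonce": self.closee_nonce}
```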

t-bast avatar Dec 17 '24 17:12 t-bast

This doesn't even need to be persisted

When was shutdown ever persisted in the first place?

The only piece of data nodes must remember is the last script their peer sent

Not sure about yours, but my implementation is stateless as-is between shutdown iterations. What data were you remembering between iterations?

As pointed out by @TheBlueMatt during yesterday's spec meeting, we can greatly simplify the protocol by removing the shutdown exchange entirely. The only piece of data nodes must remember is the last script their peer sent

Can you recount what the supposed issue was with the existing flow? I looked at the diff, and it looks like it adds more fields to the closing messages vs keeping them single-purpose as they were before.

Looking at the new ASCII diagram, it looks like the shutdown message is still part of the flow, and this is an optimization where you echo back the other party's shutdown message, which is now embedded in the new closing messages?

Roasbeef avatar Dec 17 '24 18:12 Roasbeef

When was shutdown ever persisted in the first place?

Good point, we were storing the shutdown message but that was unnecessary. It's still unnecessary with the latest changes, so we're all good!

Not sure about yours, but my implementation is stateless a is between shutdown iterations. What data were you remembering between iterations?

It's not stateless, because you had to remember which state of your state machine you were in (e.g. whether you've already sent and received shutdown or not, and whether you've sent closing_complete or not). Note that by remembering I don't mean persisting, but keeping in-memory state: you had at least 3 different states in your FSM for the previous protocol.

The point of the last commit is that you don't need a state machine at all: everything happens in a single state, where you simply remember the last script your peer wants to use (and update it whenever you receive a closing_complete from them). For taproot it will require some state for the nonces, but that's it!

Can you recount what the supposed issue was with the existing flow? Looked at the diff, and it looks to add more fields to the closing messages vs keeping them single purpose as they were before.

I added more fields to the closing messages, but they don't require state and are trivial to include, so this isn't adding any complexity: closing_sig just echoes what closing_complete requested, which makes everything explicit and simplifies debugging. closing_complete now includes the scripts to make all the details of the transaction explicit as well (which resolves the issue of signatures not matching because scripts didn't match), and allows changing the local script without requiring a strict exchange of shutdown messages, which added an unnecessary step to the state machine.

Looking at the new ascii diagram, it looks like the shutdown message is still part of the flow, and this is an optimization where you echo back the other party's shutdown message which is now embedded in the new closing messages?

Not at all: you still of course need shutdown to initiate closing (to tell your peer you want to close an active channel) and on reconnection, but after that you're not allowed to re-send shutdown. Whenever you want to sign a new version of your closing transaction, you just send closing_complete. This is what removes rounds in the protocol and gets rid of the state machine.

t-bast avatar Dec 18 '24 09:12 t-bast

Whenever you want to sign a new version of your closing transaction, you just send closing_complete. This is what removes rounds in the protocol and gets rid of the state machine.

Gotcha, this makes sense. Whether it's still a state machine or not is somewhat subjective and an implementation detail, but I think what's concrete here is that it no longer needs to loop back upon itself to re-enter the shutdown phase. As the responder you can just send the reply, but as the initiator you still shift between sending the request and waiting for it to complete.

Will work on updating my implementation so we can finally get this thing merged!

Roasbeef avatar Dec 18 '24 10:12 Roasbeef

Rebased to fix trivial formatting conflict in Bolt 9. We have successfully done cross-compat tests between eclair and lnd, so this should be ready to merge! :tada: :fire:

@tnull can you take a look at the PR before we merge?

@yyforyongyu can you take a look at the responses to your last comments?

t-bast avatar Feb 07 '25 10:02 t-bast

Merging this PR as agreed during the last spec meeting, congrats everyone!

t-bast avatar Feb 12 '25 08:02 t-bast