Do not unnecessarily retransmit `commitment_signed` in dual funding
On reconnection in the middle of the dual-funding flow, if both nodes have exchanged the initial commitment_signed and node A had sent its (initial) tx_signatures but node B never received them, both nodes should send a channel_reestablish with next_funding_txid set and a next_commitment_number of 1 (as they've already received the commitment transaction for commitment number 0).
The spec indicates in this case that both nodes should retransmit their commitment_signed, however, as this is only gated on next_funding_txid and not the next_commitment_number field. This may cause implementations which assume that each new commitment_signed is for a new state to fail and potentially fail the channel.
Instead, we should rely both the presence of next_funding_txid and next_commitment_number being zero to decide if we need to resend our commitment_signed. Sadly, we cannot rely on just next_commitment_number as that is used to request a force-closure in a non-standard way to work around implementations not honoring the error message.
- if `next_commitment_number` is equal to the commitment number of
the last `commitment_signed` message the receiving node has sent:
- MUST reuse the same commitment number for its next `commitment_signed`.
- otherwise:
- if `next_commitment_number` is not 1 greater than the
commitment number of the last `commitment_signed` message the receiving
node has sent:
- SHOULD send an `error` and fail the channel.
- if it has not sent `commitment_signed`, AND `next_commitment_number`
is not equal to 1:
- SHOULD send an `error` and fail the channel.
Core lightning currently follows these rules during channel_reestablish and will fail the channel on reestablish with a 0 value of next_commitment_number.
/* BOLT #2:
*
* - if `next_commitment_number` is equal to the commitment
* number of the last `commitment_signed` message the receiving node
* has sent:
* - MUST reuse the same commitment number for its next
* `commitment_signed`.
*/
if (next_commitment_number == peer->next_index[REMOTE] - 1) {
/* We completed opening, we don't re-transmit that one! */
if (next_commitment_number == 0)
peer_failed_err(peer->pps,
&peer->channel_id,
"bad reestablish commitment_number: %"
PRIu64,
next_commitment_number);
retransmit_commitment_signed = true;
/* BOLT #2:
*
* - otherwise:
* - if `next_commitment_number` is not 1 greater than the
* commitment number of the last `commitment_signed` message the
* receiving node has sent:
* - SHOULD send an `error` and fail the channel.
*/
} else if (next_commitment_number != peer->next_index[REMOTE])
peer_failed_err(peer->pps,
&peer->channel_id,
"bad reestablish commitment_number: %"PRIu64
" vs %"PRIu64,
next_commitment_number,
peer->next_index[REMOTE]);
else
retransmit_commitment_signed = false;
This may cause implementations which assume that each new commitment_signed is for a new state to fail and potentially fail the channel.
It's worth noting that the proposed change (setting next_commitment_number to zero) will fail channels for existing CLN deployments.
But if I understand correctly it will only fail on the reconnect corner case, so maybe nobody will notice?
Technically it's still experimental for CLN, so we could change it. Better would be to change the feature bit, but that is pretty intrusive as it's been merged in the spec already.
But if I understand correctly it will only fail on the reconnect corner case, so maybe nobody will notice?
Yes exactly. And I just combed our node's logs, and it never happened that we were in this reconnection case with a non-phoenix node. So I really feel that we shouldn't bother with backwards-compat and that nobody will run into this issue in practice.
Better would be to change the feature bit, but that is pretty intrusive as it's been merged in the spec already.
Agreed, using a feature bit here would be annoying...I think it's safe to update this without it and consider it an implementation issue?
Fixed the issue dusty pointed out:
$ git diff-tree -U5 90772ff d8cfa95
diff --git a/02-peer-protocol.md b/02-peer-protocol.md
index e9ba8bc..3e7ecb2 100644
--- a/02-peer-protocol.md
+++ b/02-peer-protocol.md
@@ -2480,16 +2480,12 @@ A node:
- MUST ignore any redundant `channel_ready` it receives.
- if `next_commitment_number` is equal to the commitment number of
the last `commitment_signed` message the receiving node has sent:
- MUST reuse the same commitment number for its next `commitment_signed`.
- otherwise:
- - if `next_commitment_number` is not 1 greater than the
- commitment number of the last `commitment_signed` message the receiving
- node has sent:
- - SHOULD send an `error` and fail the channel.
- - if it has not sent `commitment_signed`, AND `next_commitment_number`
- is not equal to 1:
+ - if `next_commitment_number` is not equal to the commitment number of the
+ next `commitment_signed` the receiving node will send:
- SHOULD send an `error` and fail the channel.
- if `next_revocation_number` is equal to the commitment number of
the last `revoke_and_ack` the receiving node sent, AND the receiving node
hasn't already received a `closing_signed`:
- MUST re-send the `revoke_and_ack`.
ACK d8cfa95
@ddustin @niftynei can you let me know when this is implemented in cln? I volunteer to do cross-compat tests so what we can get this merged ;)
@niftynei can we merge this PR? Or do you need to check things on the cln side?
Proposing to put on the freeze any BOLT changed authored by Matt Corallo as his behavior is not compatible with professional standards and until there is clarification of his attitude and his many abuses of the the code of conduct, starting by its establishment.
Superceded by https://github.com/lightning/bolts/pull/1289