[bug]: `CloseChannel`: `no_wait=True` still waiting

Open ZZiigguurraatt opened this issue 7 months ago • 7 comments

I am trying to use the CloseChannel no_wait option. If I set no_wait=True, the call still waits for the transaction to be confirmed before the gRPC stream ends. It's possible that skipping this wait is not the intent of the option, but the documentation linked below is confusing about which sentence refers to no_wait=True and which refers to no_wait=False. Possibly we need another option that controls whether or not the call waits for the close to be confirmed?

https://github.com/lightningnetwork/lnd/blob/3707b1fb7032d98dc2a8e881d3ae83e61fbc35e3/lnrpc/lightning.proto#L2159-L2167

ZZiigguurraatt avatar May 19 '25 22:05 ZZiigguurraatt

This is about waiting for the channel to clear in the case of active HTLCs.

If that flag is false, and you have a pending HTLC on the channel you're trying to close, it'll error out.

If it's true, and you have active HTLCs, it'll wait for those to clear and then try once that's the case.

Setting it to true is better for UX, as you can just fire and forget, and lnd will coop close when it's able to.

Roasbeef avatar May 19 '25 22:05 Roasbeef
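
The rule described in this comment can be sketched as a small decision function. This is only an illustration of the behavior as stated here, not lnd's actual code: `coop_close_precheck`, `CloseRejected`, and the returned strings are hypothetical names chosen for the example.

```python
class CloseRejected(Exception):
    """Raised when a coop close is refused because HTLCs are still active."""


def coop_close_precheck(active_htlcs: int, no_wait: bool) -> str:
    """Model of the HTLC check described above (hypothetical helper).

    Returns a label for what would happen next, or raises CloseRejected.
    """
    if active_htlcs == 0:
        # Nothing in flight: the coop close can start right away.
        return "start_coop_close"
    if no_wait:
        # Flag set: the close is accepted and deferred until HTLCs clear.
        return "wait_for_htlcs_then_close"
    # Flag unset: the request errors out while HTLCs are pending.
    raise CloseRejected(
        f"cannot coop close channel with active htlcs "
        f"(number of active htlcs: {active_htlcs})"
    )
```

With one pending HTLC, `coop_close_precheck(1, no_wait=True)` returns the deferred label, while `coop_close_precheck(1, no_wait=False)` raises.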

> Possibly we need another option that controls whether or not the call waits for the close to be confirmed?

The call returns streaming RPCs that tell you when we broadcast initially, and then later once things are confirmed.

Roasbeef avatar May 19 '25 23:05 Roasbeef

> > Possibly we need another option that controls whether or not the call waits for the close to be confirmed?
>
> The call returns streaming RPCs that tell you when we broadcast initially, and then later once things are confirmed.

Yeah, I'm wondering if there should be an option to hang up after broadcast, but before confirmation. Or, maybe the client should just figure out how to do that themselves?

ZZiigguurraatt avatar May 19 '25 23:05 ZZiigguurraatt
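
A client-side way to "hang up after broadcast" might look like the sketch below. It does not talk to lnd: `Update`, `fake_close_stream`, and `wait_for_broadcast` are hypothetical stand-ins, and the `kind` strings are assumed to mirror the names of the stream's update variants.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class Update:
    """Stand-in for one message on the close stream (hypothetical type)."""
    kind: str                   # e.g. "close_pending" or "chan_close"
    txid: Optional[str] = None  # closing txid, when known


def fake_close_stream(log: List[str]) -> Iterator[Update]:
    """Simulated stream: broadcast first, confirmation later."""
    for kind in ("close_pending", "chan_close"):
        log.append(kind)  # record which updates were actually produced
        yield Update(kind, txid="aa" * 32)


def wait_for_broadcast(stream: Iterable[Update]) -> Optional[str]:
    """Stop reading as soon as the closing tx is reported as broadcast."""
    for update in stream:
        if update.kind == "close_pending":
            return update.txid
    return None  # stream ended without a broadcast update
```

Because the loop returns on the first broadcast update, the confirmation update is never read. With a real gRPC client the same loop would presumably run over the response iterator returned by the stub, and the call could then be cancelled; that part should be checked against the generated stubs.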

> If that flag is false, and you have a pending HTLC on the channel you're trying to close, it'll error out.

I think this is the main thing that is missing from the docs: that it will error out. Right now, I'm confused because if it is already going to block to wait for confirmation, why do I need an extra option to block for in-flight HTLCs?

> If it's true, and you have active HTLCs, it'll wait for those to clear and then try once that's the case.

no_wait=True to me means wait=False, so why would it wait for HTLC to clear in that case?

> Setting it to true is better for UX, as you can just fire and forget, and lnd will coop close when it's able to.

What is the default?

ZZiigguurraatt avatar May 21 '25 14:05 ZZiigguurraatt

> Or, maybe the client should just figure out how to do that themselves?

As is, the client can do that themselves.

> What is the default?

The default is the original behavior: if you have active HTLCs, and try to coop close, it'll error out.

I agree documentation can be improved here. Feel free to make a PR with a suggestion.

We can either rename the issue title, or close this as it's about waiting for HTLCs to clear, not waiting to broadcast or not.

Roasbeef avatar May 22 '25 01:05 Roasbeef

Was this feature necessary as part of co-op close or was it just added at the same time as a convenience/coincidence? Like without co-op close it could still be useful, right?

ZZiigguurraatt avatar May 22 '25 17:05 ZZiigguurraatt

> Was this feature necessary as part of co-op close or was it just added at the same time as a convenience/coincidence? Like without co-op close it could still be useful, right?

This was added before the RBF coop close stuff. It was intended to make the API easier to use, since before this the client had to retry in the background until a channel was clear to coop close. This could be difficult for a high volume backbone channel. The new API allows the client to submit the RPC call, then just wait until lnd finds an opportunity to truly shut down the channel.

Roasbeef avatar May 22 '25 21:05 Roasbeef
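
For illustration, the older client-side pattern mentioned here (retrying until the channel is clear) could be sketched as follows. Nothing in it is lnd API: `FakeChannel`, `try_coop_close`, and `close_with_retry` are hypothetical, and the fake channel simply pretends that one HTLC resolves between attempts.

```python
import time
from typing import Callable


class FakeChannel:
    """Toy channel whose pending HTLCs resolve one per close attempt."""

    def __init__(self, active_htlcs: int) -> None:
        self.active_htlcs = active_htlcs

    def try_coop_close(self) -> bool:
        if self.active_htlcs > 0:
            self.active_htlcs -= 1  # pretend one HTLC settles meanwhile
            return False            # close refused: HTLCs were active
        return True                 # channel clear: close accepted


def close_with_retry(try_close: Callable[[], bool],
                     max_attempts: int,
                     delay_s: float = 0.0) -> int:
    """Retry until the close is accepted; return the attempts used."""
    for attempt in range(1, max_attempts + 1):
        if try_close():
            return attempt
        time.sleep(delay_s)  # back off before the next attempt
    raise TimeoutError(f"channel not clear after {max_attempts} attempts")
```

On a busy channel such a loop may never find a quiet moment, which is the difficulty described above; letting lnd hold the request and close once the HTLCs resolve removes the need for it.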

> The default is the original behavior: if you have active HTLCs, and try to coop close, it'll error out.

So, to me this sounds like no_wait=True which means wait=False would be the default behavior. However, this conflicts with your statement

> If it's true, and you have active HTLCs, it'll wait for those to clear and then try once that's the case.

ZZiigguurraatt avatar Jun 17 '25 16:06 ZZiigguurraatt

Also, could

> Moreover if a coop close is specified and this flag is set to true the coop closing flow will be initiated even if HTLCs are active on the channel. The channel will wait until all HTLCs are resolved and then start the coop closing process. The channel will be disabled in the meantime and will disallow any new HTLCs.

be simplified to just say

"if a coop close is specified, this option is ignored"

?

The reason I say this is because if no_wait=True means wait=False and if "this flag is set to true the coop closing flow will be initiated even if HTLCs are active on the channel. The channel will wait until all HTLCs are resolved and then start the coop closing process. The channel will be disabled in the meantime and will disallow any new HTLCs", it reads to me like the option is just ignored.

ZZiigguurraatt avatar Jun 17 '25 17:06 ZZiigguurraatt

I feel like some of these confusions were introduced in https://github.com/lightningnetwork/lnd/commit/59443faa36121384448add127dfae768777aa93e

Like this also seems backwards to me: https://github.com/lightningnetwork/lnd/blob/c1740c14baa3ac6e33e382c2d93b1c2bfa775a0d/cmd/commands/commands.go#L1135-L1137

Does NoWait: true not mean Wait: false?

Also,

https://github.com/lightningnetwork/lnd/blob/c1740c14baa3ac6e33e382c2d93b1c2bfa775a0d/cmd/commands/commands.go#L1057-L1062

seems to be mostly unrelated to the no_wait option? If so, I think we need to better clarify this a bit.

@ziggie1984 , do you have any input on these confusions?

ZZiigguurraatt avatar Jun 17 '25 18:06 ZZiigguurraatt

I've done some testing and if I try to close a channel with an in flight HTLC without setting no_wait, I get the error

cannot coop close channel with active htlcs (number of active htlcs: 1), bypass this check and initiate the coop close by setting no_wait=true

so, this leads me to believe that the default is no_wait=false.

Also, it seems to me like

    // If true, then the rpc call will not block while it awaits a closing txid
    // to be broadcasted to the mempool.

should be changed to

    // If false, then the rpc call will error out if there are in flight HTLC and
    // the channel can't be cooperatively closed right now.
    // If true, then the rpc call will block while it awaits all in flight HTLC
    // to be settled and a closing txid to be broadcasted to the mempool.

?

But I'm really confused why this parameter is called no_wait because it seems wait is what it is really doing.

ZZiigguurraatt avatar Jun 17 '25 21:06 ZZiigguurraatt

Possibly

    // If true, then the rpc call will not block while it awaits a closing txid
    // to be broadcasted to the mempool.

refers to force closing a channel only and that is where my confusion lies?

ZZiigguurraatt avatar Jun 17 '25 21:06 ZZiigguurraatt

You are right, it is definitely confusing, but no_wait specifies more than just waiting for the active HTLCs. Because the CloseChannel RPC accepts a stream, no_wait does 2 things:

So the stream returns 3 possible messages:

    // *CloseStatusUpdate_ClosePending
    // *CloseStatusUpdate_ChanClose
    // *CloseStatusUpdate_CloseInstant

So there are 3 steps a closing can be in: closing tx confirmed, closing tx broadcasted, or initiated (CloseStatusUpdate_CloseInstant). When specifying no_wait you only are interested in the initiated response. You basically don't want to wait for LND to broadcast the tx. If active HTLCs are on the channel, it is basically not possible to not wait because LND cannot initiate the CoopClose in the first place. Does this make sense?

So no_wait relates more to the stream response we expect as a caller, not to waiting for the HTLCs to resolve.

I agree it is a bit tricky to understand, but I did not want to change the RPC interface just for that back when I made the change. Let me know how we can improve the docs so they become bulletproof.

ziggie1984 avatar Jun 18 '25 07:06 ziggie1984
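
The three steps listed in this comment can be written down as a small lookup, which may help when reading the stream on the client side. This is a sketch under assumptions: `CloseStage`, `STAGE_BY_UPDATE`, and `stage_of` are hypothetical names, and the snake_case keys are assumed to be the field names matching the Go types above, which should be verified against lightning.proto.

```python
from enum import Enum


class CloseStage(Enum):
    """The three steps a close can be in, as described above."""
    INITIATED = "initiated"
    BROADCAST = "broadcast"
    CONFIRMED = "confirmed"


# Assumed mapping from update variant to stage (keys are an assumption).
STAGE_BY_UPDATE = {
    "close_instant": CloseStage.INITIATED,  # CloseStatusUpdate_CloseInstant
    "close_pending": CloseStage.BROADCAST,  # CloseStatusUpdate_ClosePending
    "chan_close": CloseStage.CONFIRMED,     # CloseStatusUpdate_ChanClose
}


def stage_of(update_kind: str) -> CloseStage:
    """Translate an update variant name into the close stage it reports."""
    try:
        return STAGE_BY_UPDATE[update_kind]
    except KeyError:
        raise ValueError(f"unknown close update: {update_kind!r}") from None
```

A caller that only cares about the initiated response would stop reading once `stage_of(...)` yields `CloseStage.INITIATED`.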

> not possible to not wait

to me means "possible to wait", I think this could be where the confusion lies.

ZZiigguurraatt avatar Jun 18 '25 15:06 ZZiigguurraatt

> When specifying no_wait you only are interested in the initiated response. You basically don't want to wait for LND to broadcast the tx.

As mentioned above, when I specify no_wait=True, the call blocks and waits. To me, no_wait=True means wait=False, so the behavior that I'm seeing does not make sense.

ZZiigguurraatt avatar Jun 18 '25 15:06 ZZiigguurraatt

Some discussions have continued offline and https://github.com/lightningnetwork/lnd/pull/9958 has been updated in consideration of that.

ZZiigguurraatt avatar Jun 19 '25 15:06 ZZiigguurraatt