lnd icon indicating copy to clipboard operation
lnd copied to clipboard

LND has some very old inflight payments without associated HTLC

Open BhaagBoseDK opened this issue 2 years ago • 8 comments

Background

My lnd has many really old inflight payments (status = 1) which have no associated HTLC or invoices. These payments can not be deleted because they are inflight. And there is no chance these payments will ever succeed since they are so old.

Questions :

  1. Why does LND hang on to these very old payments?
  2. When and how these are removed, or progressed (marked failed 3 or unknown 0)

Here is an example of one such payment which is nearly a year old.

{
    "payments": [
        {
            "payment_hash": "378b9a32beacfbf7a07c03e9e0e98c0872c28e3a644a16185ca262ac95190cee",
            "value": "381220",
            "creation_date": "1628853635",
            "fee": "0",
            "payment_preimage": "0000000000000000000000000000000000000000000000000000000000000000",
            "value_sat": "381220",
            "value_msat": "381220000",
            "payment_request": "",
            "status": "IN_FLIGHT",
            "fee_sat": "0",
            "fee_msat": "0",
            "creation_time_ns": "1628853635111808994",
            "htlcs": [
                {
                    "attempt_id": "33997",
                    "status": "IN_FLIGHT",
                    "route": {
                        "total_time_lock": 695915,
                        "total_fees": "44",
                        "total_amt": "381264",
                        "hops": [
                            {
                                "chan_id": "764495932519350272",
                                "chan_capacity": "4000000",
                                "amt_to_forward": "381225",
                                "fee": "39",
                                "expiry": 695875,
                                "amt_to_forward_msat": "381225524",
                                "fee_msat": "39122",
                                "pub_key": "035b1ff29e8db1ba8f2a4f4f95db239b54069cb949b8cde329418e2a83da4f1b30",
                                "tlv_payload": false,
                                "mpp_record": null,
                                "amp_record": null,
                                "custom_records": {
                                },
                                "metadata": null
                            },
                            {
                                "chan_id": "758977483516018688",
                                "chan_capacity": "381225",
                                "amt_to_forward": "381224",
                                "fee": "1",
                                "expiry": 695835,
                                "amt_to_forward_msat": "381224143",
                                "fee_msat": "1381",
                                "pub_key": "03037dc08e9ac63b82581f79b662a4d0ceca8a8ca162b1af3551595b8f2d97b70a",
                                "tlv_payload": false,
                                "mpp_record": null,
                                "amp_record": null,
                                "custom_records": {
                                },
                                "metadata": null
                            },
                            {
                                "chan_id": "760774085596282881",
                                "chan_capacity": "381224",
                                "amt_to_forward": "381222",
                                "fee": "1",
                                "expiry": 695795,
                                "amt_to_forward_msat": "381222762",
                                "fee_msat": "1381",
                                "pub_key": "033d8656219478701227199cbd6f670335c8d408a92ae88b962c49d4dc0e83e025",
                                "tlv_payload": false,
                                "mpp_record": null,
                                "amp_record": null,
                                "custom_records": {
                                },
                                "metadata": null
                            },
                            {
                                "chan_id": "763749364046626817",
                                "chan_capacity": "381222",
                                "amt_to_forward": "381221",
                                "fee": "1",
                                "expiry": 695755,
                                "amt_to_forward_msat": "381221381",
                                "fee_msat": "1381",
                                "pub_key": "0254b942f214919603f852ddba66610a0d2365a6a411f61ef79ed4bb9798468bfa",
                                "tlv_payload": false,
                                "mpp_record": null,
                                "amp_record": null,
                                "custom_records": {
                                },
                                "metadata": null
                            },
                            {
                                "chan_id": "764633371377139712",
                                "chan_capacity": "381221",
                                "amt_to_forward": "381220",
                                "fee": "1",
                                "expiry": 695715,
                                "amt_to_forward_msat": "381220000",
                                "fee_msat": "1381",
                                "pub_key": "027ecdd3c509f7db2d8ade67381bb2e8ed88ccbfab8805d24076c4a0fd131f71ff",
                                "tlv_payload": false,
                                "mpp_record": null,
                                "amp_record": null,
                                "custom_records": {
                                },
                                "metadata": null
                            },
                            {
                                "chan_id": "764458549095170049",
                                "chan_capacity": "2000000",
                                "amt_to_forward": "381220",
                                "fee": "0",
                                "expiry": 695715,
                                "amt_to_forward_msat": "381220000",
                                "fee_msat": "0",
                                "pub_key": "03c5528c628681aa17ab9e117aa3ee6f06c750dfb17df758ecabcd68f1567ad8c1",
                                "tlv_payload": false,
                                "mpp_record": null,
                                "amp_record": null,
                                "custom_records": {
                                },
                                "metadata": null
                            }
                        ],
                        "total_fees_msat": "44646",
                        "total_amt_msat": "381264646"
                    },
                    "attempt_time_ns": "1628853635136165915",
                    "resolve_time_ns": "0",
                    "failure": null,
                    "preimage": null
                }
            ],
            "payment_index": "31889",
            "failure_reason": "FAILURE_REASON_NONE"
        }
    ],
    "first_index_offset": "31889",
    "last_index_offset": "31889",
    "total_num_payments": "0"
}

Your environment

  • version of lnd : "version": "0.15.0-beta commit=v0.15.0-beta"
  • which operating system (uname -a on *Nix) Linux umbrel 5.10.17-v8+ #1421 SMP PREEMPT Thu May 27 14:01:37 BST 2021 aarch64 GNU/Linux

Steps to reproduce

lncli listpayments --index_offset 0 --paginate_forwards --max_payments 1 --include_incomplete and continue until you find the first inflight payment.

Expected behaviour

LND should mark the payment failed when associated HTLC timeout.

Actual behaviour

LND has many very old inflight payments which are effectively orphan.

BhaagBoseDK avatar Aug 16 '22 15:08 BhaagBoseDK

Related https://github.com/lightningnetwork/lnd/issues/6249 ?

alexbosworth avatar Aug 16 '22 18:08 alexbosworth

I have one on my node. The payment is from two months ago. No HTLCs pending on the alleged outgoing channel involved, but lncli listpayments shows payment as being IN_FLIGHT.

{
    "payments": [
        {
            "payment_hash": "0181d246f8de9f69a8e9a151c4b841308f05364c1be01d0160d78b58d705727b",
            "value": "184527",
            "creation_date": "1654506732",
            "fee": "0",
            "payment_preimage": "0000000000000000000000000000000000000000000000000000000000000000",
            "value_sat": "184527",
            "value_msat": "184527000",
            "payment_request": "<redacted>",
            "status": "IN_FLIGHT",
            "fee_sat": "0",
            "fee_msat": "0",
            "creation_time_ns": "1654506732861871759",
            "htlcs": [
                {
                    "attempt_id": "69812",
                    "status": "IN_FLIGHT",
                    "route": {
                        "total_time_lock": 739649,
                        "total_fees": "36",
                        "total_amt": "184563",
                        "hops": [
                            {
                                "chan_id": "<redacted>",
                                "chan_capacity": "2000000",
                                "amt_to_forward": "184545",
                                "fee": "17",
                                "expiry": 739609,
                                "amt_to_forward_msat": "184545452",
                                "fee_msat": "17935",
                                "pub_key": "<redacted>",
                                "tlv_payload": true,
                                "mpp_record": null,
                                "amp_record": null,
                                "custom_records": {
                                },
                                "metadata": null
                            },
                            {
                                "chan_id": "<redacted>",
                                "chan_capacity": "10000000",
                                "amt_to_forward": "184527",
                                "fee": "18",
                                "expiry": 739569,
                                "amt_to_forward_msat": "184527000",
                                "fee_msat": "18452",
                                "pub_key": "<redacted>",
                                "tlv_payload": true,
                                "mpp_record": null,
                                "amp_record": null,
                                "custom_records": {
                                },
                                "metadata": null
                            },
                            {
                                "chan_id": "<redacted>",
                                "chan_capacity": "1000000",
                                "amt_to_forward": "184527",
                                "fee": "0",
                                "expiry": 739569,
                                "amt_to_forward_msat": "184527000",
                                "fee_msat": "0",
                                "pub_key": "<redacted>",
                                "tlv_payload": true,
                                "mpp_record": {
                                    "payment_addr": "c1b5f59db2671773ce3ab3c279c431315fb35ceaab9e138727f4ab0a2d56aa13",
                                    "total_amt_msat": "184527000"
                                },
                                "amp_record": null,
                                "custom_records": {
                                },
                                "metadata": null
                            }
                        ],
                        "total_fees_msat": "36387",
                        "total_amt_msat": "184563387"
                    },
                    "attempt_time_ns": "1654506733245416832",
                    "resolve_time_ns": "0",
                    "failure": null,
                    "preimage": null
                }
            ],
            "payment_index": "18627",
            "failure_reason": "FAILURE_REASON_NONE"
        }
    ],
    "first_index_offset": "18627",
    "last_index_offset": "18627",
    "total_num_payments": "0"
}

hazrulnizam avatar Aug 17 '22 03:08 hazrulnizam

The payments are in inflight because they have inflight HTLCs that need to be updated. There's a related fix in #6683. After which I will work on this "payment stuck due to HTLCs not being updated after timeout" issue.

yyforyongyu avatar Aug 17 '22 07:08 yyforyongyu

The payments are in inflight because they have inflight HTLCs that need to be updated. There's a related fix in #6683. After which I will work on this "payment stuck due to HTLCs not being updated after timeout" issue.

There does not appear to be any inflight HTLC (for the example I presented at least). The HTLC reported by lncli listpayments has expired a long while back but has still not been updated or removed from the payment.

The so called "IN_FLIGHT" is not reported as pending_htlc on lncli listchannels

BhaagBoseDK avatar Aug 17 '22 15:08 BhaagBoseDK

For me the issue disappeared for payments created after 24 May 2022, which corresponds to lnd v0.15.0-beta.rc3. It looks like older versions had the issue, and the corresponding older data about in-flight payments still needs to be cleaned up. As far as I understand the situation (there are multiple GitHub issues), @yyforyongyu is working on that cleanup mission?

C-Otto avatar Oct 23 '22 15:10 C-Otto

@yyforyongyu is working on that cleanup mission?

The plan is to merge #6683 first to understand why these payments are stuck and once the reason is known we will make another fix.

yyforyongyu avatar Oct 24 '22 06:10 yyforyongyu

I have a rather recent (created Sat Oct 15 2022 16:54:47 GMT+0000, i.e. almost 11 days ago) payment that's pending. All HTLCs known by lnd are failed (temporary channel failure). There's no non-failed HTLC. The timelock values are below the current block height. The only outgoing channel doesn't have any pending HTLC. When I invoke lncli trackpayment HASH I do not get any output until I press CTRL+C.

C-Otto avatar Oct 26 '22 08:10 C-Otto

Strangely, I also have this: 2022-10-15 18:56:49.138 [INF] IRPC: Canceled invoice HASH

C-Otto avatar Oct 26 '22 08:10 C-Otto

Could this be a cause? https://github.com/lightningnetwork/lnd/pull/6545

@yyforyongyu

C-Otto avatar Nov 16 '22 09:11 C-Otto

@C-Otto unlikely as that only changes how SendToRoute behaves? Will look into this issue once itest is fixed, should be next week.

yyforyongyu avatar Nov 17 '22 17:11 yyforyongyu

Alright. I think I can reproduce the issue by using lnd-manageJ's top-up feature, which sends out MPPs using the fix in #6545 (some of the shards failing, some payments not settling at all). Let me know if you need further information.

C-Otto avatar Nov 17 '22 21:11 C-Otto

In order for a payment to have the status "failed", there are two conditions. First, there must not be any HTLC that's still in flight (not failed/settled). Second, the payment's failure reason needs to be set. This second criterion is not met for one of the should-be-failed payments that bothers me.

The payment's failure reason is persisted (only) in payment_control.go:Fail (lines 505, 543). This is invoked from control_tower.go:Fail (lines 236, 242).

For this, there are several invocations. One of them is in router.go:sendToRoute (lines 2180, 2304), which is conditional on !skipTempErr.

Code not making use of skipTempErr is able to deal with failed payments, for example in payment_control.go:InitPayment (lines 111, 148): "We allow retrying failed payments.".

With skipTempErr we do not want to mark payments as failed if just one shard failed. However, we also don't tell lnd that we won't try another shard (in my case, that's application logic outside of lnd). As such, lnd needs to mark a payment as failed on its own, for example after some timeout passed.

C-Otto avatar Nov 21 '22 08:11 C-Otto

I pored over the logs for the singular case present on my node and here is what I can report:

The payment details is as per my previous post in this issue. payment_hash = 0181d246f8de9f69a8e9a151c4b841308f05364c1be01d0160d78b58d705727b

When the payment was first sent, it was added to the outgoing channel as HTLC 3533. There is a corresponding entry for queuing keystone ADD packet but the channel link failed before the usual removing ADD packet from mailbox:

2022-06-06 17:12:13.213 [DBG] CRTR: Payment 0181d246f8de9f69a8e9a151c4b841308f05364c1be01d0160d78b58d705727b in state terminate=false, active_shards=0, rem_value=184527000 mSAT, fee_limit=74000 mSAT
2022-06-06 17:12:13.213 [DBG] CRTR: PaymentSession(0181d246f8de9f69a8e9a151c4b841308f05364c1be01d0160d78b58d705727b): pathfinding for amt=184527000 mSAT
2022-06-06 17:12:13.214 [DBG] CRTR: Pathfinding absolute attempt cost: 284.527 sats
2022-06-06 17:12:13.243 [DBG] CRTR: Found route: probability=0.5281999999999998, hops=3, fee=36387 mSAT
2022-06-06 17:12:13.243 [DBG] CRTR: Pathfinding perf metrics: nodes=5, edges=2254, time=30.285311ms
2022-06-06 17:12:13.711 [DBG] HSWC: ChannelLink(<outgoing_channel_point>): queueing keystone of ADD open circuit: (Chan ID=0:0:0, HTLC ID=69812)->(Chan ID=<outgoing_short_chan_id>, HTLC ID=3533)
2022-06-06 17:12:13.711 [DBG] CRTR: Payment 0181d246f8de9f69a8e9a151c4b841308f05364c1be01d0160d78b58d705727b (pid=69812) successfully sent to switch, route: hop1 (184563387 mSAT) -> hop2 (184545452 mSAT) -> hop3 (184527000 mSAT), cltv 739649
2022-06-06 17:12:13.711 [DBG] CRTR: Payment 0181d246f8de9f69a8e9a151c4b841308f05364c1be01d0160d78b58d705727b in state terminate=false, active_shards=1, rem_value=0 mSAT, fee_limit=37613 mSAT
2022-06-06 17:12:13.713 [DBG] HSWC: Subscribing to result for paymentID=69812
2022-06-06 17:13:14.042 [ERR] HSWC: ChannelLink(<outgoing_channel_point>): failing link: unable to complete dance with error: remote unresponsive
2022-06-06 17:13:14.042 [INF] HSWC: ChannelLink(<outgoing_channel_point>): exited
2022-06-06 17:13:14.042 [INF] HSWC: Removing channel link with ChannelID(<outgoing_long_chan_id>)
2022-06-06 17:13:14.042 [INF] HSWC: ChannelLink(<outgoing_channel_point>): stopping
---
2022-06-06 17:16:32.108 [INF] PEER: disconnecting <outgoing_node_pubkey>@<outgoing_clearnet_address>:51509, reason: peer <outgoing_node_pubkey>@<outgoing_clearnet_address>:51509 no answer for 5m0s -- disconnecting
2022-06-06 17:16:32.108 [INF] PEER: unable to read message from <outgoing_node_pubkey>@<outgoing_clearnet_address>:51509: read tcp 192.168.0.11:9735-><outgoing_clearnet_address>:51509: use of closed network connection
2022-06-06 17:16:32.108 [INF] NTFN: Cancelling epoch notification, epoch_id=885
2022-06-06 17:16:32.308 [DBG] FNDG: Cancelling all reservations for peer <outgoing_node_pubkey>
2022-06-06 17:16:32.309 [DBG] FNDG: No active reservations for node: <outgoing_node_pubkey>
2022-06-06 17:16:32.309 [INF] DISC: Removing GossipSyncer for peer=<outgoing_node_pubkey>
2022-06-06 17:16:32.309 [DBG] PRNF: PeerNotifier notifying peer: <outgoing_node_pubkey> offline

When the channel link recovered 30 minutes (-ish) later, the channel initiated with start_htlc_id of 3533 and HTLC 3532 was failed.

2022-06-06 17:43:56.829 [INF] SRVR: New inbound connection from <outgoing_clearnet_address>:59885
2022-06-06 17:43:56.829 [DBG] BTCN: Canceling: <outgoing_node_pubkey>@<outgoing_clearnet_address>:9735 (reqid 800)
2022-06-06 17:43:56.829 [DBG] BTCN: Canceling: <outgoing_node_pubkey>@<outgoing_tor_address>:9735 (reqid 801)
2022-06-06 17:43:56.829 [INF] SRVR: Finalizing connection to <outgoing_node_pubkey>@<outgoing_clearnet_address>:59885, inbound=true
2022-06-06 17:43:56.830 [DBG] PRNF: PeerNotifier notifying peer: <outgoing_node_pubkey> online
2022-06-06 17:43:56.852 [DBG] LNWL: ChannelPoint(<outgoing_channel_point>): Restoring 0 dangling remote updates
2022-06-06 17:43:56.852 [DBG] LNWL: ChannelPoint(<outgoing_channel_point>): Restoring 0 local updates that the peer should sign
2022-06-06 17:43:56.852 [INF] PEER: NodeKey(<outgoing_node_pubkey>) loading ChannelPoint(<outgoing_channel_point>)
2022-06-06 17:43:56.852 [DBG] CNCT: New ChainEventSubscription(id=36) for ChannelPoint(<outgoing_channel_point>)
2022-06-06 17:43:56.852 [INF] HSWC: Removing channel link with ChannelID(<outgoing_long_chan_id>)
2022-06-06 17:43:56.853 [INF] HSWC: ChannelLink(<outgoing_channel_point>): starting
2022-06-06 17:43:56.853 [INF] HSWC: Trimming open circuits for chan_id=<outgoing_short_chan_id>, start_htlc_id=3533
2022-06-06 17:43:57.037 [INF] HSWC: Adding live link chan_id=<outgoing_long_chan_id>, short_chan_id=<outgoing_short_chan_id>
2022-06-06 17:43:57.037 [INF] NTFN: New block epoch subscription
2022-06-06 17:43:57.037 [INF] PEER: Negotiated chan series queries with <outgoing_node_pubkey>
2022-06-06 17:43:57.037 [INF] DISC: Creating new GossipSyncer for peer=<outgoing_node_pubkey>
2022-06-06 17:43:57.037 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(<outgoing_channel_point>)
2022-06-06 17:43:57.037 [INF] HSWC: ChannelLink(<outgoing_channel_point>): HTLC manager started, bandwidth=1089729235 mSAT
2022-06-06 17:43:57.038 [INF] HSWC: ChannelLink(<outgoing_channel_point>): attempting to re-synchronize
2022-06-06 17:43:57.258 [INF] HSWC: ChannelLink(<outgoing_channel_point>): received re-establishment message from remote side
2022-06-06 17:43:57.258 [DBG] LNWL: ChannelPoint(<outgoing_channel_point>): sync: remote's next commit height is 13690, while we believe it is 13690, we owe them a commitment
2022-06-06 17:43:57.259 [DBG] HSWC: ChannelLink(<outgoing_channel_point>): removing Add packet (Chan ID=<some_other_short_chan_id_2>, HTLC ID=1826) from mailbox
2022-06-06 17:43:57.455 [INF] HSWC: ChannelLink(<outgoing_channel_point>): sending 2 updates to synchronize the state
2022-06-06 17:43:57.456 [DBG] HSWC: ChannelLink(<outgoing_channel_point>): loaded 1 fwd pks
2022-06-06 17:43:57.456 [DBG] HSWC: ChannelLink(<outgoing_channel_point>): removing completed fwd pkg for height=13689
2022-06-06 17:44:03.018 [DBG] HSWC: ChannelLink(<outgoing_channel_point>): settle-fail-filter &{1 [0]}
2022-06-06 17:44:03.018 [DBG] HSWC: ChannelLink(<outgoing_channel_point>): Failed to send 74798110 mSAT
2022-06-06 17:44:03.019 [DBG] HSWC: Closed completed FAIL circuit for 33e5564d53e8cc15e7ca1496354790073179e8ea66b623692522ac1ab025e5de: (<some_other_short_chan_id_2>, 1826) <-> (<outgoing_short_chan_id>, 3532)
2022-06-06 17:44:03.019 [DBG] HSWC: ChannelLink(<some_other_channel_point_2>): queueing removal of FAIL closed circuit: (Chan ID=<some_other_short_chan_id_2>, HTLC ID=1826)->(Chan ID=<outgoing_short_chan_id>, HTLC ID=3532)
2022-06-06 17:44:03.636 [DBG] HSWC: ChannelLink(<some_other_channel_point_2>): removing Fail/Settle packet (Chan ID=<some_other_short_chan_id_2>, HTLC ID=1826) from mailbox

The next payment I sent out via the channel then apparently re-used HTLC 3533:

2022-06-06 17:49:56.213 [DBG] CRTR: Payment 6a6f1e6254b9442e71ab2f66a5d67dbd23b896c6fed4a2a2ac946de072363412 in state terminate=false, active_shards=0, rem_value=184527000 mSAT, fee_limit=74000 mSAT
2022-06-06 17:49:56.213 [DBG] CRTR: PaymentSession(6a6f1e6254b9442e71ab2f66a5d67dbd23b896c6fed4a2a2ac946de072363412): pathfinding for amt=184527000 mSAT
2022-06-06 17:49:56.214 [DBG] CRTR: Pathfinding absolute attempt cost: 284.527 sats
2022-06-06 17:49:56.244 [DBG] CRTR: Found route: probability=0.5281999999999998, hops=3, fee=36387 mSAT
2022-06-06 17:49:56.244 [DBG] CRTR: Pathfinding perf metrics: nodes=5, edges=2166, time=30.454277ms
2022-06-06 17:49:57.559 [DBG] HSWC: ChannelLink(<outgoing_channel_point>): queueing keystone of ADD open circuit: (Chan ID=0:0:0, HTLC ID=69814)->(Chan ID=<outgoing_short_chan_id>, HTLC ID=3533)
2022-06-06 17:49:57.560 [DBG] CRTR: Payment 6a6f1e6254b9442e71ab2f66a5d67dbd23b896c6fed4a2a2ac946de072363412 (pid=69814) successfully sent to switch, route: hop1 (184563387 mSAT) -> hop2 (184545452 mSAT) -> hop3 (184527000 mSAT), cltv 739654
2022-06-06 17:49:57.560 [DBG] CRTR: Payment 6a6f1e6254b9442e71ab2f66a5d67dbd23b896c6fed4a2a2ac946de072363412 in state terminate=false, active_shards=1, rem_value=0 mSAT, fee_limit=37613 mSAT
2022-06-06 17:49:57.562 [DBG] HSWC: Subscribing to result for paymentID=69814
2022-06-06 17:49:58.187 [DBG] HSWC: ChannelLink(<outgoing_channel_point>): removing Add packet (Chan ID=0:0:0, HTLC ID=69814) from mailbox

Therefore, I am guessing the payment tracker is still waiting for the original HTLC 3533 to fail, but it was half-sent in the first place and was abandoned somehow by the channel link. So, the payment remains IN_FLIGHT till this day.

hazrulnizam avatar Nov 21 '22 09:11 hazrulnizam

could this also be causing bloating sphinxreplay.db ?

BhaagBoseDK avatar May 27 '23 07:05 BhaagBoseDK

Most of the data in sphinxreplay.db is just garbage, see #7107. With my fix I was able to reduce my sphinxreplay.db to 11 MByte.

C-Otto avatar May 27 '23 07:05 C-Otto

when merge to master?

BhaagBoseDK avatar May 27 '23 07:05 BhaagBoseDK