Potential LND memory leak triggered by loop out
Hey, we are experiencing disruptions in our service caused by some swap payments that drive up CPU and memory usage, exhausting resources and making LND and the machine that hosts it unusable.
This has happened many times, but we did not report it until we were more confident about the cause. Since it is triggered from Loop, we decided to post the issue here.
Expected behavior
Swaps should not be able to bring the system down.
Actual behavior
We trigger a loop out via our custom liquidity software, and we can correlate a single swap with LND becoming unusable as it effectively DoSes itself while trying to pay a loop invoice through a faulty channel.
For example, these two logs are printed heavily when LND starts to drain resources (values are shown as a range pattern [Min,Max]):
[WRN] CRTR: Attempt [2015005-2017524] for payment 9ae66d63f07fc2de2a57ef255606052c**** failed: TemporaryChannelFailure(update=(*lnwire.ChannelUpdate)(*)({ Signature: (lnwire.Sig) { bytes: ([64]uint8) (len=64 cap=64) { 00000000 *| }, sigType: (lnwire.sigType) 0 }, ChainHash: (chainhash.Hash) (len=32 cap=32) 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f, ShortChannelID: (lnwire.ShortChannelID) [613351-843624]:[123-5457]:[0-18], Timestamp: (uint32) [1690812878-1715946314], MessageFlags: (lnwire.ChanUpdateMsgFlags) 00000001, ChannelFlags: (lnwire.ChanUpdateChanFlags) [00000000-00000001], TimeLockDelta: (uint16) [30-144], HtlcMinimumMsat: (lnwire.MilliSatoshi) [0-1000000] mSAT, BaseFee: (uint32) [0-200000], FeeRate: (uint32) [0-5000], HtlcMaximumMsat: (lnwire.MilliSatoshi) [2970000000-198000000000] mSAT, ExtraOpaqueData: (lnwire.ExtraOpaqueData) { } }) )@[1-5]
And this one (the payment hash has been masked):
[INF] LOOP: 9ae66d Payment 9ae66d63f07fc2de2a57ef255606052c**** state=IN_FLIGHT, inflight_htlcs=[0-6], inflight_amt=[0-14138776000] mSAT
During this period of the logs, the memory chart of the LND container shows memory climbing to 15 GB in minutes (chart omitted).
The CPU chart shows a corresponding spike (chart omitted).
To reproduce
- Invoke a loop out on a channel that is returning temporary channel failures (e.g. via the CLI, as sketched below)
- LND CPU/MEM goes up, especially memory, which increases exponentially.
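For reference, triggering step 1 from the command line might look roughly like the following. This is only a sketch: the channel id and amount are placeholders, and flag names may differ between loop versions, so check loop out --help for your build.

    # Hypothetical example: request a loop out restricted to the suspect channel.
    # <short_chan_id> and <amount_sat> are placeholders.
    loop out --channel <short_chan_id> <amount_sat>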
How we work around it
- We disable swaps on the channel where we have detected the issue
- Everything calms down.
System information
Versions: LND 0.17.5/0.17.4 and loop 0.27.1, amd64 Linux containers on Kubernetes (AWS)
Thanks for the report. The logs you posted indicate that it's just trying to pay the invoice, which is essentially an RPC call to LND. I wonder if you can reproduce this issue without loop too? Maybe just by trying to probe with an invoice to the destination that caused the spike?
Could you please also create a memory profile of the affected LND node?
You can add the option: --profile=<profileport> and then use the usual Go profiling tools from the browser.
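As an illustration (assuming profiling is enabled on port 9736, which is just an example port), a heap profile can then be captured with the standard Go pprof tooling:

    # Assumes lnd was started with --profile=9736 (example port).
    # Inspect the heap interactively:
    go tool pprof http://localhost:9736/debug/pprof/heap
    # Or save a snapshot for later analysis / sharing:
    curl -s -o lnd-heap.pprof http://localhost:9736/debug/pprof/heap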
I did a loop out via the CLI and memory started to increase exponentially. The bad part is that after enabling profiling, the channel that was triggering the leak swapped successfully, so we can't reproduce it for now. It was only happening on a specific chanset. I will report back when we have another issue like this.
Ok, I have more news: we managed to dump the heap and the heap itself looked fine, so it seems to be a problem of Go not being aware of Kubernetes memory limits. More info can be read here: https://kupczynski.info/posts/go-container-aware/
Maybe LND should be more aware of its memory limits when running in a container.
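For anyone hitting the same thing, one common mitigation (our assumption, not something LND configures out of the box in these versions) is to give the Go runtime a soft memory limit just below the container limit via GOMEMLIMIT (Go 1.19+), so the GC returns memory before the pod gets OOM-killed:

    # Example container environment; 14GiB is a placeholder that should sit
    # slightly below the actual Kubernetes memory limit (15Gi in this example).
    export GOMEMLIMIT=14GiB
    # Optionally make the GC run more often, trading CPU for lower heap growth:
    export GOGC=50
    lnd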
It seems to be working. I will close this once our confidence that it is solved is higher, but it looks like it :D
Thanks for the update @Jossec101 🙏
Memory is being released properly, so we can conclude it's a Go runtime thing. I wonder if we should report this on the main LND repo; I'm sure we are not the only ones suffering from this.
Yes, it seems like it is related to LND, so it's better to track it through LND's issues.