
Potential LND memory leak triggered by loop out

Open Jossec101 opened this issue 1 year ago • 3 comments

Hey, we are experiencing service disruptions: some swap payments drive up CPU and memory usage, exhausting resources and making LND and the machine that hosts it unusable.

This has happened many times, but we did not report it until we were more confident about the cause. Since it appears to originate from Loop, we decided to post the issue here.

Expected behavior

Swaps should not be able to take the system down.

Actual behavior

We trigger a loop out via our custom liquidity software, and we can correlate a single swap with LND becoming unusable: it effectively gets DoS'd while trying to pay a loop invoice through a faulty channel.

For example, these two logs are printed heavily when LND starts to drain resources (the values are in range pattern [Min-Max]):

[WRN] CRTR: Attempt [2015005-2017524] for payment 9ae66d63f07fc2de2a57ef255606052c**** failed: TemporaryChannelFailure(update=(*lnwire.ChannelUpdate)(*)({ Signature: (lnwire.Sig) { bytes: ([64]uint8) (len=64 cap=64) { 00000000 *| }, sigType: (lnwire.sigType) 0 }, ChainHash: (chainhash.Hash) (len=32 cap=32) 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f, ShortChannelID: (lnwire.ShortChannelID) [613351-843624]:[123-5457]:[0-18], Timestamp: (uint32) [1690812878-1715946314], MessageFlags: (lnwire.ChanUpdateMsgFlags) 00000001, ChannelFlags: (lnwire.ChanUpdateChanFlags) [00000000-00000001], TimeLockDelta: (uint16) [30-144], HtlcMinimumMsat: (lnwire.MilliSatoshi) [0-1000000] mSAT, BaseFee: (uint32) [0-200000], FeeRate: (uint32) [0-5000], HtlcMaximumMsat: (lnwire.MilliSatoshi) [2970000000-198000000000] mSAT, ExtraOpaqueData: (lnwire.ExtraOpaqueData) { } }) )@[1-5]

And this one (the payment hash has been masked):

[INF] LOOP: 9ae66d Payment 9ae66d63f07fc2de2a57ef255606052c**** state=IN_FLIGHT, inflight_htlcs=[0-6], inflight_amt=[0-14138776000] mSAT

During this period of the logs, the memory chart of the LND container looks like this; memory climbs to 15 GB in minutes.

[image: LND container memory usage chart]

CPU:

[image: LND container CPU usage chart]

To reproduce

  1. Invoke a loop out on a channel that is returning temporary channel failures (a sketch of doing this via loopd's gRPC API follows after this list).
  2. LND CPU and memory usage go up, especially memory, which increases exponentially.
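For reference, here is a rough sketch of how step 1 can be triggered programmatically against loopd's gRPC API instead of the CLI (this mirrors what our liquidity software does). It is illustrative only: the connection setup is omitted, and the looprpc field names (LoopOutRequest, Amt, OutgoingChanSet, MaxSwapRoutingFee) are based on loop v0.27.x, so verify them against the client.proto of the version you run.

```go
// loopout_sketch.go: illustrative only -- trigger a loop out restricted to a
// specific outgoing channel set via loopd's gRPC API. Connection setup (TLS
// cert, macaroon) is omitted; field names follow looprpc as of loop v0.27.x
// and should be checked against your version's client.proto.
package main

import (
	"context"
	"log"

	"github.com/lightninglabs/loop/looprpc"
)

// triggerLoopOut asks loopd to swap amtSat out through the given channel. If
// that channel keeps returning TemporaryChannelFailure, the off-chain payment
// of the swap invoice is what drives LND's CPU and memory up in our case.
func triggerLoopOut(ctx context.Context, client looprpc.SwapClientClient,
	amtSat int64, chanID uint64) error {

	resp, err := client.LoopOut(ctx, &looprpc.LoopOutRequest{
		Amt:               amtSat,
		OutgoingChanSet:   []uint64{chanID},
		MaxSwapRoutingFee: 1000, // example cap, tune for your setup
	})
	if err != nil {
		return err
	}
	log.Printf("loop out initiated: %v", resp)
	return nil
}

func main() {
	// loopd connection setup omitted; with a connected *grpc.ClientConn:
	//   client := looprpc.NewSwapClientClient(conn)
	//   err := triggerLoopOut(context.Background(), client, 500_000, shortChanID)
	// where shortChanID is the uint64 short channel ID of the faulty channel.
}
```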

How we fix it

  1. We disable swaps on the channel where we detected the issue.
  2. Everything calms down.

System information

Versions: LND 0.17.5/0.17.4 and loop 0.27.1, running as amd64 Linux containers on Kubernetes (AWS).

Jossec101 commented May 17 '24 14:05

Thanks for the report. The logs you posted indicate that it's just trying to pay the invoice, which is essentially an RPC call to LND. I wonder if you can reproduce this issue without loop too, maybe just by probing with an invoice to the destination that caused the spike?
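If it helps, a minimal sketch of reproducing the payment attempt without loop, by paying the invoice directly through LND's router RPC, is below. The gRPC connection setup is omitted and the routerrpc field names are based on lnd 0.17.x, so double-check them against your version.

```go
// probe_sketch.go: illustrative only -- pay (or probe with) an invoice via
// lnd's routerrpc to see whether the resource spike reproduces without loop.
// Connection setup (TLS cert, macaroon) is omitted for brevity.
package main

import (
	"context"
	"log"

	"github.com/lightningnetwork/lnd/lnrpc"
	"github.com/lightningnetwork/lnd/lnrpc/routerrpc"
)

// payTestInvoice fires a payment for the given BOLT11 invoice and streams the
// attempt updates, which is roughly what loop does when paying the swap
// invoice.
func payTestInvoice(ctx context.Context, router routerrpc.RouterClient,
	invoice string) error {

	stream, err := router.SendPaymentV2(ctx, &routerrpc.SendPaymentRequest{
		PaymentRequest: invoice,
		TimeoutSeconds: 60,
		FeeLimitSat:    1000, // arbitrary cap for the test
	})
	if err != nil {
		return err
	}
	for {
		payment, err := stream.Recv()
		if err != nil {
			return err
		}
		log.Printf("payment state=%v, htlcs in flight=%d",
			payment.Status, len(payment.Htlcs))
		if payment.Status == lnrpc.Payment_SUCCEEDED ||
			payment.Status == lnrpc.Payment_FAILED {
			return nil
		}
	}
}

func main() {
	// Connection setup omitted; with a connected *grpc.ClientConn:
	//   router := routerrpc.NewRouterClient(conn)
	//   err := payTestInvoice(context.Background(), router, "lnbc1...")
}
```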

bhandras commented May 17 '24 15:05

Could you please also create a memory profile of the affected LND node? You can add the option: --profile=<profileport> and then use the usual Go profiling tools from the browser.
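For reference, the heap snapshot can also be captured without a browser. The sketch below assumes lnd was started with --profile=9736 (any free port works) and that the profile port serves the standard net/http/pprof endpoints, so the heap profile is available at /debug/pprof/heap; the saved file can then be inspected with `go tool pprof`.

```go
// heapdump_sketch.go: grab a heap profile snapshot from lnd's profile port.
// Assumes lnd runs with --profile=9736 and exposes the standard
// net/http/pprof handlers; adjust host and port for your deployment.
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

func main() {
	const profileURL = "http://localhost:9736/debug/pprof/heap"

	client := &http.Client{Timeout: 30 * time.Second}
	resp, err := client.Get(profileURL)
	if err != nil {
		fmt.Fprintln(os.Stderr, "fetch heap profile:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	out, err := os.Create("lnd-heap.pprof")
	if err != nil {
		fmt.Fprintln(os.Stderr, "create output file:", err)
		os.Exit(1)
	}
	defer out.Close()

	// Save the raw profile; inspect later with `go tool pprof lnd-heap.pprof`.
	if _, err := io.Copy(out, resp.Body); err != nil {
		fmt.Fprintln(os.Stderr, "write profile:", err)
		os.Exit(1)
	}
	fmt.Println("heap profile written to lnd-heap.pprof")
}
```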

bhandras commented May 17 '24 15:05

I did a loop out via the CLI and memory started to increase exponentially. The bad part is that after enabling profiling, the channel that was triggering the leak completed its swap successfully, so we can't reproduce the issue for now. It was only happening on a specific channel set. I will report back when we run into this again.

Jossec101 commented May 20 '24 08:05

Ok, I have more news: we managed to dump the heap, and the memory reported there looked fine. It seems to be an issue with Go not being aware of Kubernetes memory limits. More info can be read here: https://kupczynski.info/posts/go-container-aware/

Maybe LND, when running in a container, should be more aware of its memory limits.

It seems to be working; I will close this once our confidence that it's solved is higher, but it looks like it is :D
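For anyone running into the same thing: a minimal sketch of the container-aware pattern described in the linked post, assuming Go 1.19+ (where GOMEMLIMIT and debug.SetMemoryLimit exist) and cgroup v2. The 0.9 headroom factor and the cgroup path are our assumptions; in practice the same effect can be achieved with no code at all by setting the GOMEMLIMIT environment variable on the container.

```go
// memlimit_sketch.go: illustrative only -- make a Go process respect its
// container memory limit by deriving GOMEMLIMIT from the cgroup v2 limit.
// Setting the GOMEMLIMIT env var on the container achieves the same thing.
package main

import (
	"os"
	"runtime/debug"
	"strconv"
	"strings"
)

// setContainerMemLimit reads the cgroup v2 memory limit and, if one is set,
// tells the Go runtime to keep heap usage below roughly 90% of it.
func setContainerMemLimit() {
	data, err := os.ReadFile("/sys/fs/cgroup/memory.max")
	if err != nil {
		return // not in a cgroup v2 container, or no access
	}
	s := strings.TrimSpace(string(data))
	if s == "max" {
		return // no limit configured
	}
	limit, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return
	}
	// Leave ~10% headroom for non-heap memory (stacks, cgo, etc.).
	debug.SetMemoryLimit(limit * 9 / 10)
}

func main() {
	setContainerMemLimit()
	// ... rest of the application ...
}
```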

Jossec101 commented May 29 '24 11:05

Thanks for the update @Jossec101 🙏

bhandras commented May 29 '24 11:05

Memory is now being released properly, so we can conclude it's a Go runtime thing. I wonder if we should report this in the main LND repo; I'm sure we are not the only ones affected by this.

Jossec101 commented Jun 02 '24 20:06

Yes, it seems to be related to LND, so it's better to track it through LND's issue tracker.

bhandras commented Jun 03 '24 11:06