Potential LND memory leak triggered by loop out
Hey, we are experiencing disruptions in our service caused by some swap payments that drive up CPU and memory usage, exhausting resources and making LND and the machine that hosts it unusable.
This has happened many times, but we did not report it until we were more confident about the cause. Since it is triggered from Loop, we decided to post the issue here.
Expected behavior
Swaps should not be able to bring the system down.
Actual behavior
We trigger a loop out via our custom liquidity software, and we can correlate a single swap with LND becoming unusable as it effectively DoSes itself while trying to pay a loop invoice through a faulty channel.
For example, these two logs are printed heavily when LND starts to drain resources (values are shown as a range pattern [Min,Max]):
[WRN] CRTR: Attempt [2015005-2017524] for payment 9ae66d63f07fc2de2a57ef255606052c**** failed: TemporaryChannelFailure(update=(*lnwire.ChannelUpdate)(*)({ Signature: (lnwire.Sig) { bytes: ([64]uint8) (len=64 cap=64) { 00000000 *| }, sigType: (lnwire.sigType) 0 }, ChainHash: (chainhash.Hash) (len=32 cap=32) 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f, ShortChannelID: (lnwire.ShortChannelID) [613351-843624]:[123-5457]:[0-18], Timestamp: (uint32) [1690812878-1715946314], MessageFlags: (lnwire.ChanUpdateMsgFlags) 00000001, ChannelFlags: (lnwire.ChanUpdateChanFlags) [00000000-00000001], TimeLockDelta: (uint16) [30-144], HtlcMinimumMsat: (lnwire.MilliSatoshi) [0-1000000] mSAT, BaseFee: (uint32) [0-200000], FeeRate: (uint32) [0-5000], HtlcMaximumMsat: (lnwire.MilliSatoshi) [2970000000-198000000000] mSAT, ExtraOpaqueData: (lnwire.ExtraOpaqueData) { } }) )@[1-5]
And this one (the payment hash has been masked):
[INF] LOOP: 9ae66d Payment 9ae66d63f07fc2de2a57ef255606052c**** state=IN_FLIGHT, inflight_htlcs=[0-6], inflight_amt=[0-14138776000] mSAT
During this period of the logs, the memory chart of the LND container shows memory climbing to 15 GB in minutes (chart omitted).
The CPU chart shows a corresponding spike (chart omitted).
To reproduce
- Invoke a loop out on a channel that is returning temporary channel failures (e.g. via the CLI, as sketched below)
- LND CPU/MEM goes up, especially memory, which increases exponentially.
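For reference, triggering step 1 from the command line might look roughly like the following. This is only a sketch: the channel id and amount are placeholders, and flag names may differ between loop versions, so check loop out --help for your build.

    # Hypothetical example: request a loop out restricted to the suspect channel.
    # <short_chan_id> and <amount_sat> are placeholders.
    loop out --channel <short_chan_id> <amount_sat>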
How we work around it
- We disable swaps on the channel where we have detected the issue
- Everything calms down.
System information
Versions: LND 0.17.5/0.17.4 and loop 0.27.1, amd64 Linux containers on Kubernetes (AWS)
Thanks for the report. The logs you posted indicate that it's just trying to pay the invoice, which is essentially an RPC call to LND. I wonder if you can reproduce this issue without loop too? Maybe just by trying to probe with an invoice to the destination that caused the spike?
Could you please also create a memory profile of the affected LND node?
You can add the option: --profile=<profileport> and then use the usual Go profiling tools from the browser.
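As an illustration (assuming profiling is enabled on port 9736, which is just an example port), a heap profile can then be captured with the standard Go pprof tooling:

    # Assumes lnd was started with --profile=9736 (example port).
    # Inspect the heap interactively:
    go tool pprof http://localhost:9736/debug/pprof/heap
    # Or save a snapshot for later analysis / sharing:
    curl -s -o lnd-heap.pprof http://localhost:9736/debug/pprof/heap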
I did a loop out via the CLI and memory started to increase exponentially. The bad part is that after enabling profiling, the channel that was triggering the leak swapped successfully, so we can't reproduce it for now. It was only happening on a specific chanset. I will report back when we have another issue like this.
Ok, I have more news: we managed to dump the heap and the heap itself looked fine, so it seems to be a problem of Go not being aware of Kubernetes memory limits. More info can be read here: https://kupczynski.info/posts/go-container-aware/
Maybe LND should be more aware of its memory limits when running in a container.
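For anyone hitting the same thing, one common mitigation (our assumption, not something LND configures out of the box in these versions) is to give the Go runtime a soft memory limit just below the container limit via GOMEMLIMIT (Go 1.19+), so the GC returns memory before the pod gets OOM-killed:

    # Example container environment; 14GiB is a placeholder that should sit
    # slightly below the actual Kubernetes memory limit (15Gi in this example).
    export GOMEMLIMIT=14GiB
    # Optionally make the GC run more often, trading CPU for lower heap growth:
    export GOGC=50
    lnd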
It seems to be working. I will close this once our confidence that it is solved is higher, but it looks like it :D
Thanks for the update @Jossec101 🙏
Memory is being released properly, so we can conclude it's a Go runtime thing. I wonder if we should report this on the main LND repo; I'm sure we are not the only ones suffering from this.
Yes, it seems like it is related to LND, so it's better to track it through LND's issues.