lnd
`lncli wtclient tower` uses massive amounts of memory
Background
I run two lnd instances (0.15.0-beta.rc6), one of them serving as the watchtower for the other. Aside from an issue where lnd uses lots of memory, probably related to the watchtower client code, I also noticed that memory usage increases even more when I issue `lncli wtclient tower`.
Currently, lnd consumes 10.9 GByte (RES) and 47.6 GByte (VIRT). The command is still running, memory usage is increasing.
This might be related to #5983. My `watchtower.db` is around 23 GByte (after compaction).
Your environment
- version of `lnd`: 0.15.0-beta.rc6
- which operating system (`uname -a` on *Nix): Linux server 5.10.0-13-amd64 #1 SMP Debian 5.10.106-1 (2022-03-17) x86_64 GNU/Linux
- version of `btcd`, `bitcoind`, or other backend: bitcoind v23
Steps to reproduce
Run `lncli wtclient tower` on a node with a configured and active remote watchtower. I think I started lnd while my watchtower was down, so that data queued up in memory. I also ran `lncli wtclient tower` before.
Expected behaviour
The command completes within seconds, memory usage doesn't change a lot.
Actual behaviour
Command takes ages to complete, memory usage increases by several GByte.
My SSH connection died, but I think the command completed. Afterwards, memory usage dropped a lot: 3.2 GByte RES, 47.6 GByte VIRT.
I think this is a duplicate of https://github.com/lightningnetwork/lnd/issues/5983? Do you want to keep this one open, or the other?
Also related to https://github.com/lightningnetwork/lnd/issues/6259
The other issue is about a watchtower being offline. This issue happens with an online tower. I think these are related, but different enough.
I bumped in here to say: can confirm.
`lncli wtclient towers` is taking ages right now. RAM usage went up to 37% from the usual 29%, system load is over 5 on this tiny Pi4.
Something's wrong, and I'm not even rebalancing right now.
Just timed `lncli wtclient towers` for the lolz:
real 15m3.573s
user 0m0.102s
sys 0m0.045s
Even if the RPC request does not require session details (`--include_sessions`), these details are included as part of `ListClientSessions`. Gathering this data takes a lot of memory, as shown in the heap profile.
Inside `listClientSessions`:
// We'll load the full client session since the client will need
// the CommittedUpdates and AckedUpdates on startup to resume
// committed updates and compute the highest known commit height
// for each channel.
For the RPC request, it might suffice to count the entries instead of loading the details into memory, i.e. return a slim version of `ClientSession`.
Currently, the details (Channel ID, Commit Height) are collected, spanning all sessions known for the given tower (`client_db.go`, `getClientSessionAcks`):
var backupID BackupID
err := backupID.Decode(bytes.NewReader(v))
if err != nil {
	return err
}
ackedUpdates[seqNum] = backupID
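To illustrate the "count instead of load" idea, here is a minimal sketch. It is not lnd's code and not necessarily how any fix is implemented: `memBucket`, `kv`, `newEntry`, `loadAckedUpdates` and `countAckedUpdates` are hypothetical names, `memBucket` is an in-memory stand-in for a database bucket, and the 40-byte `BackupID` encoding (32-byte channel ID plus 8-byte commit height) is an assumption made for the example.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// BackupID is a simplified stand-in for the wtdb type of the same name.
// The field layout and encoding here are assumptions for illustration.
type BackupID struct {
	ChanID       [32]byte
	CommitHeight uint64
}

const backupIDSize = 32 + 8

// kv and memBucket are hypothetical: an in-memory replacement for a
// key/value bucket that only supports iteration.
type kv struct{ k, v []byte }
type memBucket []kv

// ForEach calls fn for every key/value pair, stopping at the first error.
func (b memBucket) ForEach(fn func(k, v []byte) error) error {
	for _, e := range b {
		if err := fn(e.k, e.v); err != nil {
			return err
		}
	}
	return nil
}

// newEntry builds one acked-update entry keyed by its sequence number.
func newEntry(seqNum uint16, id BackupID) kv {
	k := make([]byte, 2)
	binary.BigEndian.PutUint16(k, seqNum)
	v := make([]byte, backupIDSize)
	copy(v[:32], id.ChanID[:])
	binary.BigEndian.PutUint64(v[32:], id.CommitHeight)
	return kv{k: k, v: v}
}

func decodeBackupID(v []byte) (BackupID, error) {
	if len(v) != backupIDSize {
		return BackupID{}, errors.New("unexpected backup ID length")
	}
	var id BackupID
	copy(id.ChanID[:], v[:32])
	id.CommitHeight = binary.BigEndian.Uint64(v[32:])
	return id, nil
}

// loadAckedUpdates mirrors the current approach: every entry is decoded
// and kept in a map, so memory grows with the number of acked updates.
func loadAckedUpdates(b memBucket) (map[uint16]BackupID, error) {
	acked := make(map[uint16]BackupID)
	err := b.ForEach(func(k, v []byte) error {
		if len(k) != 2 {
			return errors.New("unexpected sequence number length")
		}
		id, err := decodeBackupID(v)
		if err != nil {
			return err
		}
		acked[binary.BigEndian.Uint16(k)] = id
		return nil
	})
	if err != nil {
		return nil, err
	}
	return acked, nil
}

// countAckedUpdates is the slim variant: it walks the same entries but
// keeps only a counter, so nothing is decoded or retained.
func countAckedUpdates(b memBucket) (int, error) {
	n := 0
	err := b.ForEach(func(_, _ []byte) error {
		n++
		return nil
	})
	return n, err
}

func main() {
	bucket := memBucket{
		newEntry(1, BackupID{CommitHeight: 100}),
		newEntry(2, BackupID{CommitHeight: 101}),
	}
	acked, err := loadAckedUpdates(bucket)
	if err != nil {
		panic(err)
	}
	n, err := countAckedUpdates(bucket)
	if err != nil {
		panic(err)
	}
	fmt.Printf("decoded %d updates, counted %d\n", len(acked), n)
}
```

Both functions iterate over every entry, so counting alone would mainly reduce memory use rather than the time spent walking the bucket.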
With the fixes in #6885 I don't see a noticeable increase in RAM consumption.
Invoking `time lncli wtclient towers` gives:
...
"num_sessions": 57275,
"sessions": [
]
...
real 6m57.202s
user 0m0.058s
sys 0m0.015s
~Can we keep just one of the three issues that track the same problem?~
- https://github.com/lightningnetwork/lnd/issues/6660
- https://github.com/lightningnetwork/lnd/issues/5983
- https://github.com/lightningnetwork/lnd/issues/6886
EDIT: Issues are very likely related but not the same problem.
Those are three different problems requiring three different solutions.
Right, my mistake. It looked to me like at least #6886 and #6660 were caused by the acked updates being kept in memory, but according to your comment above that doesn't seem to be the case.