watchtower: reduce AckedUpdate storage footprint
This PR contains a few migrations of the watchtower client DB.
Main Achievements:
- It greatly reduces the storage requirements of the session AckedUpdates through the use of a new RangeIndex model.
- The various indexes added in this PR nicely set us up for a follow-up PR in which we will be able to determine which sessions we can delete (and tell the tower server that it can delete) based on whether all channels that the session has updates for have been deleted. (I.e., this is a step towards #6259, #7035.)
Change Log:
1. Migrate ChanDetails Bucket
A ChannelDetails bucket is added and each registered channel gets its own sub-bucket in this bucket. The existing channel summaries are moved here under the cChannelSummary key and the old top-level cChanSummaryBkt is deleted. This is done so that we can store more details about a channel without needing to add new info to the encoded ChannelSummary.
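To make the bucket shuffle concrete, here is a rough sketch of the idea using raw bbolt (the real code goes through lnd's kvdb abstraction, and the bucket/key names below are placeholders rather than the exact constants used in the PR):

```go
package wtdbsketch

import bolt "go.etcd.io/bbolt"

// migrateChanDetails gives every registered channel its own sub-bucket under
// a new top-level details bucket, moves the existing encoded summary there,
// and drops the old top-level summary bucket.
func migrateChanDetails(tx *bolt.Tx) error {
	oldBkt := tx.Bucket([]byte("channel-summaries")) // placeholder name
	if oldBkt == nil {
		return nil // nothing to migrate
	}

	detailsBkt, err := tx.CreateBucketIfNotExists(
		[]byte("channel-details"), // placeholder name
	)
	if err != nil {
		return err
	}

	// Copy each summary into its channel's own sub-bucket under a
	// dedicated key, leaving room for more per-channel data later.
	err = oldBkt.ForEach(func(chanID, summary []byte) error {
		chanBkt, err := detailsBkt.CreateBucketIfNotExists(chanID)
		if err != nil {
			return err
		}
		return chanBkt.Put([]byte("channel-summary"), summary)
	})
	if err != nil {
		return err
	}

	// The old top-level bucket is no longer needed.
	return tx.DeleteBucket([]byte("channel-summaries"))
}
```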
2. Add a new Channel-ID index
A new channel-ID index is added. The index holds a mapping from db-assigned-ID (8 bytes) to real channel-ID (32 bytes). This mapping will allow us to preserve disk space in future when persisting references to channels.
3. Add a new Session-ID index
As with point 2, the same is done for session IDs (33 bytes).
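As a sketch of what such an index can look like (illustrative bucket name and encoding, not necessarily the exact scheme used in the PR), the db-assigned ID can simply be the index bucket's own sequence number:

```go
package wtdbsketch

import (
	"encoding/binary"

	bolt "go.etcd.io/bbolt"
)

// indexChannelID assigns the next free 8-byte db ID to a real 32-byte
// channel ID and persists the db-ID -> channel-ID mapping. The same pattern
// works for 33-byte session IDs.
func indexChannelID(tx *bolt.Tx, realChanID [32]byte) (uint64, error) {
	index, err := tx.CreateBucketIfNotExists([]byte("channel-id-index"))
	if err != nil {
		return 0, err
	}

	// bbolt hands out a monotonically increasing per-bucket sequence
	// number, which serves as the compact db-assigned ID.
	id, err := index.NextSequence()
	if err != nil {
		return 0, err
	}

	var key [8]byte
	binary.BigEndian.PutUint64(key[:], id)

	// 8-byte key -> 32-byte real channel ID.
	return id, index.Put(key[:], realChanID[:])
}
```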
4. Migrate AckedUpdates to the RangeIndex model.
This is the main meat of the PR.
Currently, for each session, we store its acked updates as follows:
seqNum (2 bytes) -> encoded BackupID{channelID (32 bytes) + commitHeight (8 bytes)}
The default number of backups per session is 1024 and so this means that regardless of the number of channels the session is actually covering, it takes 1024 * (2 + 32 + 8) bytes per session. Depending on how busy your node is, you could have many thousands of sessions (this user has 60k which would mean 2.5GB!).
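For reference, the arithmetic behind that estimate (assuming the default 1024 backups per session and roughly 60k sessions):

```go
package wtdbsketch

const (
	updatesPerSession = 1024
	bytesPerUpdate    = 2 + 32 + 8 // seqNum + channelID + commitHeight = 42
	numSessions       = 60_000
)

// 1024 * 42 bytes ≈ 43 KB per session, and
// 60,000 * 43 KB  ≈ 2.58 GB in total.
const totalBytes = updatesPerSession * bytesPerUpdate * numSessions
```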
The only reason that we want to keep track of AckedUpdates imo is if we need to determine which updates of a channel have/have not been backed up or if we want to replay backups to a different tower. So I argue that knowing the sequence number associated with each backup is not necessary.
Given this assumption, we can store the AckedUpdates using a RangeIndex (per channel). We will also make use of the new channel index described above (in point 2 of the change log) so that we only need 8 bytes per channel.
The RangeIndex is just a map from start height to end height of a range. This means that in the best case where all commit heights are present, we only need to store the start and end height. And if there are holes, we store separate ranges and that will still be more efficient since each range only requires 16 bytes (an 8-byte key for the start value and 8 bytes for the end value of the range). Our session can thus store its AckedUpdates as follows:
"acked-updates" => channel-ID (8 byte key sub-bucket) => start height (8 byte key) -> end-height (8 byte value)
Here is a quick example showing the change of an index that starts empty and then gets the numbers 1, 2, 4, 3:
add 1:
1 -> 1
add 2:
1 -> 2
add 4:
1 -> 2
4 -> 4
add 3:
1 -> 4
This means that every session only refers to a channel once and then just stores the index of ranges for that channel that are covered by the session.
Any additions to the index will at most require one value update & one key deletion.
The PR contains the necessary logic to make sure that the calculation of how to update an index given a new commit height is as efficient as possible.
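For illustration, here is a minimal in-memory version of the idea (a deliberately naive sketch, not the implementation in the PR, which also avoids the linear scans shown here):

```go
package wtdbsketch

// RangeIndex maps the start height of each range to its end height,
// e.g. {1: 2, 4: 4} covers heights 1, 2 and 4.
type RangeIndex map[uint64]uint64

// Add inserts a single commit height, merging with adjacent ranges where
// possible so that consecutive heights collapse into one start -> end entry.
func (r RangeIndex) Add(h uint64) {
	// Already covered by an existing range?
	for start, end := range r {
		if h >= start && h <= end {
			return
		}
	}

	newStart, newEnd := h, h

	// A range ending directly below h is extended upwards.
	for start, end := range r {
		if end+1 == h {
			newStart = start
			break
		}
	}

	// A range starting directly above h is absorbed.
	if end, ok := r[h+1]; ok {
		newEnd = end
		delete(r, h+1)
	}

	r[newStart] = newEnd
}
```

Running the 1, 2, 4, 3 sequence above through this sketch reproduces the same intermediate states: {1: 1}, {1: 2}, {1: 2, 4: 4} and finally {1: 4}.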
Note:
I originally had the changes in #7059 here too since the bug in wtclientrpc makes it difficult to use the rpc to check the correctness of this PR. I have separated it into its own PR now just to make the diff of this PR a bit smaller, but I suggest we get that PR in first even though the two are not dependent on each other.
Fixes https://github.com/lightningnetwork/lnd/issues/6886
- We'll need to properly test these persistent changes to ensure they don't lead to a performance regression in practice. In particular I'm curious to see how this fares in a remote db setting, given there seem to be a few additional layers of bucket nesting (which can mean additional round trips w/ the current kv mapping)
I think it's quite easy to work around those since we can cache a range index and then there's no need for nesting down anymore after that if done properly, see: https://github.com/lightningnetwork/lnd/pull/7055#discussion_r1004730723
@ellemouton, remember to re-request review from reviewers when ready
I had a look at the migrations and the general approach, and I think this is a great idea. Even though some ranges might end up being tiny (in weird multi-WT scenarios), this shouldn't be an issue when compared to the status quo. Instead of iterating through the ranges a binary search might be better, but I don't think this is relevant for a one-off migration with just a couple of ranges (i.e. not millions).
The main migration takes quite a bit of RAM. I don't think this is a huge issue, but it might be problematic for some larger nodes (or nodes without a lot of RAM/swap).
Aside from the RAM spike, the migration worked out just fine and my node did a few forwards. I restarted again to compact the DB:
13:57:44.847 [INF] CHDB: DB compaction of /home/bitcoin/.lnd/data/graph/mainnet/wtclient.db successful, 5958950912 -> 1115611136 bytes (gain=5.34x)
Before the migration the DB had a size of 4650852352 bytes, so it's more like 4.17x. Nice, thanks a lot!
PS: With gzip I was able to bring this down to 65 MByte, so maybe there's more potential.
ok cool - so with gzip, the gain is about 72x? awesome! Thanks for giving this a spin @C-Otto! should get even more gains once the DeleteSession PR is in :)
Also - my original rough estimate for the db gains a node would get if they only had 1 channel was 1792x. So am I correct in saying that the node you migrated has, over its lifetime, roughly always had about 24 active channels at any given time (very rough estimate)?
For each session I checked, I see a bunch of channels - more than 30 each. But I'd say 24 active channels per session sounds very realistic to me!
@Crypt-iQ see my comments from 14 days ago
Let me know if this is ready for another round :telescope:
update: just got the step-by-step migration working & can confirm that it is waaaay less RAM intensive. Just need to clean up the code still so will hopefully push the updates by EOD tomorrow
Report
I wrote a script here that allows you to quickly create a large wtclient.db file. It attempts to always have about 20 active channels at any given point.
Using this script, I created a wtclient.db file of 3GB. It had 25756 sessions with updates across a total of 125 channels (with around 20 being active at any given time).
End result:
After migrating this DB and then compacting it, I ended up with a db of size 185MB (a gain of 16x).
NOTE: I found that first migrating the DB and then compacting it was significantly faster than first compacting, then migrating and compacting again. The only downside here is that if you migrate first, your db size will grow slightly (in my example, it grew to 3.2GB).
Migrating using multiple Txs vs 1 big TX
The migration has been split up into multiple transactions so that the migration is less RAM intensive. Using my 3GB example DB, I performed the migration using both methods. With the all-in-one-tx method, the average RAM usage was around 185MB and with the multiple-txs method (which migrates 5000 sessions per tx), the average RAM usage was around 12800 KB.
Note that when @C-Otto tried the all-in-one-tx migration with his ~5GB db, the average RAM usage was around 3GB, but it is worth noting that the encoding of certain items has changed since then, so it should be less now. @C-Otto, if you still have a copy of your previous DB around, it would be awesome to hear how this new multi-tx approach performs on it if you are willing to try it on a copy of that copy :)
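For anyone curious what the multi-tx approach looks like in practice, here is a rough sketch of the batching loop (raw bbolt, placeholder bucket name and callback; the real migration lives behind lnd's kvdb/migration machinery):

```go
package wtdbsketch

import bolt "go.etcd.io/bbolt"

// sessionsPerTx mirrors the batch size mentioned above.
const sessionsPerTx = 5000

// migrateInBatches migrates sessions in fixed-size batches, committing after
// each batch so that only a small part of the DB is held in memory at once.
// migrateOne is whatever per-session migration the batch applies.
func migrateInBatches(db *bolt.DB,
	migrateOne func(tx *bolt.Tx, sessionID []byte) error) error {

	var lastMigrated []byte

	for {
		done := false

		err := db.Update(func(tx *bolt.Tx) error {
			sessions := tx.Bucket([]byte("client-sessions"))
			if sessions == nil {
				done = true
				return nil
			}

			// First collect the next batch of session IDs, resuming
			// just after the last one handled in the previous tx.
			c := sessions.Cursor()
			k, _ := c.First()
			if lastMigrated != nil {
				c.Seek(lastMigrated)
				k, _ = c.Next()
			}

			var batch [][]byte
			for ; k != nil && len(batch) < sessionsPerTx; k, _ = c.Next() {
				batch = append(batch, append([]byte(nil), k...))
			}

			// Then migrate the batch within this same transaction.
			for _, id := range batch {
				if err := migrateOne(tx, id); err != nil {
					return err
				}
				lastMigrated = id
			}

			// If we collected fewer than a full batch, we're done.
			done = len(batch) < sessionsPerTx
			return nil
		})
		if err != nil {
			return err
		}

		if done {
			return nil
		}
	}
}
```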
cc @Crypt-iQ & @bhandras
Thanks, @ellemouton! I don't think I have the old DB anymore, but we'll get plenty of reports once this is released :)
@bhandras: review reminder
@roasbeef: review reminder
@crypt-iq: review reminder
@ellemouton, remember to re-request review from reviewers when ready
Thanks so much for the reviews @Crypt-iQ & @bhandras !!! 🚀
@Crypt-iQ - I fixed the nits you found and left comments re your question about whether the first 2 migrations are intensive.