
channeldb: write through cache for the graph and channel state

bhandras opened this issue 3 years ago • 20 comments

This PR adds a generic cache to the kvdb layer and demonstrates its use by caching the channel state and the graph (the graph cache is being reworked into a more specific implementation: https://github.com/lightningnetwork/lnd/pull/5642).

With this cache we can prefetch frequently accessed buckets on startup and reduce DB operations to puts and deletes, which will help with performance when LND is running on replicated remote databases (etcd, Postgres).
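
A minimal sketch of the write-through idea, assuming a simplified string-keyed backend interface (illustrative only, not the actual kvdb API): reads for prefetched keys never hit the network, only mutations do.

```go
package cache

// Backend is a stand-in for a remote KV store (etcd, Postgres, ...);
// every call is assumed to be a network round trip.
type Backend interface {
	Get(key string) ([]byte, error)
	Put(key string, val []byte) error
	Delete(key string) error
}

// writeThroughCache prefetches a bucket's contents once and then serves
// all reads from memory, forwarding only puts and deletes to the backend.
type writeThroughCache struct {
	backend Backend
	cache   map[string][]byte
}

func newWriteThroughCache(backend Backend, prefetched map[string][]byte) *writeThroughCache {
	c := &writeThroughCache{
		backend: backend,
		cache:   make(map[string][]byte, len(prefetched)),
	}
	for k, v := range prefetched {
		c.cache[k] = v
	}
	return c
}

// Get is served entirely from memory for prefetched keys.
func (c *writeThroughCache) Get(key string) ([]byte, bool) {
	val, ok := c.cache[key]
	return val, ok
}

// Put updates the backend and the in-memory copy in lock step, so reads
// stay consistent without extra round trips.
func (c *writeThroughCache) Put(key string, val []byte) error {
	if err := c.backend.Put(key, val); err != nil {
		return err
	}
	c.cache[key] = val
	return nil
}

// Delete removes the key from both layers.
func (c *writeThroughCache) Delete(key string) error {
	if err := c.backend.Delete(key); err != nil {
		return err
	}
	delete(c.cache, key)
	return nil
}
```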

bhandras avatar Aug 02 '21 13:08 bhandras

Ready to remove the draft tag on this one?

Roasbeef avatar Aug 06 '21 23:08 Roasbeef

Ready to remove the draft tag on this one?

Yep, ready for a round of reviews imho. The thing I'm thinking about is whether we want to speed up the payments bucket this way too. For that to work out we'd surely need to fetch on demand and LRU-evict, since we don't want to cache old payments. Perhaps best to do that in a separate PR.
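
For the on-demand/LRU idea, a rough standalone sketch of the eviction behavior (not tied to any lnd interface; names are illustrative):

```go
package cache

import "container/list"

// lruCache loads entries lazily and evicts the least recently used one
// once capacity is reached, so old payments naturally fall out.
type lruCache struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element in order
}

type entry struct {
	key string
	val []byte
}

func newLRUCache(capacity int) *lruCache {
	return &lruCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// Get promotes the entry to most recently used.
func (c *lruCache) Get(key string) ([]byte, bool) {
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*entry).val, true
	}
	return nil, false
}

// Put inserts or refreshes an entry, evicting the oldest one if the
// cache is full (capacity is assumed to be > 0).
func (c *lruCache) Put(key string, val []byte) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).val = val
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() >= c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
	c.items[key] = c.order.PushFront(&entry{key: key, val: val})
}
```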

bhandras avatar Aug 09 '21 16:08 bhandras

Yeah, awesome PR indeed! I did a high-level, code-only review to get the gist of what's needed. Going to try and performance-test this on mainnet to see how much memory the graph cache takes, and then maybe run another test on regtest to see what 500 channels with many payments look like.

Thanks for taking a look Oliver!

Quick note about benchmarking: if you use the Bottlepay benchmark, ideally you'd need the changes in the "wip etcd performance improvements" PR (https://github.com/lightningnetwork/lnd/pull/5392). I rebased that on top of this. My benchmarks showed great performance for the graph itself: since it's all in memory, we only pay the serialization/deserialization cost. We could mitigate that too, although it's a bigger refactor.

Currently the main bottleneck remaining after this is the payments bucket and its deeply nested payment-htlcs-bucket. If we don't mitigate those roundtrips, the Bottlepay benchmark doesn't perform well.

bhandras avatar Aug 11 '21 15:08 bhandras

Amazing PR!

TBH, I was a bit skeptical when you mentioned you were going with a more generic approach here vs a more context-specific cache for the channel graph itself, but the ultimate implementation is a lot more compact than I anticipated.

I've completed an initial review pass, and a few things popped out to me:

  1. Do we have any tangible benchmarks on a mainnet-loaded node to gauge the perf increase, as well as the memory-increase trade-off? I'm a bit concerned about the impact of caching larger buckets like the forwarding package state and channel revocation log.

    • If it turns out to be prohibitive in practice, then at least we gain the added benefit of graph operations being mostly cached, which should speed up pathfinding attempts (mainly the initial read at the start of each attempt).
  2. In the context of boltdb, how much faster is a vanilla read from the database (which is memory mapped) vs fetching an item from the cache?

Thanks for the thorough review @Roasbeef !!

  1. I was only testing with the Bottlepay benchmarks for now, together with the changes in https://github.com/lightningnetwork/lnd/pull/5392. While it's a limited test, it shows bottlenecks pretty well. Numbers in my benchmarks were stable, around 30 TPS on my cloud machine (using etcd). This is also now fully remote, non-mixed, so the numbers in that PR don't apply anymore. Now that you mention that the forwarding package and channel revocation log can grow huge, I think we need to rethink the caching toward an LRU design with on-demand reads (which could also help with the payments bucket).

  2. In my tests it's actually about the same or a bit faster using this cache.

bhandras avatar Aug 11 '21 16:08 bhandras

Added support for a hybrid cache that is able to read through and write through nested (and top-level) buckets. This will allow us to save a lot of memory on huge buckets that we don't want to cache but that are still part of a partially cached structure.
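
Roughly, this amounts to a per-bucket policy; a hypothetical sketch with made-up bucket names (not the actual channeldb keys):

```go
package cache

// BucketPolicy controls, per bucket path, whether a bucket's contents
// are kept fully in memory (write-through) or fetched from the backend
// only when needed (read-through).
type BucketPolicy int

const (
	// WriteThrough: prefetched at startup, reads served from memory.
	WriteThrough BucketPolicy = iota

	// ReadThrough: left on the backend and fetched on demand,
	// bounding memory usage for very large buckets.
	ReadThrough
)

// policyFor picks a policy for a nested bucket path. Huge buckets such
// as a revocation log would stay read-through; the path names here are
// placeholders.
func policyFor(path []string) BucketPolicy {
	if len(path) > 0 && path[0] == "revocation-log" {
		return ReadThrough
	}
	return WriteThrough
}
```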

bhandras avatar Aug 13 '21 17:08 bhandras

To further support my argument above, I've implemented a rough version of channel caching on the graph level: https://github.com/lightningnetwork/lnd/pull/5631

joostjager avatar Aug 15 '21 13:08 joostjager

Adding this generic kv caching mechanism looks to me like one more step into the kv corner. I don't think this is a good thing to do if you want to keep open the option to move away from kv at some point. Of course anything can be done given enough resources, but with this change I can see the decision to move away from kv becoming more costly and less likely.

This is a transparent layer that's simple and a win for all kv backend implementations, including Postgres, since it reduces the number of unnecessary roundtrips for state that is needed on the hot paths. If you take a look at the integration part, it is essentially a few lines added to channeldb/db.go; the rest is syntactic sugar and some very minimal refactoring.

I think that a simple caching mechanism on the graph level is more appropriate and flexible here. I've tried something myself here earlier #5366 (comment) that could be used as a starting point.

Sure, I don't disagree with that. I too understand it may be much better to have higher-level caching of state where needed, but I think a generic approach that reduces roundtrips (where we don't bloat the state) is a good way to bridge the gap while we spend more time on the design phase. The channeldb package is probably (one of) the oldest parts of LND, in many ways stable, and a dependency of everything we do. Gradual improvements with minimal disruption may be helpful. In my measurements, apart from how the graph is accessed, the key performance issues currently are the channel state (we fetch it multiple times over the lifecycle of a payment) and the payments bucket itself (the HTLC bucket in particular). These issues all need solutions that work or allow gradual transformation of the code in a minimally disruptive way.

I haven't looked at the code of this PR in detail, but I do think that care must be taken with building too heavily on top of the bare db driver. Bugs in this code are risky and can have widespread consequences. In my opinion, the db driver layer should remain thin and reliable. Not too much home-brewed magic.

Feel free to have a look if you have time. I don't really understand what you mean by home-brewed magic? Perhaps a fairer criticism of something you haven't looked into would be to weigh its pros and cons rather than point fingers.

bhandras avatar Aug 16 '21 10:08 bhandras

Adding this generic kv caching mechanism looks to me like one more step into the kv corner. I don't think this is a good thing to do if you want to keep open the option to move away from kv at some point. Of course anything can be done given enough resources, but with this change I can see the decision to move away from kv becoming more costly and less likely.

This is a transparent layer that's simple and a win for all kv backend implementations, including Postgres, since it reduces the number of unnecessary roundtrips for state that is needed on the hot paths. If you take a look at the integration part, it is essentially a few lines added to channeldb/db.go; the rest is syntactic sugar and some very minimal refactoring.

Yes, I understand that it is a transparent layer and that it also benefits Postgres right now. But it is not compatible with the option to move to structured SQL tables. So in that way it is increasing the cost of that move if it is desired in the future. If one PR is to be merged at this time, I'd think that it is better to opt for the one that is most widely compatible with different data storage models.

I think that a simple caching mechanism on the graph level is more appropriate and flexible here. I've tried something myself here earlier #5366 (comment) that could be used as a starting point.

Sure, I don't disagree with that. I too understand it may be much better to have higher-level caching of state where needed, but I think a generic approach that reduces roundtrips (where we don't bloat the state) is a good way to bridge the gap while we spend more time on the design phase. The channeldb package is probably (one of) the oldest parts of LND, in many ways stable, and a dependency of everything we do. Gradual improvements with minimal disruption may be helpful. In my measurements, apart from how the graph is accessed, the key performance issues currently are the channel state (we fetch it multiple times over the lifecycle of a payment) and the payments bucket itself (the HTLC bucket in particular). These issues all need solutions that work or allow gradual transformation of the code in a minimally disruptive way.

To me it seems that pathfinding is the only part that needs optimization at this time. Without caching, that is just unacceptably slow. But channel state, is it really a priority to optimize this now? With Postgres, I still get decent TPS without that optimization. How much exactly is the win for etcd by caching channel state?

With regards to gradual improvements to channeldb: To me it seems that extending the channeldb cache #5631 is just that.

I haven't looked at the code of this PR in detail, but I do think that care must be taken with building too heavily on top of the bare db driver. Bugs in this code are risky and can have widespread consequences. In my opinion, the db driver layer should remain thin and reliable. Not too much home-brewed magic.

Feel free to have a look if you have time. I don't really understand what you mean by home-brewed magic? Perhaps a fairer criticism of something you haven't looked into would be to weigh its pros and cons rather than point fingers.

With home-brewed magic I mean building generic layers on top of the database driver, such as this caching and also the STM and prefetch parts. I think that is a risky thing to do. There is a reason that people use proven database backends. Making modifications to them through generic client-side layers is not so simple in my opinion, because it isn't always easy to reason through all scenarios that can occur, especially for database code that involves transactions. It almost requires a touch of magic to get it 100% right. I consider this fair criticism. Pros and cons I had already outlined in my initial comment, I believe.

joostjager avatar Aug 16 '21 11:08 joostjager

But it is not compatible with the option to move to structured SQL tables.

This approach also isn't blocking anything when it comes to moving to structured tables. On the contrary, it clearly shows where such a move is most urgently needed. But because this is quite a big undertaking it is at least one or two major versions out. So if we can get some optimizations in now, we want to do that.

So in that way it is increasing the cost of that move if it is desired in the future.

That's just not true. Removing the cache is quite literally changing a few lines.

To me it seems that pathfinding is the only part that needs optimization at this time.

That is your opinion that we do not share. "Decent" performance just isn't enough for the number of users and payments LN will likely need to handle in the near future. Having a generic cache that can be turned on for specific parts of the DB can become quite valuable. Of course we're aware that this is just postponing the problems. But as I said, until we can move to a new data structure (table based approach), we might need this to give us some breathing room.

I ran some tests on this with a mainnet graph, and it looks like the serialization/deserialization of the graph-related data from/to the cache is now the bottleneck. So we decided to go with a less generic approach for the graph (probably quite similar to your PoC in #5631, but I haven't looked at it too closely) but keep the generic cache for the channel state for now.

guggero avatar Aug 16 '21 13:08 guggero

So in that way it is increasing the cost of that move if it is desired in the future.

That's just not true. Removing the cache is quite literally changing a few lines.

Yes, but the total cost of development and review of first this PR and then later a different caching PR is higher. That is all I meant to say.

To me it seems that pathfinding is the only part that needs optimization at this time.

That is your opinion that we do not share. "Decent" performance just isn't enough for the number of users and payments LN will likely need to handle in the near future. Having a generic cache that can be turned on for specific parts of the DB can become quite valuable. Of course we're aware that this is just postponing the problems. But as I said, until we can move to a new data structure (table based approach), we might need this to give us some breathing room.

I can follow along this line of thinking. So what exactly is good enough performance in your opinion for the near future on a replicated remote db setup?

joostjager avatar Aug 16 '21 13:08 joostjager

Bugs in this code are risky and can have widespread consequences. In my opinion, the db driver layer should remain thin and reliable. Not too much home-brewed magic.

There isn't too much that's really fancy in this PR, just a tree traversal to prefetch and cache certain things that are very heavily read. Based on some of our initial benchmarking for the remote DB, it turned out that a lot of the network interaction (TLS decrypting+encrypting large-ish payloads) was a hot area. Due to the nested nature of the bolt structure, a routine read of a single value may trigger a series of other reads just to get at that one value. The more we can cache locally (in a consistent manner), the less we need to wait on network round trips. The round trips also start to add up in instances where the DB server isn't closely co-located with the actual lnd server, so in addition to the normal 1.5 RTTs for a commitment update, additional latency is inserted.
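
To illustrate that fan-out, a hypothetical sketch (placeholder bucket names, not lnd's kvdb API) of how one logical read turns into several backend round trips when each bucket level has to be resolved first:

```go
package kvsketch

import "fmt"

// Remote is a stand-in for a KV backend where every Get is a network
// round trip (etcd, Postgres).
type Remote interface {
	Get(key string) ([]byte, error)
}

// readChannelValue shows why one logical read becomes several round
// trips: each nesting level of the bucket hierarchy is resolved before
// the leaf key can be fetched. Caching the bucket structure locally
// collapses this to at most one backend Get.
func readChannelValue(db Remote, chanPoint, key string) ([]byte, error) {
	// Round trip 1: resolve the top-level bucket.
	if _, err := db.Get("open-chan-bucket"); err != nil {
		return nil, fmt.Errorf("top-level bucket: %w", err)
	}

	// Round trip 2: resolve the per-channel sub-bucket.
	if _, err := db.Get("open-chan-bucket/" + chanPoint); err != nil {
		return nil, fmt.Errorf("channel bucket: %w", err)
	}

	// Round trip 3: read the leaf value itself.
	return db.Get("open-chan-bucket/" + chanPoint + "/" + key)
}
```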

In the end I don't see this approach and the one to re-use the existing channel graph cache as being at odds with each other. It's clear that by caching more locally we save time by not needing to go to the network to fetch things each time. However, we already have an existing cache that stores the channel graph, which is populated as soon as we communicate with our first peer. So IMO we can get an easy win by just re-using that cache for the graph (as is prototyped in #5631; the data is already in memory for every lnd node, which saves us from storing it in two places and decoding it), and then using this approach in other hot paths that end up with us redundantly fetching data over the network (since with the way the DB closures are set up we almost always need to read several keys just to write/update a single one), like the change to just cache certain sequence numbers locally to save on DB transactions.

Roasbeef avatar Aug 16 '21 22:08 Roasbeef

Some more tests incoming, but until then feel free to add any new comments you might have.

bhandras avatar Aug 26 '21 16:08 bhandras

Reworked the PR a bit to help get https://github.com/lightningnetwork/lnd/pull/5642 in faster. Now the first 4 commits are pure refactors that we will likely move into #5642, and the last 3 are the cache plus the channel state caching.

bhandras avatar Aug 27 '21 16:08 bhandras

Can now be rebased on top of #5642.

guggero avatar Aug 30 '21 11:08 guggero

Only the last 5 commits are relevant now since some commits have been moved to the in-mem graph PR.

bhandras avatar Sep 01 '21 15:09 bhandras

I'm wondering whether, given the recent DB changes, we could maybe drop this? @Roasbeef Happy to rebase/rework if we still think this PR adds value for 0.16.

bhandras avatar Jun 22 '22 16:06 bhandras

@bhandras drop as in close? It has been a while since I've looked at this, but I think since then we've gone with the in-memory route for the graph, which gave us a nice speedup and a reduction in the number of round trips. I think as we start to do more SQL-specific stuff and add instrumentation to see which operations we're waiting the longest on (which might still be in KV land), this could be useful for adding more caching to minimize network round trips there. So maybe it's an 0.17 thing? It's also been a while since I've run the Bottlepay benchmarking stuff as well.

Roasbeef avatar Jun 23 '22 22:06 Roasbeef

So this might be a bit more relevant now, based on some of the profiles gathered in this issue: https://github.com/lightningnetwork/lnd/pull/6683.

TL;DR: in the wild a user's script ends up fetching the channel state a bunch, which makes everything else slower as either operations are blocked on it or other transactions are held up (?) due to the constant queries.

Roasbeef avatar Jul 12 '22 00:07 Roasbeef

So this might be a bit more relevant now, based on some of the profiles gathered in this issue: #6683.

TL;DR: in the wild a user's script ends up fetching the channel state a bunch, which makes everything else slower as either operations are blocked on it or other transactions are held up (?) due to the constant queries.

Happy to resurrect the PR for 0.16 if you think it's relevant. I think the issue reference is wrong, as it points to the payment lifecycle refactor, but in general this change set is most relevant when using a remote database, since bolt will mmap most of it anyway (so there's not much to gain with bbolt).

bhandras avatar Aug 09 '22 10:08 bhandras

@bhandras, remember to re-request review from reviewers when ready

lightninglabs-deploy avatar Sep 13 '22 13:09 lightninglabs-deploy

@bhandras, remember to re-request review from reviewers when ready

lightninglabs-deploy avatar Nov 15 '22 18:11 lightninglabs-deploy

@bhandras, remember to re-request review from reviewers when ready

lightninglabs-deploy avatar Jul 25 '23 11:07 lightninglabs-deploy

Closing due to inactivity

lightninglabs-deploy avatar Jul 28 '23 13:07 lightninglabs-deploy
