lnd
lnd copied to clipboard
pathfinding: probability for bimodal distribution
This draft PR implements a probability variant based on the formalism introduced by @renepickhardt as a follow-up from bimodal liquidity distribution research see #5988. No in-depth review yet, looking for high-level feedback on configuration, see below.
In this PR I introduce an Estimator interface for probability estimation. The two systems (apriori
and bimodal
) have different configuration parameters but also share some (time decay parameter, weight of other channel result inclusion). The question is how the specific configuration parameters should be exposed to the user (right now I use a shared config). We could introduce a similar structure as with the chain backend, to distinguish different probability sources and to add specific documentation. This way we could configure both approaches and allow for A/B testing in the future.
Todo:
- [x] configuration
- [x] algo needs documentation
- [ ] mission control configuration via RPC
- [ ] benchmark comparison of the current and bimodal approaches
- [x]
QueryRoutes
andQueryProbability
need to be supplemented with channel capacities - [ ] update sample-lnd.conf
- [ ] release notes
A lot of changes to the current probability estimator. It's not easy to see what the actual effect on payment success will be. Left a few comments, but for me it is hard to be certain that the change set is an improvement.
Thank you a lot for the feedback! Just to clarify, the approach would be to provide an alternative opt-in implementation. Right, without a benchmark we won't know if there's an improvement, however, there are some limits in the math that make me confident there will be improvements (for example, if the algo learns that there's a success for a 100ksat payment, it will also favor that pair for 200k sat). In the next I'll focus on establishing a benchmark method.
An alternative approach would be to make a much smaller incremental step and use most of the existing algorithm with the addition of penalizing amounts close to the channel capacity harder.
I was thinking about that and wanted to do that after this PR, but I agree, it would be a quicker way to benefit users if we split this PR and introduce such a penalty first. I would take the approach of the first four commits and build on top of that. Passing the capacity as a parameter is due to performance reasons (haven't compared to looking up capacity from the graph). In the long run, I think we may need to restructure mission control to provide the capacity and other graph-derived data efficiently (maintained by ChannelRouter
).
An alternative approach would be to make a much smaller incremental step and use most of the existing algorithm with the addition of penalizing amounts close to the channel capacity harder. I would take the approach of the first four commits and build on top of that.
Yeah FWIW doing something like that isn't mutually exclusive with this PR (can even be done concurrently with just those first few commits). In fact, the work in this PR makes adding modified estimators much easier.
Added a suggestion for how to handle configuration. I added subgroups for the different estimators, which is a breaking change for the apriori parameters.
Shares initial commits with #6857, so if questions come up, would appreciate review comments in that PR concerning these commits.
Thanks for the reviews! Rebased on https://github.com/lightningnetwork/lnd/pull/6857. Probability estimators are now swappable at run time to allow for custom A/B testing. I probably need to remove EstimatorConfig
and just incorporate config fields directly into the probability estimator implementations, as I don't see a way around the type switches for configuration.
There are some performance issues, where a pathfinding attempt would take ~0.5s compared to ~0.1s (apriori) on my machine, which needs investigation. Probably calling math.Exp
adds to computation time and perhaps can be replaced with a more efficient approximation.
There are some performance issues, where a pathfinding attempt would take ~0.5s compared to ~0.1s (apriori) on my machine, which needs investigation. Probably calling math.Exp adds to computation time and perhaps can be replaced with a more efficient approximation.
One approach would be to make a benchmark, then expose profile capture during the benchmark. This way, we'd be able to see exactly where things are slower and if we need any memoization or w/e. You could also just try to do path finding attempts locally on the CLI, then capture a profile right before you execute the command to see where things are slowing down.
Can be rebased now that the dependent PR has been merged!
@bitromortac, remember to re-request review from reviewers when ready
@bitromortac
bimodal: CRTR: Pathfinding perf metrics: nodes=6144, edges=76629, time=414.116737ms apriori: CRTR: Pathfinding perf metrics: nodes=6143, edges=76629, time=600.080587ms So with this implementation it takes ~50% more time (without optimizations).
Unless the entries in the top two lines were inadverntly swapped, then if I'm reading this correctly, the bimodal is actually faster than the apriori?
As discussed offline, it may be worth adding clear steps to the PR description explaining how users can do a real life side-by-side comparison of both estimators with a "mission control state restore" in between.
As discussed offline, it may be worth adding clear steps to the PR description explaining how users can do a real life side-by-side comparison of both estimators with a "mission control state restore" in between.
Sounds like a good candidate for a blog post or lnd mailing list post. Either of those would be easier to consume/distribute compared to a PR description.
I've looked up XImportMissionControl
and traced towards importSnapshot
:
// importSnapshot takes an existing snapshot and merges it with our current
// state if the result provided are fresher than our current results. It returns
// the number of pairs that were used.
Not sure if this is usable for a side-by-side comparison, because it merges mission control state rather than replacing it.
I think it is worth spending some extra time on figuring this out and documenting it somewhere to really unlock the value of this PR for users. Otherwise it might be yet another set of pathfinding config flags for which users can't really decide what to do with.
Not sure if this is usable for a side-by-side comparison, because it merges mission control state rather than replacing it.
Possible to first do a reset, then a merge.
I think it is worth spending some extra time on figuring this out and documenting it somewhere to really unlock the value of this PR for users.
I think the value is unlocked right after the merge. We can update the PR description later in a non-blocking manner. As mentioned a PR description isn't as searchable/indexable as a mailing list post or blog post.
Pushed some minor changes mostly related to API backward compatibility. I updated the PR description to give some more context and included a hint on how one could measure differences between the two estimators (exactly, with a reset of mc in between).