lnd icon indicating copy to clipboard operation
lnd copied to clipboard

pathfinding: probability for bimodal distribution

Open bitromortac opened this issue 2 years ago • 3 comments

This draft PR implements a probability variant based on the formalism introduced by @renepickhardt as a follow-up from bimodal liquidity distribution research see #5988. No in-depth review yet, looking for high-level feedback on configuration, see below.

In this PR I introduce an Estimator interface for probability estimation. The two systems (apriori and bimodal) have different configuration parameters but also share some (time decay parameter, weight of other channel result inclusion). The question is how the specific configuration parameters should be exposed to the user (right now I use a shared config). We could introduce a similar structure as with the chain backend, to distinguish different probability sources and to add specific documentation. This way we could configure both approaches and allow for A/B testing in the future.

Todo:

  • [x] configuration
  • [x] algo needs documentation
  • [ ] mission control configuration via RPC
  • [ ] benchmark comparison of the current and bimodal approaches
  • [x] QueryRoutes and QueryProbability need to be supplemented with channel capacities
  • [ ] update sample-lnd.conf
  • [ ] release notes

bitromortac avatar Aug 10 '22 15:08 bitromortac

A lot of changes to the current probability estimator. It's not easy to see what the actual effect on payment success will be. Left a few comments, but for me it is hard to be certain that the change set is an improvement.

Thank you a lot for the feedback! Just to clarify, the approach would be to provide an alternative opt-in implementation. Right, without a benchmark we won't know if there's an improvement, however, there are some limits in the math that make me confident there will be improvements (for example, if the algo learns that there's a success for a 100ksat payment, it will also favor that pair for 200k sat). In the next I'll focus on establishing a benchmark method.

An alternative approach would be to make a much smaller incremental step and use most of the existing algorithm with the addition of penalizing amounts close to the channel capacity harder.

I was thinking about that and wanted to do that after this PR, but I agree, it would be a quicker way to benefit users if we split this PR and introduce such a penalty first. I would take the approach of the first four commits and build on top of that. Passing the capacity as a parameter is due to performance reasons (haven't compared to looking up capacity from the graph). In the long run, I think we may need to restructure mission control to provide the capacity and other graph-derived data efficiently (maintained by ChannelRouter).

bitromortac avatar Aug 16 '22 07:08 bitromortac

An alternative approach would be to make a much smaller incremental step and use most of the existing algorithm with the addition of penalizing amounts close to the channel capacity harder. I would take the approach of the first four commits and build on top of that.

Yeah FWIW doing something like that isn't mutually exclusive with this PR (can even be done concurrently with just those first few commits). In fact, the work in this PR makes adding modified estimators much easier.

Roasbeef avatar Aug 16 '22 23:08 Roasbeef

Added a suggestion for how to handle configuration. I added subgroups for the different estimators, which is a breaking change for the apriori parameters.

Shares initial commits with #6857, so if questions come up, would appreciate review comments in that PR concerning these commits.

bitromortac avatar Aug 26 '22 11:08 bitromortac

Thanks for the reviews! Rebased on https://github.com/lightningnetwork/lnd/pull/6857. Probability estimators are now swappable at run time to allow for custom A/B testing. I probably need to remove EstimatorConfig and just incorporate config fields directly into the probability estimator implementations, as I don't see a way around the type switches for configuration. There are some performance issues, where a pathfinding attempt would take ~0.5s compared to ~0.1s (apriori) on my machine, which needs investigation. Probably calling math.Exp adds to computation time and perhaps can be replaced with a more efficient approximation.

bitromortac avatar Nov 30 '22 17:11 bitromortac

There are some performance issues, where a pathfinding attempt would take ~0.5s compared to ~0.1s (apriori) on my machine, which needs investigation. Probably calling math.Exp adds to computation time and perhaps can be replaced with a more efficient approximation.

One approach would be to make a benchmark, then expose profile capture during the benchmark. This way, we'd be able to see exactly where things are slower and if we need any memoization or w/e. You could also just try to do path finding attempts locally on the CLI, then capture a profile right before you execute the command to see where things are slowing down.

Roasbeef avatar Dec 07 '22 04:12 Roasbeef

Can be rebased now that the dependent PR has been merged!

Roasbeef avatar Dec 13 '22 03:12 Roasbeef

@bitromortac, remember to re-request review from reviewers when ready

lightninglabs-deploy avatar Jan 20 '23 17:01 lightninglabs-deploy

@bitromortac

bimodal: CRTR: Pathfinding perf metrics: nodes=6144, edges=76629, time=414.116737ms apriori: CRTR: Pathfinding perf metrics: nodes=6143, edges=76629, time=600.080587ms So with this implementation it takes ~50% more time (without optimizations).

Unless the entries in the top two lines were inadverntly swapped, then if I'm reading this correctly, the bimodal is actually faster than the apriori?

Roasbeef avatar Feb 02 '23 02:02 Roasbeef

As discussed offline, it may be worth adding clear steps to the PR description explaining how users can do a real life side-by-side comparison of both estimators with a "mission control state restore" in between.

joostjager avatar Feb 10 '23 16:02 joostjager

As discussed offline, it may be worth adding clear steps to the PR description explaining how users can do a real life side-by-side comparison of both estimators with a "mission control state restore" in between.

Sounds like a good candidate for a blog post or lnd mailing list post. Either of those would be easier to consume/distribute compared to a PR description.

Roasbeef avatar Feb 11 '23 00:02 Roasbeef

I've looked up XImportMissionControl and traced towards importSnapshot:

// importSnapshot takes an existing snapshot and merges it with our current
// state if the result provided are fresher than our current results. It returns
// the number of pairs that were used.

Not sure if this is usable for a side-by-side comparison, because it merges mission control state rather than replacing it.

I think it is worth spending some extra time on figuring this out and documenting it somewhere to really unlock the value of this PR for users. Otherwise it might be yet another set of pathfinding config flags for which users can't really decide what to do with.

joostjager avatar Feb 13 '23 12:02 joostjager

Not sure if this is usable for a side-by-side comparison, because it merges mission control state rather than replacing it.

Possible to first do a reset, then a merge.

I think it is worth spending some extra time on figuring this out and documenting it somewhere to really unlock the value of this PR for users.

I think the value is unlocked right after the merge. We can update the PR description later in a non-blocking manner. As mentioned a PR description isn't as searchable/indexable as a mailing list post or blog post.

Roasbeef avatar Feb 14 '23 00:02 Roasbeef

Pushed some minor changes mostly related to API backward compatibility. I updated the PR description to give some more context and included a hint on how one could measure differences between the two estimators (exactly, with a reset of mc in between).

bitromortac avatar Feb 14 '23 14:02 bitromortac