clboss Channel proposal investigation by improvement to shortest path to payment target

By @AutonomousOrganization here: https://github.com/ZmnSCPxj/clboss/issues/2#issuecomment-727075769

Seems the most important information from user initiated pays is which node is the payment target. Most simple action to maximize successful payments to this node would be a big direct channel, but user may not desire network knowing payment target. However user would appreciate future pays to target having a higher likelihood of success. What about including a 'desire to be close to target' metric into the channel create and score bonus points for multiplying possible routes to targeted pay nodes?

As a basic algorithm, something like:

For each payment target:
- getroute (with fuzzpercent=0) from us to payment target, measure length of route, store in table [payment_target] -> route_length.
For each channel candidate:
- For each payment target:
  - getroute (with fuzzpercent=0) from candidate to payment target, measure length of route + 1 (or just set 1 if canddidate == payment target).
  - if length+1 is less than the previous recorded route_length, add merit to the channel candidate.

This probably requires a rework of the channel candidate investigator.

Nov 14 '20 00:11 ZmnSCPxj

Been trying to do some analysis of the connection between the two (boss) nodes I am running. Using getroute while building up exclude, is not exhaustive but it seems reasonably interesting. On last trial found 89 unique routes between them with length ranging from 2 to 11, most commonly length 5-8. The routes hit 106 unique nodes and used 6/25 of the possible delivery channels and 20/22 of the possible first sender channels. This feels like a fairly decent 'closeness', (meshing?) between the nodes, but the success rate of pays in the 100,000 - 1,000,000sat range is still low.

What actions could be taken to improve on this mesh between nodes and presumably improve the search space and reliability of multi path payments? A channel with one of the nodes used by many of the paths would shorten many paths and presumably make them more reliable. A channel trying to connect to the dark space around unreachable receiving channels, may be worthwhile but may not because they are bad (offline, no capacity).

The more general point I want to make is that I think the most contributing factor to the probability of payment success is the breadth of the search space multipath pays can search over.

Data from searches routeCheck.txt

Nov 15 '20 21:11 ghost

A channel finder algorithm CLBOSS currently has is ChannelFinderByDistance, which performs a full Dijkstra algorithm run starting at our own node, creating a shortest path tree. It then proposes nodes on the leaves of the tree for preinvestigation.

Preinvestigation is performed by ChannelCandidatePreinvestigator. When nodes are proposed, we provide the proposal node, and a patron node. The proposal is the node we want to channel to, while the patron is the peer of that node that we think is the reason why making a channel to that node is a good idea. For example, the ChannelFinderByPopularity selects a number of nodes that are popular (have lots of unique peers), sets the popular node as the patron, and proposes the peers of the popular patron as proposal.

Preinvestigation then checks each proposal node by checking if we can connect to it (if it cannot even be connected to at the time of proposal, it is probably a shit node). It also checks the "dowser flow", basically it tries to measure the channel(s) and short 2- or 3-hop routes between the proposal and the patron to see how much capacity they seem to have, if it is below 5mBTC we reject the proposal as well. The dowser flow calculation is implemented in Dowser.

For ChannelFinderByDistance, the proposal is the leaf node, while patron is the direct parent in the shortest-path tree (the patron being a proxy for "the rest of the network").

Once a proposal+patron passes the preinvestigation check, it goes into investigation handled by ChannelCandidateInvestigator, which is basically just us doing connect to the proposal nodes. Those are the nodes listed in channel_candidates array of the clboss-status command result, the id is the proposal, the status command does not show the patron though (but it is stored as well in the database).

So I think preinvestigation covers the "offline, no capacity" concern using some basic checks, so you just need to figure out a patron and proposal for every channel finder algorithm.

When we actually create channels, we pick the nodes that have been under investigation the longest with the best detected uptime, then we rerun the dowser between the proposal and its patron to use as the maximum size of channel we make to the selected proposal.

Nov 16 '20 04:11 ZmnSCPxj

Another possible proposal point would be to monitor payment attempts (which we already do over in SendpayResultMonitor, and is the reason why we even have the rpc_command hooked up that plagued the 100% CPU problems way back in the misty prehistory of CLBOSS a.k.a. last week), and check for temporary channel failures. If a sendpay has a temporary channel failure (usually a proxy for "out of capacity"), then that is weak evidence that the node just after the failing channel might be suffering from too little capacity.

On the other hand, at the lower-level, the pay command uses sendonion and not sendpay. The rpc_command hook lets us look at the parameters of RPC commands and which command was invoked, but not anything else. Unfortunately for us, sendonion does not receive the entire route, just the first hop. While pay knows the route, it does not share the information through the RPCs it makes, and in all likelihood, the node will be using pay for all payments.

On the other other hand, pay does use getroute with an excludes parameter. Channels listed in excludes imply that pay thinks those channels are not helpful for reaching the destination. And what pay accepts in excludes is really a channel-direction, meaning we know which direction of the channel failed to get funds to the recipient, and the particular channel-direction points to a specific node on that channel as the destination, meaning we know which node was unable to receive funds. getroute also includes the recipient in its parameters, so we can assocaite the failures with a specific payment, and count all getroute to a single destination as a single demerit for all channel-directions listed in excludes.

Using this lets us gather data from "normal" node activity (aka payments), without triggering a lot of CPU consumption from lightningd in analyzing the routemap.

Nov 16 '20 16:11 ZmnSCPxj