rust-lightning

MPP routing appears to create significantly more shards than required

Open • MaxFangX opened this issue 9 months ago • 2 comments

Creating a separate issue to continue the discussion from https://github.com/lightningdevkit/rust-lightning/pull/3707#issuecomment-2786758263. This may be fixed by https://github.com/lightningdevkit/rust-lightning/pull/3707, but I'm not sure.

Summary

We're currently dealing with an issue in prod where sending an MPP payment that requires only two of the sender's outbound channels (i.e. two MPP shards) actually produces an MPP route with seven shards. Every shard has to succeed for the payment to succeed, so with a per-path success rate of 85% (our observed probing success rate) the overall payment success rate drops to a measly 0.85^7 ≈ 32%, compared with 0.85^2 ≈ 72% for a two-shard route.
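A minimal Rust sketch of that arithmetic, assuming (as the estimate above implicitly does) that every shard must succeed and that shards succeed independently with the same probability. The function name is hypothetical and not part of LDK, and the model ignores retries and correlated failures.

```rust
// Hypothetical helper (not an LDK API): probability that a multi-part payment
// succeeds when each of `shards` independent paths succeeds with probability
// `per_path`. Simplified model: ignores retries and correlated path failures.
fn mpp_success_probability(per_path: f64, shards: i32) -> f64 {
    per_path.powi(shards)
}
```

For example, mpp_success_probability(0.85, 7) evaluates to roughly 0.32, so each additional shard multiplies the failure risk rather than adding to it.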

Logs

Hey @MaxFangX, this PR should reduce the number of shards in some cases, but I'm not sure that it fully accounts for 7 shards vs 2... Let me know if you have logs for this case or any way to reproduce it 👀

I do have logs for this case actually! They were a pain to get, but the basic setup (on mainnet) was:

  • Have five channels: three with low balances and two with more substantial balances, say 40k sats and 20k sats.
  • Send an amount that requires MPP, say 50k sats, from Lexe to an external wallet (in this case we tested with Breez)
  • The resulting route has a ton of shards (7).
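To show why two shards should be enough in this setup, here is a minimal sketch that counts the fewest outbound channels whose balances cover the amount. It is a hypothetical helper, not LDK code, and it deliberately ignores routing fees, channel reserves, and HTLC limits. The issue doesn't give the three low balances, so the 1,000-sat values in the usage below are placeholders.

```rust
// Hypothetical helper (not an LDK API): the fewest outbound channels whose
// spendable balances can cover `amount_sat`, ignoring routing fees, channel
// reserves, and HTLC limits. Returns None if all channels together fall short.
fn min_first_hops_needed(balances_sat: &[u64], amount_sat: u64) -> Option<usize> {
    if amount_sat == 0 {
        return Some(0);
    }
    let mut sorted = balances_sat.to_vec();
    // Using the largest balances first minimizes the number of channels needed.
    sorted.sort_unstable_by(|a, b| b.cmp(a));
    let mut covered: u64 = 0;
    for (i, balance) in sorted.iter().enumerate() {
        covered = covered.saturating_add(*balance);
        if covered >= amount_sat {
            return Some(i + 1);
        }
    }
    None
}
```

With balances of 40k, 20k, and three placeholder 1k-sat channels, min_first_hops_needed(&[40_000, 20_000, 1_000, 1_000, 1_000], 50_000) returns Some(2), far fewer than the 7 shards observed.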

The logs are a bit hard to read, but I was able to capture the route-finding output. Search (Ctrl+F) for "Got route" to see the 7-shard route that resulted from LDK pathfinding: https://gist.github.com/MaxFangX/cfb32ca091828ea27d2acd6e2be4bf66

Unfortunately I don't have a test or anything; this is behavior that we've only discovered in prod. We have a lot of work planned to make this more easily debuggable.

@valentinewallace

MaxFangX • Apr 08 '25 19:04

Update: We're in the process of deploying Matt's suggestion to tweak PaymentParameters::max_channel_saturation_power_of_half (we set it to 0). Will report back on whether this appears to fix the issue.
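Per LDK's documentation, max_channel_saturation_power_of_half is the maximum share of a channel's capacity the router will aim to use, expressed as a power of 1/2: 0 allows up to the full usable capacity, 1 half, 2 a quarter (which I believe is the default). The sketch below models only that arithmetic. It is a simplified assumption, not LDK's router code, and the function name is hypothetical.

```rust
// Simplified model (an assumption, not LDK's router code) of the per-channel cap
// implied by a saturation setting expressed as a power of 1/2.
// In LDK the setting itself is a field on PaymentParameters, e.g. (assumed usage):
//     payment_params.max_channel_saturation_power_of_half = 0;
fn saturation_cap_msat(capacity_msat: u64, power_of_half: u8) -> u64 {
    // Shifting right by n divides by 2^n; shifts of 64 or more yield 0.
    capacity_msat.checked_shr(power_of_half as u32).unwrap_or(0)
}
```

Under this model, a power of 0 lets a single part use a channel's whole capacity, which discourages splitting into many small parts. LDK's docs note the trade-off: lower values prefer larger parts at the risk of saturating channels.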

Edit: It appears to help, but we currently lack the diagnostic info to confirm.

Edit 2: I have confirmed that our production routing is using the minimum sufficient number of shards.

MaxFangX • Apr 10 '25 03:04

Hmm, honestly I'm somewhat at a loss as to why MPP was used here at all. max_channel_saturation_power_of_half is ignored for first and last hops, and the remaining hops in the first path are all at least 1M sats, so it shouldn't have driven the path contribution down to 5k sats. There does seem to be a bug here somewhere, but I can't spot it. A copy of the scorer would be nice, as it might allow us to reproduce this case directly. #3729 might also help.
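A rough sketch of the reasoning above, assuming a simplified model in which a path's contribution is bounded by its tightest hop, the saturation cap applies only to intermediate hops, and the first and last hops are limited only by their capacity. The Hop type and function are hypothetical. The real router also accounts for fees, HTLC minimums/maximums, and scorer penalties.

```rust
// Hypothetical hop description for illustration; not an LDK type.
struct Hop {
    capacity_msat: u64,
}

// Upper bound on what a single path can carry under the simplified model:
// intermediate hops are capped at capacity / 2^power_of_half, while the first
// and last hops are limited only by their capacity.
fn path_limit_msat(hops: &[Hop], power_of_half: u8) -> u64 {
    let last = hops.len().saturating_sub(1);
    hops.iter()
        .enumerate()
        .map(|(i, hop)| {
            if i == 0 || i == last {
                hop.capacity_msat
            } else {
                hop.capacity_msat.checked_shr(power_of_half as u32).unwrap_or(0)
            }
        })
        .min()
        .unwrap_or(0)
}
```

With a 40k-sat first hop (from the setup above), 1M-sat intermediate hops, and a placeholder 100k-sat last hop, the default-style power of 2 still caps the intermediate hops at 250k sats. The path limit is therefore the 40k-sat first hop, well above the 5k-sat contribution seen in the logs.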

TheBlueMatt • Apr 10 '25 20:04

Can you post your full patch against the router? At a minimum, your logs look different from what exists upstream. For example, your log says "First hop through 0314a77523d1dcbc5db56081edcbc24ab820b35e343a6c6769176de707c178d457/977483429296537601 can send between 1msat and 22930sat (inclusive)", but that log line was added in 136e89ebe6a74fc8c7bfba82fff4ce8a375e0e76 and hasn't been changed since, and the to-value should be in msat.
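For context on the unit mismatch: 1 sat is 1,000 msat, so if that upper bound really were in sats, it would correspond to 22,930,000 msat. The helpers below are hypothetical (not LDK's logging code) and only illustrate converting with overflow checking and keeping both ends of a logged range in the same unit.

```rust
const MSAT_PER_SAT: u64 = 1_000;

// Hypothetical helper: converts whole sats to msat, returning None on overflow
// instead of wrapping.
fn sat_to_msat(sat: u64) -> Option<u64> {
    sat.checked_mul(MSAT_PER_SAT)
}

// Hypothetical formatter that prints both bounds in msat, so the two ends of
// the range cannot silently use different units.
fn first_hop_range_msg(min_msat: u64, max_msat: u64) -> String {
    format!("can send between {}msat and {}msat (inclusive)", min_msat, max_msat)
}
```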

TheBlueMatt • May 14 '25 21:05