Unable to pay extreme multipart payment in tests
As a follow-up to issue #7563, I have tested renepay on the same topology that @daywalker90 proposed
for testing getroutes. It turned out the payment failed with "failed to find a feasible flow"
due to a sequence of HTLC failures on a remote channel that should have had enough liquidity.
To reproduce the problem I wrote the following test using only sendpay.
The payment flow is l1 -> l2 -> l3, where more than one channel connects l1->l2
and more than one connects l2->l3.
When trying to send a 4-part payment over the following routes:
- part 1 with 350k in the route l1->l2->l3 over a couple of channels with capacity 400k,
- part 2 with 250k in the route l1->l2->l3 over a couple of channels with capacity 300k,
- part 3 with 150k in the route l1->l2->l3 over a couple of channels with capacity 200k,
- part 4 with 50k in the route l1->l2->l3 over a couple of channels with capacity 100k,

one or more payment parts always fail at the l2->l3 hop with CHANNEL_ERR_CHANNEL_CAPACITY_EXCEEDED
seen in l2's logs.
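On paper every part fits comfortably in its route, even after accounting for the default 1% channel reserve and the funder's commitment fee. The numbers below are rough assumptions for illustration; the test further down asserts the actual spendable_msat reported by listpeerchannels instead of guessing:

# Back-of-the-envelope headroom check per part (illustrative only).
# Assumes the default 1% channel reserve and a commitment-tx fee of a few
# thousand sats at regtest feerates; both figures are approximations.
parts = [  # (part amount in sat, capacity of the l1->l2 and l2->l3 channels used)
    (350_000, 400_000),
    (250_000, 300_000),
    (150_000, 200_000),
    (50_000, 100_000),
]
for amount, capacity in parts:
    reserve = capacity * 0.01      # default 1% channel reserve
    commit_fee = 5_000             # rough allowance for the commitment tx fee
    headroom = capacity - reserve - commit_fee
    assert amount <= headroom, (amount, headroom)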
Another, simpler case that also fails:
- part 1 with 200k in the route l1->l2->l3 over a couple of channels with capacity 400k,
- part 2 with 200k in the route l1->l2->l3 over a couple of channels with capacity 300k.
If instead I make a single-part payment:
- 350k in the route l1->l2->l3 over a couple of channels with capacity 400k,

the attempt succeeds.
# Imports assuming a pyln-testing environment; CLN's own test suite pulls the
# same helpers from its local `fixtures` and `utils` modules instead.
from pyln.client import Millisatoshi
from pyln.testing.fixtures import *  # noqa: F401,F403  (provides node_factory)
from pyln.testing.utils import only_one, sync_blockheight, wait_for


def get_local_channel_by_id(node, chanid):
    """Return the listpeerchannels entry with this channel_id, or None."""
    peerchannels = node.rpc.listpeerchannels()['channels']
    if not peerchannels:
        return None
    for c in peerchannels:
        if c['channel_id'] == chanid:
            return c
    return None
def start_channels(connections):
    nodes = list()
    for src, dst, fundamount in connections:
        nodes.append(src)
        nodes.append(dst)
        src.rpc.connect(dst.info["id"], "localhost", dst.port)

    bitcoind = nodes[0].bitcoin
    # If we got here, we want to fund channels
    for src, dst, fundamount in connections:
        addr = src.rpc.newaddr()["bech32"]
        bitcoind.rpc.sendtoaddress(addr, (fundamount + 1000000) / 10**8)

    bitcoind.generate_block(1)
    sync_blockheight(bitcoind, nodes)

    txids = []
    chan_ids = []
    for src, dst, fundamount in connections:
        reply = src.rpc.fundchannel(dst.info["id"], fundamount, announce=True, minconf=0)
        txids.append(reply["txid"])
        chan_ids.append(reply["channel_id"])

    # Confirm all channels and wait for them to become usable
    bitcoind.generate_block(1, wait_for_mempool=txids)
    scids = []
    for con, mychan_id in zip(connections, chan_ids):
        src = con[0]
        wait_for(lambda: get_local_channel_by_id(src, mychan_id)['state'] == "CHANNELD_NORMAL")
        scids.append(get_local_channel_by_id(src, mychan_id)['short_channel_id'])

    # Make sure they have all seen block so they don't complain about
    # the coming gossip messages
    sync_blockheight(bitcoind, nodes)
    bitcoind.generate_block(5)

    # Make sure everyone sees all channels, all other nodes
    for n in nodes:
        for scid in scids:
            n.wait_channel_active(scid)

    # Make sure we have all node announcements, too
    for n in nodes:
        for n2 in nodes:
            wait_for(
                lambda: "alias" in only_one(n.rpc.listnodes(n2.info["id"])["nodes"])
            )
    return chan_ids
def test_channel_reserve(node_factory):
    def direction(node1, node2):
        return 0 if node1.info['id'] < node2.info['id'] else 1

    opts = {"disable-mpp": None, "fee-base": 0, "fee-per-satoshi": 0, "cltv-delta": 6}
    l1, l2, l3 = node_factory.get_nodes(3, opts=opts)
    chan_ids = start_channels(
        [
            (l1, l2, 400_000),
            (l2, l3, 400_000),
            (l1, l2, 300_000),
            (l2, l3, 300_000),
            (l1, l2, 200_000),
            (l2, l3, 200_000),
            (l1, l2, 100_000),
            (l2, l3, 100_000),
        ]
    )

    # we should be able to send these four parts:
    # nparts = 4
    # route_amounts = ["350000sat", "250000sat", "150000sat", "50000sat"]
    # instead the test fails even with these:
    nparts = 2
    route_amounts = ["201000sat", "200000sat", "150000sat", "50000sat"]
    total_msat = sum([Millisatoshi(a) for a in route_amounts[:nparts]])

    # Test succeeds if we are able to pay this invoice
    inv = l3.rpc.call("invoice", {"amount_msat": total_msat,
                                  "label": "inv", "description": "inv", "cltv": 10})
    inv_decode = l1.rpc.decode(inv["bolt11"])

    # Shared data by every route we will construct: l1->l2->l3
    route = [{"id": l2.info["id"],
              "direction": direction(l1, l2),
              "delay": 16,
              "style": "tlv"},
             {"id": l3.info["id"],
              "direction": direction(l2, l3),
              "delay": 10,
              "style": "tlv"}]

    # Send every part with sendpay
    for part in range(nparts):
        this_part_msat = Millisatoshi(route_amounts[part])
        chan1 = get_local_channel_by_id(l1, chan_ids[part * 2])
        chan2 = get_local_channel_by_id(l2, chan_ids[part * 2 + 1])

        route[0]["channel"] = chan1["short_channel_id"]
        route[1]["channel"] = chan2["short_channel_id"]
        route[0]["amount_msat"] = route[1]["amount_msat"] = this_part_msat

        assert chan1["spendable_msat"] >= this_part_msat
        assert chan2["spendable_msat"] >= this_part_msat

        l1.rpc.call(
            "sendpay",
            {"route": route,
             "payment_hash": inv["payment_hash"],
             "payment_secret": inv["payment_secret"],
             "amount_msat": total_msat,
             "groupid": 1,
             "partid": part + 1},
        )

    l1.wait_for_htlcs()

    # Are we happy?
    waitsendpay_response = l1.rpc.call(
        "waitsendpay",
        {"payment_hash": inv["payment_hash"], "partid": 1, "groupid": 1})
    receipt = only_one(l3.rpc.listinvoices("inv")["invoices"])
    assert receipt["status"] == "paid"
    assert receipt["amount_received_msat"] == total_msat
I think the problem is with the parallel channels. On l2, channeld only sees one channel with its peer l1.
I can see that lightningd's peer structure has a list of channels (defined in lightningd/peer_control.h), while channeld's peer structure holds a single channel (defined in channeld/channeld.c). Keep looking...
UPD: there is one channeld instance for every channel. So maybe lightningd is sending the HTLC request to the wrong subdaemon?
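One way to check this from the test itself is to look at which channel actually ends up carrying each HTLC. The following is only a diagnostic sketch (not part of the reproduction): it reuses the l1/l2/l3 nodes from the test above and the per-channel htlcs list that listpeerchannels reports; depending on timing the failed HTLC may already be gone, but the successfully added one shows which parallel channel was picked.

def dump_htlc_placement(node, peer):
    # Print, for every channel with `peer`, its scid, state and the HTLCs it
    # currently carries, so we can see whether parts were spread over the
    # parallel channels or piled onto a single one.
    channels = node.rpc.call("listpeerchannels", {"id": peer.info["id"]})["channels"]
    for c in channels:
        htlcs = [(h["direction"], h["amount_msat"]) for h in c.get("htlcs", [])]
        print(c.get("short_channel_id"), c["state"], htlcs)

# e.g. right after the sendpay loop, before l1.wait_for_htlcs():
# dump_htlc_placement(l2, l1)   # incoming hop
# dump_htlc_placement(l2, l3)   # outgoing hop, where the capacity error shows up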
As a matter of fact, in the logs:
035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-channeld-chan#2: Adding HTLC 0 amount=501000000msat cltv=119 gave CHANNEL_ERR_ADD_OK
...
035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-channeld-chan#2: Adding HTLC 1 amount=500000000msat cltv=119 gave CHANNEL_ERR_CHANNEL_CAPACITY_EXCEEDED
Both HTLC requests go to the same channeld-chan#2; one should have gone to channeld-chan#2 and the other to channeld-chan#4.
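The same observation can be turned into an assertion inside the test, using pyln-testing's log helpers on l2 (a sketch; daemon.logs and is_in_log are pyln-testing utilities, and the regex follows the log lines above):

import re

# The capacity error is visible in l2's log...
assert l2.daemon.is_in_log(r"Adding HTLC .* gave CHANNEL_ERR_CHANNEL_CAPACITY_EXCEEDED")

# ...and every "Adding HTLC" line names the channeld instance that handled it
# ("chan#N" is the channel's database id). With two parts routed over two
# distinct parallel channels this set should have two entries; in the failing
# run it has only one.
adds = [line for line in l2.daemon.logs if "Adding HTLC" in line]
instances = {m.group(1) for m in (re.search(r"channeld-chan#(\d+)", line) for line in adds) if m}
assert len(instances) == 2, instances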
There is a channel selection step in lightningd before forwarding: https://github.com/ElementsProject/lightning/blob/9d88ce3b592ca42a01104758313b9b2806d40230/lightningd/peer_htlcs.c#L1215
Is that a good idea? Why wouldn't we forward on exactly the channel we were asked to use?
Clearly, in this case that function is not selecting the best channel,
though non-strict forwarding is allowed: https://github.com/lightning/bolts/blob/master/04-onion-routing.md#non-strict-forwarding
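For comparison, this is the kind of selection policy the report is implicitly asking for, written as a Python sketch against the RPC interface rather than the actual C code in peer_htlcs.c (the function name and structure are hypothetical): prefer the channel named in the onion, and only fall back, non-strictly, to another channel to the same peer that actually has enough spendable liquidity.

from pyln.client import Millisatoshi

def pick_outgoing_channel(node, peer_id, requested_scid, amount_msat):
    # Hypothetical policy sketch, not lightningd's implementation.
    channels = node.rpc.call("listpeerchannels", {"id": peer_id})["channels"]
    usable = [c for c in channels
              if c["state"] == "CHANNELD_NORMAL"
              and Millisatoshi(c["spendable_msat"]) >= Millisatoshi(amount_msat)]
    # Strict match first: the channel the onion asked for, if it can carry the HTLC.
    for c in usable:
        if c.get("short_channel_id") == requested_scid:
            return c
    # Non-strict fallback: any other channel to the same peer with enough liquidity.
    return usable[0] if usable else None

The point of the sketch is only that, whichever channel gets chosen, the HTLC amount has to fit that channel's spendable balance.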