lightning-kmp icon indicating copy to clipboard operation
lightning-kmp copied to clipboard

More detailed error when HTLC could not be sent

Open t-bast opened this issue 10 months ago • 4 comments

We use more detailed errors when we fail to send an HTLC, since this may happen because the channel is offline, syncing, opening or closing.

It may also happen that we have a channel in the Normal state that has enough funds, but we cannot send the HTLC because it hits a channel limit (too many HTLCs pending, a splice is ongoing, etc). When that happens we will retry ignoring the channel, which leads to the default NoAvailableChannels error: in that case, the "real" error can be found in the failures list of the OutgoingPaymentFailure instance.

We now explicitly map every failure to a potentially user-friendly error (PartFailure), whenever possible. When the error cannot be easily interpreted, we use a PartFailure.Uninterpretable error to wrap the actual error. Applications like Phoenix should define localized error messages for each of these errors and print them on failures.

:warning: We change the format of LightningOutgoingPayment.Part.Status.Failed, which is stored in the DB. The migration should be handled by:

  • adding a new column error_code and a mapping from PartFailure to an error code
  • for PartFailure.Uninterpretable, the existing details column should be used for the error message
  • old rows should use the error_code of PartFailure.Uninterpretable and just display the stored details message

What do you think @dpad85, is that an acceptable way of handling backwards-compat for previously failed payments?

Fixes #616

t-bast avatar May 02 '24 15:05 t-bast

What do you think @dpad85, is that an acceptable way of handling backwards-compat for previously failed payments?

For backward compat, it would work, yes.

However, these changes do not fix #616. When a payment fails, Phoenix does not display the failures for each parts. Payment parts are confusing for users, so we don't show them (except on iOS but we'll have to remove that).

Instead we show the FinalFailure of the overall LightningOutgoingPayment. This is how users can end up with the FinalFailure.NoAvailableChannels("no channels available to send payment") error message.

What would be great is a method that takes the FinalFailure + the list of PartFailure and returns a single, most pertinent type.

dpad85 avatar May 03 '24 14:05 dpad85

What would be great is a method that takes the FinalFailure + the list of PartFailure and returns a single, most pertinent type.

I did something in https://github.com/ACINQ/lightning-kmp/pull/634/commits/293234718daa98e7c7cd0bcf0debd8a9cbde602f, I'm not sure we can do better. There are always cases that won't be easy to programmatically analyze and will require the exact details of each payment attempt to figure out what went wrong, but it should work in most cases.

t-bast avatar May 03 '24 14:05 t-bast

At some point we could simplify all of this greatly by assuming that lightning-kmp is always connected to a single trampoline peer with at most one channel available for payments, but that's a much larger refactoring.

t-bast avatar May 03 '24 15:05 t-bast

Testing this PR on Phoenix, and trying to pay an already paid invoice a second time, I hoped to get a Part.Status...RecipientRejectedPayment failure.

Instead, the parent payment status is FinalFailure.RetryExhausted. The part child status is Part.Status...RecipientIsOffline (seems to have failed with a UnknownNextPeer error which maps to that failure).

FYI, the explain method in that case returns FinalFailure.RetryExhausted.

So we are still not able to provide a meaningful error message to the user, even with the improvements added in the PR.

dpad85 avatar May 06 '24 14:05 dpad85