Remove dependency on non-protocol-required confirmation messages (Delivered, Processed)
Abstract
The Raiden protocol requires strong guarantees about delivery and acceptance (correct processing) of most of its messages, due to strict requirements on state consistency between peers. Initially, because of limitations of the UDP transport, Raiden added confirmation/control messages which are not specific to the inner workings of the protocol, but are required only to control retrying of a message until it is received/accepted or some terminating event happens (e.g. the channel is closed, so there is no need to continue retrying).
Currently, a sender must keep retrying any message until it receives a respective `Delivered` or `Processed` message, indicating it was correctly received or accepted by the partner (see #3087). This adds load and latency to the transport which isn't actually required for the protocol to work correctly, and also requires additional processing due to signing and validation of every message, especially problematic with interactive signing (possibly the case for a future light client implementation, #114) or limited hardware (#3199, #2385).
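To make the current behavior concrete, here is a minimal sketch of the retry loop described above; the helper names and predicates are assumptions for illustration, not the actual raiden API.

```python
# Minimal sketch of the current retry behavior: a message is re-sent until
# the partner confirms it (Delivered/Processed) or it becomes invalid
# (e.g. the channel is closed). Helper names are hypothetical.
import time

def retry_until_confirmed(transport, message, confirmed, still_valid,
                          retry_interval=5.0):
    while still_valid(message) and not confirmed(message.message_identifier):
        transport.send(message)      # every copy is signed by the sender and
        time.sleep(retry_interval)   # signature-validated by the receiver
```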
Motivation (iteration 1)
In talks with @hackaugusto, it became clearer that `Delivered` and `Processed` were split from an `Ack` message in the UDP days. That `Ack` had mostly the function of the current `Processed` message: to confirm correct processing and stop the retry of a given message, currently sent in reply to `BalanceProof`-containing messages. We may therefore remove `Delivered` messages completely, as most of their function is already provided by `Processed` messages.
Specification
Currently, `Delivered` messages are used to confirm 3 messages: `SecretRequest`, `SecretReveal` and `Processed`. This is how we can remove the need for it:
- [ ] `SecretRequest`: the simplest case, confirmed by the respective `SecretReveal`.
- [ ] `SecretReveal`: although it doesn't contain a balance proof, it can be confirmed by a `Processed` message, as its content needs to be validated against the state (e.g. there's a pending lock with the given secret), so a `Processed` message should confirm it instead of `Delivered` (which means nothing in this context anyway), effectively replacing `Delivered` with `Processed` for `SecretReveal`.
- [ ] `Processed`: becomes a one-shot message, as `Delivered` currently is, meaning it isn't retried but sent only once as a response to some other message. One-shot messages don't need confirmation (therefore no `Delivered` is required), but the receiver also needs to handle incoming duplicate/retried messages (a sketch follows this list):
  - For already-accepted `BalanceProof`-containing messages/state changes, the message is not going to get processed again, so if the `nonce` is smaller than the current one, a new (one-shot) `Processed` message is generated and sent anyway.
  - For a `SecretReveal` message, as proposed above, if there's a pending or unlocked lock with the given hash, a `Processed` should be sent to confirm it.
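As referenced in the list above, a hedged sketch of the receiver-side handling of retried balance-proof messages under this proposal; `Processed` is stubbed and the `state` helpers are assumptions:

```python
from dataclasses import dataclass

@dataclass
class Processed:  # stand-in for the raiden message, illustration only
    message_identifier: int

def on_balance_proof_message(state, message):
    # Duplicate/retried message: don't reprocess, but re-confirm one-shot
    # so the sender can stop retrying.
    if message.nonce <= state.partner_nonce:
        return Processed(message_identifier=message.message_identifier)
    # Next expected message: validate, apply, confirm.
    if message.nonce == state.partner_nonce + 1 and state.is_valid(message):
        state.apply(message)
        return Processed(message_identifier=message.message_identifier)
    # Out of order or invalid: ignore; the sender's retries will re-deliver
    # the missing predecessor first.
    return None
```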
This proposal also supersedes #2383 and #2505. It may be related to #490 (as per the receiver's by-nonce queue and caching of responses).
Backwards Compatibility
`Delivered` messages could still be sent for backwards compatibility, and some handling of them could be kept for the `SecretReveal` confirmation coming from old clients. Nonetheless, this could be a good opportunity to improve the protocol by breaking compatibility, possibly also doing #3283 before the next release.
Timeboxing
Investigation and split into more detailed issues or implementation timeboxed to 2 PD
On the last bullet point, about `RefundTransfer`: maybe we can rely on the `SecretReveal` => `Secret/Unlock` from a mediator/initiator to the next mediator on the failed routes with refunded transfers to stop retrying, removing the need for `Processed` on it too (the same logic as the normal reveal/unlock applies then). To investigate.
We have discussed this in the past multiple times; in my opinion it is a bad idea. My comments on the proposal:
Regarding the abstract
> This adds load and latency to the transport which isn't actually required for the protocol to work correctly
I believe the claim about latency here is wrong (I agree with the load though).
From the perspective of a single transfer: there is no waiting involved with either Delivered or Processed messages. These messages do not contain a balance proof and therefore don't need to be ordered, meaning they can be sent as soon as it is appropriate (== no waiting/latency). Additionally, these messages are sent backwards with respect to the protocol messages, to a different recipient, meaning they are sent in parallel to the protocol messages.
From the perspective of a single node: in this case, I actually think your proposal will add latency instead of removing it. Assume that you have a channel A-B, and that node A is trying to send two transfers through this channel at the same time; now assume the first transfer has a longer path than the second. If messages are sent one at a time, then the second transfer will have to wait for the longer path to complete its protocol before proceeding, increasing the latency. This point generalizes and holds even if messages are batched; the difference is that instead of talking about a message we would have to talk about a group of messages.
> and also requires additional processing due to signing and validation of every message, especially problematic with interactive signing
This is true for the current implementation; however, signing is being used to authenticate the messages to prevent an attack. The important thing here is that we can use different mechanisms to do the authentication, e.g. end-to-end encryption; with E2E, user intervention isn't necessary, as long as the channel is not compromised (e.g. the symmetric key is safe).
Regarding the motivation
> These confirmation messages, so far, are used only to tell sender it can stop retrying a message.
I agree completely with this statement. Yes, these two messages have a unique purpose. Delivered messages inform the sender that the recipient has received and safely stored the message. Processed messages mean that the message was not only stored, but also that the contents which had to be validated against the blockchain were validated and accepted as valid. I not only think that having a single meaning per message is fine, I also think it's something we should really strive for.
Regarding the specification
> LockedTransfer initiator => mediator: either a SecretRequest will be received from the target, or a RefundTransfer will be received from the current mediator.
> LockedTransfer mediator => mediator|target: either a SecretReveal or RefundTransfer will be received from the next hop/mediator or target.
You are assuming that either a secret request or a refund transfer will always arrive; what if neither arrives? This could happen for a variety of reasons:
- There is no capacity from the mediator to the initiator, so the refund transfer cannot be sent and the lock expires.
- A node downstream, mediator or target, received the locked transfer inside the danger zone and halted the protocol to wait for a remove expired lock. (This can happen because of 1. a poorly chosen initial lock expiration, 2. a node going offline for a long period, or 3. high latency in a node's transport, be it Matrix or something else.)
> SecretRequest target => initiator: a SecretReveal should be received from initiator
This is a good example of why having multiple responsibilities for a single message can be bad: what if the secret is learned by the mediator through the blockchain? Because the node learned it from the blockchain, it knows that everyone else in the path learned it from the blockchain too, including the previous mediator and the initiator; in the current version of the code base a secret reveal is not sent backwards in that case, it's only done when the secret is learned off-chain.
The proposed change would require us to review all messages and code paths to ensure that they are sent for the unhappy case too, otherwise the node will be unable to clear the message from the queue and make progress.
> SecretReveal target|mediator => prev mediator: a Secret/Unlock (see #3132, #3145) response should be received
This can actually deadlock if messages are sent one at a time and there is a refund transfer. Consider the following path and order of messages in the queues:
A-B-C-B-D
B->C: SecretReveal, Unlock
C->B: SecretReveal, Unlock
Batching could fix the above example, however, we would have to introduce a requirement for the minimum batch size to the transport layer and assume this size ensures progress when the queue is bigger than it.
Thank you for your thoughts, @hackaugusto. But I think some misunderstandings happened, and I would like to get your position on the clarifications below:
> meaning they can be sent as soon as it is appropriate (== no waiting/latency)
That's true for UDP, but on Matrix you need to consider that each event incurs ramp-up/down times for `/sync` requests on the receiver and on the sender (the sender receives its own messages too, even if they're ignored), which, for a relatively slow method like HTTP long polling, can add up to quite a large protocol overhead.
> If messages are sent one at a time, then the second transfer will have to wait for the longer path to complete its protocol before proceeding, increasing the latency.
Why? My idea is that the queues won't be halted by a message not yet confirmed; instead, all messages should be sent as soon as they arrive (and/or batched), and the protocol will always move forward regardless of the confirmation method. This proposal is about changing the mechanics of when we stop retrying a message being sent, but not about requiring this to happen at all for the protocol (including different transfers going through the same channel) to go on.
> The important thing here is that we can use different mechanisms to do the authentication, e.g. end-to-end encryption
Agreed; this point specifically can also be covered by transport features. Nonetheless, IMO the other points still hold, and we can shrink the protocol and the needed roundtrips through this proposal.
> Yes, these two messages have a unique purpose.
Yes, I know they mean different things, but my proposal is that we can drop this distinction, and implement the actual effect of these messages (in this case, stopping the retry) implicitly through the actual protocol messages, instead of specific ones.
> You are assuming that either a secret request or a refund transfer will always arrive; what if neither arrives?
This is built-in. I considered that any of these messages could never arrive, in which case (via this proposal), per the protocol, they should be retried until the message is no longer valid (in this case, the lock expiring). If the situation changes, the message can eventually be accepted and the transfer goes through. If not, and the transfer never finishes, they should continue to be retried until the lock expires or the channel is closed, making them invalid and stopping the retry.
> This is a good example of why having multiple responsibilities for a single message can be bad: what if the secret is learned by the mediator through the blockchain?
This only affects the condition under which the message (in this case, `SecretReveal`) gets cleared (retry stopped). If the secret is revealed on-chain, we can simply stop retrying the `SecretReveal` messages in any of the pending mediators, as the receiver will (presumably) have also detected the on-chain reveal and won't send the `Secret/Unlock`; that is the corner-case condition to stop retrying the message.
> Batching could fix the above example, however, we would have to introduce a requirement for the minimum batch size to the transport layer and assume this size ensures progress when the queue is bigger than it.
That's not needed. As I said, messages can be batched, but batching is just an optimization; they're always sent anyway (preferably in order, but not necessarily; that's why we rely on the retry mechanism). They won't deadlock, as all messages will always be sent, and eventually they should be accepted, or else they continue retrying until some higher-tier condition is invalidated (e.g. channel closed).
This proposal makes delivery/acceptance of messages asynchronous with respect to the protocol. Once the state moves forward on the sender (and a respective message is queued to the partner), the sender can move on, making new state changes and queueing other messages on top of it. Eventually they should be accepted through the retry mechanism, and once their side effect is detected, we can give up retrying the message. Until then, everything is async, and we can move forward in the protocol.
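A sketch of what this "clearing by side effect" could look like; the message classes are stubbed and all `view` helpers are assumptions, not existing raiden code:

```python
class SecretRequest: ...   # stand-ins for the raiden messages,
class SecretReveal: ...    # for illustration only

def may_stop_retrying(message, view):
    if isinstance(message, SecretRequest):
        # Cleared by the matching SecretReveal, or by an on-chain reveal.
        return view.secret_known(message.secrethash)
    if isinstance(message, SecretReveal):
        # Cleared by the Unlock, or made moot by on-chain registration.
        return (view.unlock_received(message.secrethash)
                or view.secret_registered_onchain(message.secrethash))
    # Fallback: the message itself became invalid (lock expired,
    # channel closed), so retrying stops anyway.
    return view.message_invalidated(message)
```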
> That's true for UDP, but on Matrix you need to consider that each event incurs ramp-up/down times for `/sync` requests on the receiver and on the sender (the sender receives its own messages too, even if they're ignored), which, for a relatively slow method like HTTP long polling, can add up to quite a large protocol overhead.
This is not specific to the Delivered and Processed messages; how does this imply these messages incur latency?
> Why? My idea is that the queues won't be halted by a message not yet confirmed; instead, all messages should be sent as soon as they arrive (and/or batched), and the protocol will always move forward regardless of the confirmation method.
The above assumes that only the first n messages of the queue are sent.
> [...], but my proposal is that we can drop this distinction, and implement the actual effect of these messages (in this case, stopping the retry) implicitly through the actual protocol messages, instead of specific ones
That was the point: implicit meaning is not a good thing.
> This is built-in. I considered that any of these messages could never arrive, in which case (via this proposal), per the protocol, they should be retried until the message is no longer valid (in this case, the lock expiring). If the situation changes, the message can eventually be accepted and the transfer goes through. If not, and the transfer never finishes, they should continue to be retried until the lock expires or the channel is closed, making them invalid and stopping the retry.
This is not a sufficient condition: what if the recipient is offline while the lock expires? The locked transfer retries stop and the remove expired lock is added to the queue, but the remove expired lock will never be processed without the locked transfer.
> This only affects the condition under which the message (in this case, `SecretReveal`) gets cleared (retry stopped). If the secret is revealed on-chain, we can simply stop retrying the `SecretReveal` messages in any of the pending mediators, as the receiver will (presumably) have also detected the on-chain reveal and won't send the `Secret/Unlock`; that is the corner-case condition to stop retrying the message.
I know that we can try to figure out what all the conditions for retries are; that is not the point. The point is that this is extra work, more error-prone, and difficult. It may be true that Matrix has some persistence, but IMO we should not design the protocol around a property of this specific transport, so the above holds.
> That's not needed. As I said, messages can be batched, but batching is just an optimization; they're always sent anyway (preferably in order, but not necessarily; that's why we rely on the retry mechanism). They won't deadlock, as all messages will always be sent, and eventually they should be accepted, or else they continue retrying until some higher-tier condition is invalidated (e.g. channel closed).
Again, I'm assuming the first n messages are sent. Your batch is the whole queue.
> This proposal makes delivery/acceptance of messages asynchronous with respect to the protocol. Once the state moves forward on the sender (and a respective message is queued to the partner), the sender can move on, making new state changes and queueing other messages on top of it. Eventually they should be accepted through the retry mechanism, and once their side effect is detected, we can give up retrying the message. Until then, everything is async, and we can move forward in the protocol.
So, from what I can see the proposal is to:
1. reduce the number of signatures
2. reduce latency
3. reduce load
For 1. I think e2e is a better solution. For 2. I'm not convinced it will have a positive impact. For 3. it's probably negligible: delivered and processed messages with a signature are around 77 bytes, without it they are 12 bytes.
> This is not specific to the Delivered and Processed messages; how does this imply these messages incur latency?
They incur latency because they're unneeded; any unneeded message incurs latency, given Matrix internals.
> The above assumes that only the first n messages of the queue are sent.
All messages are always retried until we're certain they can be cleared. If an out-of-order message is rejected, the previous message will be retried and eventually accepted. The "queue" won't be halted by non-confirmed messages.
> That was the point: implicit meaning is not a good thing.
IMHO it is, if it's inherent to the protocol. E.g. `LockedTransfer` includes more than one piece of information; we don't send one message for each, because they always go together. This is semantics, and it is inherent to the protocol, if we say so.
> This is not a sufficient condition: what if the recipient is offline while the lock expires? The locked transfer retries stop and the remove expired lock is added to the queue, but the remove expired lock will never be processed without the locked transfer.
You're right; then keep sending them until the `Processed` for the `LockExpired` comes, because it'll be accepted only after the expired `LockedTransfer` is. There's always a condition by which the core can tell the transport it may stop retrying a message, without requiring specific messages for it.
> I know that we can try to figure out what all the conditions for retries are; that is not the point. The point is that this is extra work, more error-prone, and difficult.
I think it's not more difficult than having specific messages with specific state changes and conditions handled to do the same, which we need to keep in sync and ensure do what's expected.
> It may be true that Matrix has some persistence, but IMO we should not design the protocol around a property of this specific transport, so the above holds
This whole proposal has nothing to do with Matrix's persistence, nor requires it; that was dropped in favor of a transport-agnostic retry/ordering mechanism.
> Again, I'm assuming the first n messages are sent. Your batch is the whole queue.
Yes, it is, but it doesn't need to be; messages are retried independently of each other, and will eventually be accepted in order.
So, for 1., e2e may tie us to the transport; I'd rather not. Latency and load are not the main reasons for this proposal, but a side effect of the actual target: a reduced number of messages required for the core protocol. IMHO latency and load are just a couple of the many ways a protocol with just the bare minimum size is better than a fatter one.
In Monday's transport team meeting we decided on a deadline for the discussion on this issue of no later than 2019-01-25.
My takeaway from this discussion:
The overall discussion is whether we should remove the dependency on receiving `Processed` when a node sends a balance-proof message, because we can depend on other messages being sent as a side effect which eventually tell us that the node received and actually processed/accepted the previous message.
This is somewhat related to our `pending_transactions` implementation, where receiving a specific blockchain event can invalidate/remove a pending transaction because it is no longer required.
On the one hand, if I take the example:
> LockedTransfer mediator => mediator|target: either a SecretReveal or RefundTransfer will be received from the next hop/mediator or target
What could happen is that either the `SecretReveal` or `RefundTransfer` message is received, therefore removing the original `LockedTransfer` message from the queue, OR the node received the locked transfer and went offline or ran out of capacity, in which case sending the locked transfer again presents no harm, as the message will be rejected due to having the same nonce (a duplicate message which the receiving node already processed).
On the other hand, the same example of:
> LockedTransfer mediator => mediator|target: either a SecretReveal or RefundTransfer will be received from the next hop/mediator or target
Could present different questions, such as: what if the secret was registered on-chain? Since the transport is agnostic about what's happening on the blockchain, this could result in the `LockedTransfer` being retried multiple times although the secret became known. The way to stop this message from being retried is either to have the transport also check blockchain events, or for the state machine to inform the transport about this, which is a bad design (the current architecture implements the complete opposite).
One more example can be given in a topology of a channel A-B, where A sends a locked transfer to B, and B processes that message and goes offline. Later on, A expires the lock; the lock expiry will never be sent, because the locked transfer was not removed from the queue, because A received no message from B saying that it was processed. Therefore, the protocol is halted.
My opinion: I agree with Augusto on the point:
> I not only think that having a single meaning per message is fine, I also think it's something we should really strive for.
IMO the `Processed` message IS part of the protocol, and its meaning is: the node has received and accepted the previous message. Having (for example) the `SecretRequest` message say "I accepted the previous message and now I am reacting to it by requesting the secret" makes our protocol messages have multiple meanings.
To be discussed and decided on.
@rakanalh good points. Yes, your understanding is mostly right; only some clarifications:
> The way to stop this message from being retried is either to have the transport also check blockchain events, or for the state machine to inform the transport about this, which is a bad design (the current architecture implements the complete opposite)
The second case is actually the current implementation, and the intended one: the core/state machine knows when a message was received/processed (through handling of the confirmation messages), and pushes to the transport the information that it can give up retrying that message. The only observation is that it does so implicitly, by removing the message from the queue, which is monitored by the transport (#3252 contains a proposal to make that explicit as well).
The case you pointed out is an edge case of the one raised by Augusto, in which the expected side effects of the `LockedTransfer` don't happen, but the `LockExpired` (which would be the next message to stop the retry of the `LockedTransfer`) isn't sent either (because the secret was revealed, so the lock didn't actually expire). I agree this is an issue, and I haven't yet had time to think of another side effect that could be used to confirm the `LockedTransfer` in this specific case. Worst case, we could keep specific or generic (`Processed`) confirmation messages for these very specific cases, and that would already IMHO improve the protocol by reducing the required messages.
> Later on, A expires the lock; the lock expiry will never be sent, because the locked transfer was not removed from the queue
Not true; in the current implementation and also in this proposal, messages continue to be retried until they're cleared. `LockExpired` doesn't depend on `LockedTransfer` being cleared from the queue to be sent. All messages are retried, and ordered both on the sender and (as per this proposal) on the receiver as well. `LockedTransfer` will eventually be accepted, and `LockExpired` as well; a `Processed` will confirm both were accepted and clear them on the sender. A retrying message doesn't lock the queue; messages are always retried in order and eventually accepted and cleared.
> IMO the Processed message IS part of the protocol, and its meaning is: the node has received and accepted the previous message.
You're right, currently they're part of the protocol. This proposal argues they aren't strictly needed for it, are somewhat superfluous, and could (mostly) be taken out of it (at least as a hard requirement of the protocol; they could be kept for compatibility for some time). Again, the protocol is what we say it is.
Granularity of messages is subjective; what we consider different meanings for a given message depends a lot on perspective. Looking closely enough, most messages have multiple meanings, and zooming out, we could merge some of them to reduce protocol size.
If this "side effect instead of Processed" isn't accepted, a compromise would be to use this approach at least for getting rid of Delivered
messages, which already compose 50% of sent messages: Delivered
is completely superfluous for all messages that require Processed
, and the cases where it's used for confirmation are simpler than the cases for LockedTransfer raised by Augusto and Rakan.
I'm working on a small PoC for the `Delivered` part I want to address within the limited scope of this issue, to guide me in commenting and better describing the next iteration of this proposal. I will edit the PR as soon as I get a better overview of it, hopefully this week.
At the workshop, the following approach was decided on for now:
- [ ] Remove the requirement for a signature in the `Delivered` message. The signature can keep being produced by the nodes for now, for backwards compatibility, but the `sender` will be set by the Matrix transport from the `displayname` recovery instead of the internal signature, so `Delivered` messages without a signature (like the ones we're going to send from light clients) are still accepted protocol-wise. `Delivered` messages are used only for control and have minimal safety requirements; transport-side sender recovery is enough (a sketch of such an acceptance rule follows this list).
- [ ] Remove the requirement for a signature in the `SecretReveal` message. The validity of the secret is enough confirmation for it; it doesn't make sense to also require a signature. It can still be signed by full nodes (as currently) for backwards compatibility as well.
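A hedged sketch of such an acceptance rule; `recover_signer` is passed in as a hypothetical helper, and the rule itself is an illustration of the decision above, not existing raiden code:

```python
def accept_control_message(message, transport_sender, partner, recover_signer):
    """Accept a Delivered/SecretReveal with or without a signature
    (hypothetical rule per the workshop decision above)."""
    if getattr(message, "signature", None):
        # Legacy path: the message is signed, verify the signer as before.
        return recover_signer(message) == partner
    # New path: trust the transport-level sender authentication instead
    # (e.g. the Matrix displayname/user-id recovery mentioned above).
    return transport_sender == partner
```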
The issue was edited to include the current proposal, after talks with @hackaugusto. This shouldn't take too long, but could break compatibility (compatibility can be kept with a little more work, though we could skip that for now and just ask clients to upgrade).
Just to be clear, my opinion is written below and includes only the things written below, nothing else:
- Sending a delivered in response to a processed is a bug
- Sending a delivered and a processed for the same message is a bug
And the bugs above should be fixed. We have a few ways to fix this:
- Upgrade the software and just fix the above issues. This, however, requires a client upgrade, because the current version expects a delivered in response to a processed; otherwise the processed message is not cleared from the queue, making it stuck.
- Add a protocol upgrade, and in the new version fix the above; this would maintain backwards compatibility.
- Once there is a new deployment of the smart contracts, do the fixes, because the networks will be disjoint anyway.
Edit: Important note: the processed message is added to the global queue, meaning that it does not hold off balance-proof messages.
Documenting a discussion that @rakanalh and I had in the office:
- The `Processed` message is required for the `LockedTransfer` and `RefundTransfer` messages; otherwise, for a failed transfer, every node in a route will keep retrying the same `LockedTransfer` until the lock expires, which is an amplification attack vector.
- The `Processed` message is needed for the `RemoveExpired` lock message because there is nothing else to confirm it.
- The `WithdrawRequest` message must be confirmed by the `Withdraw` response and not by `Processed` nor `Delivered`; this removes a source of bugs where the request is confirmed but the `Withdraw` message is never sent.
- The `SecretRequest` message must be confirmed by the `SecretReveal` message, for the same reason as the `WithdrawRequest`.
The `SecretReveal` is the most annoying edge case. If we rely on the `Processed` alone, we get into the same scenario described above for `WithdrawRequest` and `SecretRequest`. If we try to rely on something else, we have to watch multiple sources of state: the `Unlock` message, the lock's expiration block, and the `ContractReceiveSecretReveal`.
Extending the above:
- The messages `Withdraw` and `SecretReveal` don't need to be confirmed with a `Processed` nor a `Delivered`. When Alice sends Bob a `SecretRequest` message, she does not know if her message was lost in transit, if Bob is offline, or if Bob's response got lost. In any case, Alice has to keep retrying the `SecretRequest` while it makes sense (the lock has not expired). Because of this, Bob can safely assume that Alice will resend the `SecretRequest` if she does not receive the `SecretReveal` message; IOW, Bob does not need to do a retry (see the sketch below).
- The `Delivered` message is not useful anymore and should be removed. It was added as an optimization that was never employed and introduced bugs in the code.
Note: The Processed messages are still necessary, as described here and here.
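A sketch of the responder side of that request/response pattern, with hypothetical `state` helpers and a stubbed message class: Bob never retries, he just re-answers each retried request.

```python
from dataclasses import dataclass

@dataclass
class SecretReveal:  # stand-in for the raiden message, illustration only
    secret: bytes

def on_secret_request(state, request):
    if not state.lock_is_valid(request.secrethash):
        return None  # lock expired: Alice's own retries stop too
    # Idempotent: a retried SecretRequest simply gets the same reveal again,
    # so no retry logic is needed on this side.
    return SecretReveal(secret=state.secret_for(request.secrethash))
```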
@rakanalh @hackaugusto what is the outcome of this issue? Can we close it? Can we provide what was requested?
This is to be implemented in our next milestone.
This was done in the Light Client, in a backwards-compatible manner. Here's how we got it working:
- LockedTransfer, Unlock, LockExpired, RefundTransfer: BP messages are retried until the respective Processed is received.
- The Processed for a received BP message is cached, and sent one-shot whenever the same message is received again (sketched below).
- SecretRequest: retried until the secret is known (from an off- or on-chain reveal) or the lock expires.
- SecretReveal to the target: sent once for each valid received SecretRequest.
- SecretReveal to the previous hop: sent when the secret is known, retried until an Unlock is received or the secret gets registered on-chain.
- WithdrawRequest: retried until we receive a WithdrawConfirmation or the request expires.
- WithdrawConfirmation: cached and sent once per valid (non-expired) request.
- WithdrawExpiration: we don't care for it; nodes can expire requests on their own, but we may send Delivered to confirm it for backwards compatibility.
Delivered messages are superfluous in this setup, but are still sent for Processed, SecretRequest and SecretReveal iff the peer doesn't advertise the `noDelivery` capability (for backwards compatibility); they can be safely removed.
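A sketch of the cached one-shot Processed mentioned in the list above (hypothetical structure, loosely modeled on the Light Client behavior):

```python
from dataclasses import dataclass

@dataclass
class Processed:  # stand-in for the raiden message, illustration only
    message_identifier: int

processed_cache: dict = {}  # message_identifier -> cached Processed reply

def confirm_bp_message(message):
    reply = processed_cache.get(message.message_identifier)
    if reply is None:
        reply = Processed(message_identifier=message.message_identifier)
        processed_cache[message.message_identifier] = reply
    return reply  # sent once per received copy, never retried
```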
#5581 partially resolves the issue. What it does can be read there.
- [ ] fix #5581 to be merged
- [ ] implement the confirmation for partner’s old messages (messages with a nonce <= the latest processed nonce)
- [ ] discuss Andre's solution for backward compatibility
- [ ] Do we want to break backwards compatibility? (Potentially the last chance before the RC.)
Breaking backwards compatibility would make the codebase simpler. After the RC and release we will have to stay with the protocol.
Have a call with all stakeholders to decide the best approach. @hackaugusto @ulope @andrevmatos @fredo @Dominik1999: have a call and decide whether this is a blocker for our release and how to go on.
In https://github.com/raiden-network/raiden/issues/3283#issuecomment-601270304 I describe how we implemented capabilities. If you decide to implement them, you can worry less about breaking compatibility, as protocol differences can be negotiated through them, as long as the node can fall back to the previous behavior. If you don't, as I said, our behavior is still backwards compatible with your current one, so we will negotiate optimizations with nodes that support them, and fall back if they don't.
Results from the call:
- Just removing the delivered message for the current sprint is sufficient. The Light Client does not rely on the delivered, so the removal of the message will not break compatibility among the clients. (Note: the current capability in the Light Client would have to change name and semantics, since the Python client will not have delivered messages anymore, but it will still wait for Processed messages. @andrevmatos)
- Raiden should send a `Processed` in response to received messages if the `nonce` of the message is lower than or equal to the partner's `nonce` in the local state (this is necessary for synchronization in case Processed messages are dropped).
- The withdraw messages are a bit "special", since these implement the approach that doesn't always send a Processed back. (Note for the implementer: it looks like `inplace_delete_message` is missing the cleanup for `SendWithdrawRequest` when a `ReceiveWithdrawExpired` is received; this needs a test and a fix, sketched below.)
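A hedged sketch of the missing cleanup mentioned in the last bullet; the real `inplace_delete_message` in raiden has a different signature, the classes are stubbed here, and matching requests by `total_withdraw` is an assumption:

```python
class ReceiveWithdrawExpired: ...   # stand-ins for the raiden state change
class SendWithdrawRequest: ...      # and send-event, for illustration only

def drop_expired_withdraw_requests(queue, state_change):
    # When the withdraw expired, the queued request must stop being retried.
    if isinstance(state_change, ReceiveWithdrawExpired):
        queue[:] = [
            event for event in queue
            if not (isinstance(event, SendWithdrawRequest)
                    and event.total_withdraw == state_change.total_withdraw)
        ]
```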
@hackaugusto We do send `Processed` and rely on it for EnvelopeMessages, so the behavior change you describe should be fully compatible with our current approach and expectations.
The current capability in the Light Client is just a way to tell partners what is already true: that we don't need Delivered messages (the behavior you described), and therefore they don't need to be sent to us, saving the wasted roundtrips. Thanks!
So https://github.com/raiden-network/raiden/pull/5581 will be part of Alderaan, whereas the further optimizations described in this issue are not part of it.
OK, we're revisiting what was left of this in the LC for https://github.com/raiden-network/light-client/issues/1514 and https://github.com/raiden-network/light-client/issues/249.
From what we see, `WithdrawConfirmation` confirms `WithdrawRequest`, so neither needs `Delivered`/`Processed`, but `WithdrawExpired` has no confirmation and to this day still requires a `Processed` to confirm it.
Our suggestion is to get rid of `WithdrawExpired` as per #4721, and this should conclude this issue. We're still replying with the required `Processed` for now, but we won't require it to expire a past `WithdrawRequest` on our side, therefore making this a non-breaking change/protocol optimization, and we could possibly introduce a capability to tell other [light] clients they don't need to bother sending `WithdrawExpired` when withdrawing from compatible partners.
Basic capability publishing support has been implemented via #6482. Now this needs to be done so we can take advantage of it.
General approach
- [ ] Remove `Delivered`.
- [ ] All protocol messages that don't elicit an immediate protocol response message require a `Processed` to clear the re-sending. Example: `LockedTransfer` does not have an immediate response, because the answer (`RevealSecret`/`LockExpired`) depends on the outcome of the protocol (success/failure) along the complete path of the payment. (A sketch of this mapping follows the list.)
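The same rule as a lookup table; an assumed summary using the message names from this thread, not a definitive specification:

```python
# Which message clears the re-sending of which: messages with an immediate
# protocol response are cleared by that response, all others by Processed.
CLEARED_BY = {
    "LockedTransfer": "Processed",   # answer depends on the whole path
    "RefundTransfer": "Processed",
    "LockExpired": "Processed",
    "WithdrawRequest": "WithdrawConfirmation",  # immediate response
    "SecretRequest": "SecretReveal",            # immediate response
}
```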
ToDo
- [ ] Compare the approach in PR #5581 to what we want to do now.
- [ ] Currently, message clearing is triggered by state changes. I wonder if that should change to a "`ClearMessage`" event, because the clearing now may depend on a state transition?
- [ ] Consider moving `inplace_delete_message` to the transport; the current implementation of the `_RetryQueue` is rather opaque when it comes to clearing items from the queue.
- [ ] Check: "Note for the implementer, it looks like `inplace_delete_message` is missing the cleanup for `SendWithdrawRequest` when a `ReceiveWithdrawExpired` is received, this needs a test and a fix" (context).
- [ ] Remove the `Delivered` message; it should not be needed anymore.
- [ ] Add edge case tests for withdraw.
- [ ] Change the advertised default capabilities (depends on #6503 + resolution).
- [ ] Decide on fallback behavior for legacy peers, i.e. how to treat peers that don't advertise `noDelivery`. If we implement legacy behavior, add an integration test.
- When re-sending Processed, we have to memorize/recall past message ids.
- Option: use a (modified) cachetools TTL cache for non-expired locks, or their messages. Scenario: the partner node crashes before receiving the terminating `Processed` message and retries. Since the max number of in-flight transfers is bounded and we have a timeout, this is safe (sketched below).
- Compare the LC implementation.
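A sketch of that option bounding the earlier Processed cache with the real `cachetools` API; the `maxsize`/`ttl` values are placeholders, and `Processed` is stubbed:

```python
from dataclasses import dataclass
from cachetools import TTLCache

@dataclass
class Processed:  # stand-in for the raiden message, illustration only
    message_identifier: int

# message_identifier -> Processed reply. Entries should outlive the longest
# lock timeout, so a partner that crashed before receiving the terminating
# Processed can retry and still get its confirmation; the TTL cache keeps
# memory bounded, since the number of in-flight transfers is bounded too.
processed_replies: TTLCache = TTLCache(maxsize=10_000, ttl=2 * 60 * 60)

def recall_processed(message_identifier):
    reply = processed_replies.get(message_identifier)
    if reply is None:
        reply = Processed(message_identifier=message_identifier)
        processed_replies[message_identifier] = reply
    return reply
```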