connectedhomeip
connectedhomeip copied to clipboard
`OnFirstMessageDeliveryFailed` no longer performs address lookup after 21256
Problem
After https://github.com/project-chip/connectedhomeip/pull/21256 OnFirstMessageDeliveryFailed
no longer performs address lookup on established sessions. This is because previously OperationalDeviceProxy
would stick throughout the lifetime of the establish connection and it would receive the event and perform the address lookup. With OperationalSessionSetup
now being ephemeral and only living for the lifetime of the session setup, it is no longer is around when OnFirstMessageDeliveryFailed
event is dispatched.
Proposed Solution
Here are 3 potential solutions:
-
OnFirstMessageDeliveryFailed
is removed fromSessionDelegate
.ReliableMessageMgr
now notifies theSessionManager
about first message delivery failure- Pros:
- Should be one of the easier solutions to implement.
- Cons:
- (small) Arguably
SessionManager
should only manage active connection, having it perform an address lookup is somewhat outside the scope of it's role and is starting to creep - (small) The Address lookup code is still needed in
OperationalSessionSetup
so need to figure out a clean spot for this code to live in one spot functionally be used by bothOperationalSessionSetup
andSessionManager
.
- (small) Arguably
- Pros:
-
Create a completely new
SessionDelegate
derived object that will perform address lookup- Pros:
- Likely the cleanest implementation
- Cons:
- Currently I personally don't see where this fits nicely
- Likely will need to allocate this object for every established session, which might not be desirable.
- Pros:
-
Remove
OnFirstMessageDeliveryFailed
entirely. The spec only mentions that we should do this, so it is not a requirement.- For this solution:
- Some argue that the chances address lookup will actually do anything that allows a session to recover is very low. Most DHCP renewals end up with the device getting the same address.
- Against this Solution:
- Some argue that established sessions are open on a magnitude of days, while the chance is low, this is a convenience feature that maybe help in some particular environments.
- For this solution:
Solution 4 (which I still have not looked into so I don't have pros/cons) UpdateDevice
could be made to do the lookup
Just to be clear, a spec SHOULD means "it's expected that any good implementation will do this; not doing it must be grounded in very good reasons".
So yes, it's technically spec-compliant to not do it, but not doing it is also second-guessing the intent of the spec and needs to be very firmly grounded. I don't think we have such firm grounding here, and should be implementing this, esp. since it is working right now and we are breaking it.
In terms of where to do it... We could add some way to run the OperationalSessionSetup
state machine partway against an existing session, which is how this used to be done (as in, create a new OperationalSessionSetup with the existing session, then run it through only the first few states). Or we could have a small class for just doing this; that does involve a separate pool and whatnot.....
My vote would be for 1. I think it has the least number of changes.
At this point I am just documenting things at they come. I haven't decided on which approach yet.
The issue with reusing OperationalSessionSetup
seems to be that ReliableMessageMgr
doesn't have access to CASESessionManager
and doing so will create a circular build dependency. Arguably we could do a delegate or callback type thing. Based on how I am looking at it, that plumbing would take a little bit of effort.
Originally when I first wrote solution 1, I had the idea that the lookup would happen entirely within SessionManager
, but the issue with that is we would only be able to do 1 look up at a time, which makes that a show stopper as I think a requirement would be that we could handle more than 1 parallel look up at a time.
Right now I am investigating if the ExchangeContext
which already has it's own SessionHolderWithDelegate
could handle the address look up itself. On the off chance someone else knows why that cannot happen feel free to chime in
The only things I can think of there are:
- There might still be circular dep issues.
- It would likely involve making ExchangeContext bigger, right?
There might still be circular dep issues.
I don't think there is (but I will double check)
It would likely involve making ExchangeContext bigger, right?
Yes, there might also be a couple other drawbacks listed below
After spending most of yesterday looking into the possible solutions here are the two that I think are the most feasible. I switched to letters as to not confuse them with the numbered solutions mentioned earlier in this issue.
Solution A:
-
OnFirstMessageDeliveryFailed
is removed fromSessionDelegate
and becomes it's own delegate. For now lets call itMessageDelieveryFailedDelegate
which will have just the one function that will need to be overriddenOnFirstMessageDeliveryFailed
. -
CASESessionManager
will implementMessageDelieveryFailedDelegate
- During
CASESessionManager::Init
it will register it'sMessageDelieveryFailedDelegate
withReliableMessageMgr
. We have access toReliableMessageMgr
, and from what I see, initialization ofCASESessionManager
happens after initialization of the things we need to getReliableMessageMgr
from already providedCASESessionManagerConfig
argument. - Pros:
- We are able to reuse most of
OperationalSessionSetup
to do the lookup for us.
- We are able to reuse most of
- Cons:
- At this point we will not have the exact
Session
that requires a lookup. Arguably updating all activeSession
s that match the ScopedNodeID should be updated anyways so this might actually be beneficial, but listed as a con right now since the behaviour is different than what we used to have. - Extra care will need to be taken in case we Allocate a
OperationalSessionSetup
specifically for the lookup in the corner case where, some other entity doesFindOrEstablishSession
during the lookup.- It's not that it cannot be done, just something to be aware of, and I think I have a solution for this already involving adding a state to the
OperationalSessionSetup
state machine (but there might be something else that comes to me as I work on it).
- It's not that it cannot be done, just something to be aware of, and I think I have a solution for this already involving adding a state to the
- At this point we will not have the exact
Solution B:
-
ExchangeContext
which already has it's ownSessionHolderWithDelegate
and can just implementOnFirstMessageDeliveryFailed
- Pros:
- Overall logic should be more simple then Solution A since Solution A likely involves changing the
OperationalSessionSetup
state machine (IMO changes to state machine does make it harder for someone unfamiliar to the code to take a look at it for the first time to understand what is going on).
- Overall logic should be more simple then Solution A since Solution A likely involves changing the
- Cons:
- It is possible there are multiple
ExchangeContext
that will hitOnFirstMessageDeliveryFailed
for the sameSession
, creating multiple DNS lookups.- This can be mitigated against by also adding something to
Session
to identify if there is an existing lookup already happening.
- This can be mitigated against by also adding something to
-
ExchangeContext
will need to addAddressResolve::NodeLookupHandle
- The DNS lookup time to complete might be longer then the life of the
ExchangeContext
since that is controlled by the application.
- It is possible there are multiple
I think it is likely apparent in my write-up I have biased towards Solution A, but wanted to post this in case anyone disagrees or has something that I overlooked. Both of these solutions allow for parallel lookup on different Session
s