openthread
SRP server fails to update existing service from SRP client which got new IP addresses
Describe the bug SRP server fails to update existing service from SRP client which got new IP addresses
To Reproduce: github/openthread (commit '686c0de5fa79d5afe5b3128fabdb61e05610607d'), Dec-14; github/ot-br-posix (commit '72fa16e595af28d2f2a3fbd19c3da9f5c3d397de'), Dec-15; use Avahi for DNS.
- Leader border router creates a network with credentials "ABC"
- Thread end node joins the "ABC" network, gets IP "XYZ" and registers service "NNN_MMM"
- Leader border router does factory reset (e.g. ot factory reset) and goes offline for ~3min, for the Thread ED to realize the leader/parent is gone.
- Turn on Leader BR, and create the same "ABC" network.
- Thread node will re-join and get new IP address "OPQ". Then it will attempt to update service "NNN_MMM".
- BR will respond that the service got updated, but it will continue to hold the old IP address "XYZ" instead of "OPQ".
Expected behavior: I would expect the IP address to get updated. Admittedly, this is a rare corner case that may not be applicable, because a real factory reset should also wipe out the SRP proxy / Avahi service table. However, this example exposes a potential bug in the SRP server failing to update the important fields of a service.
Console/log output
Additional context: I found this while using the Matter (CHIP) application framework (version TE8_rc3), with an Amazon Echo device as the Matter controller (similar to chip-tool) and an nRF-DK board as the Matter light-bulb.
@AlanLCollins could you share the BR and end device logs?
Please refer to the attached files. 04-28_SRPupdateFail_nRF-serial.txt 02-28_SRPUpdateFail_Leader-log.txt
The (matter light-bulb) nRF board is getting on/off cluster toggle commands:
[2022-04-28 08:20:35] D: 410657 [DMG]Received command for Endpoint=1 Cluster=0x0000_0006 Command=0x0000_0002
At that moment, the light-bulb is reachable at [fdc1:8aa9:f419:2d05:ad5b:5510:ae8d:50b0]:5540
[2022-04-28 08:22:05] fdc1:8aa9:f419:2d05:ad5b:5510:ae8d:50b0
[2022-04-28 08:22:05] fdba:be11:2233:0:0:ff:fe00:a402
[2022-04-28 08:22:05] fdba:be11:2233:0:c3ea:57df:ecdb:6c75
[2022-04-28 08:22:05] fe80:0:0:0:80f6:50bf:9722:4c82
which is the IP advertised by the SRP proxy:
Then I factory reset OpenThread and wait until the Matter light drops the network:
[2022-04-28 08:25:37] D: 711900 [DL] Device Role: DETACHED
Then, I create the same network, and the light-bulb gets new IP address:
[2022-04-28 08:26:08] fd01:6e1d:b019:f25f:c9b:ecfd:f864:c5e8
[2022-04-28 08:26:08] fdba:be11:2233:0:0:ff:fe00:2401
[2022-04-28 08:26:08] fdba:be11:2233:0:c3ea:57df:ecdb:6c75
[2022-04-28 08:26:08] fe80:0:0:0:80f6:50bf:9722:4c82
but the SRP keeps the record with the old IP address, which makes new toggle commands fail:
04-28 15:26:32.114 2283 2286 E CHIP EM : Failed to Send CHIP MessageCounter:3363138151 on exchange 16741i sendCount: 3 max retries: 3
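To make the change easier to see, the two address dumps from the nRF serial log above can be diffed. This is only a small sketch using Python's standard `ipaddress` module; the variable and function names are made up for illustration. It shows that two of the four addresses changed across the reset (including the one the light-bulb was reachable on), while the other two stayed the same.

```python
import ipaddress

# Address dumps copied from the nRF serial log excerpts in this comment.
before_reset = [
    "fdc1:8aa9:f419:2d05:ad5b:5510:ae8d:50b0",
    "fdba:be11:2233:0:0:ff:fe00:a402",
    "fdba:be11:2233:0:c3ea:57df:ecdb:6c75",
    "fe80:0:0:0:80f6:50bf:9722:4c82",
]
after_reset = [
    "fd01:6e1d:b019:f25f:c9b:ecfd:f864:c5e8",
    "fdba:be11:2233:0:0:ff:fe00:2401",
    "fdba:be11:2233:0:c3ea:57df:ecdb:6c75",
    "fe80:0:0:0:80f6:50bf:9722:4c82",
]


def diff_addresses(old, new):
    """Return (removed, added, kept) as sets of ipaddress.IPv6Address."""
    old_set = {ipaddress.IPv6Address(a) for a in old}
    new_set = {ipaddress.IPv6Address(a) for a in new}
    return old_set - new_set, new_set - old_set, old_set & new_set


removed, added, kept = diff_addresses(before_reset, after_reset)
for label, group in (("removed", removed), ("added", added), ("kept", kept)):
    for addr in sorted(group):
        print(f"{label:8} {addr}")
```

Any record that still points at one of the "removed" addresses is stale after the re-join.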
@AlanLCollins , I can suggest a couple of things to investigate further:
- First may be to check whether the matter code-base on device is behaving correctly and is updating the host address upon device reattaching and getting a new set of addresses.
- The OT SRP client expects the user of the SRP Client API (e.g. matter code base) to set the host name and also select and set the address (or addresses) it wants to be registered (along with other info about the service(s) to register).
- OT stack's SRP client will re-register after a new address is set (otSrpClientSetHostAddresses() is called).
- These behaviors are covered by a set of tests in OT, and I did quickly check this using CLI and it worked as expected.
- My guess/theory is that perhaps the Matter code base somehow does not set/update the new address on the SRP client, and the SRP client is therefore registering the previous host name and address that was set on it (when it discovers the SRP server again).
- You can check the OT APIs related to this here: https://github.com/openthread/openthread/blob/9a2d84a4b78413e20f269df96a855e573e04a616/include/openthread/srp_client.h#L403-L427
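The theory in the list above can be illustrated with a toy model. This is not the OpenThread API: `ToySrpClient`, its methods, and the host name `light-bulb` are hypothetical names invented for this sketch. It only mimics the two behaviors described above: setting host addresses triggers a re-registration, and rediscovering the server re-sends whatever was last set.

```python
OLD_ADDR = "fdc1:8aa9:f419:2d05:ad5b:5510:ae8d:50b0"  # address before the reset (from the log)
NEW_ADDR = "fd01:6e1d:b019:f25f:c9b:ecfd:f864:c5e8"   # address after the re-join (from the log)


class ToySrpClient:
    """Toy stand-in for an SRP client; hypothetical, NOT the OpenThread API."""

    def __init__(self, host_name):
        self.host_name = host_name
        self.host_addresses = []
        self.sent_updates = []  # every (host_name, addresses) update "sent" to the server

    def set_host_addresses(self, addresses):
        # Setting addresses triggers a re-registration.
        self.host_addresses = list(addresses)
        self._register()

    def on_server_rediscovered(self):
        # Re-registers whatever was last set, even if it is stale by now.
        self._register()

    def _register(self):
        self.sent_updates.append((self.host_name, tuple(self.host_addresses)))


client = ToySrpClient("light-bulb")        # hypothetical host name
client.set_host_addresses([OLD_ADDR])      # initial registration
client.on_server_rediscovered()            # app never updated the address: stale update
client.set_host_addresses([NEW_ADDR])      # app updates the address: correct update

for host, addrs in client.sent_updates:
    print(host, list(addrs))
```

In this model the second update still carries the old address, which is the failure mode to rule out on the device side.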
The other thing I can suggest is to check the service/address info on the BR SRP server after the reboot and device re-registering:
- If you have access to OT CLI on BR you can use the cli commands: https://github.com/openthread/openthread/blob/main/src/cli/README_SRP_SERVER.md
- The goal of this is to see if the info is correct on the SRP server; maybe the old service entry is somehow being cached somewhere (and therefore the old address is still seen).
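If the CLI output is captured to a file or string, the check can be scripted. The sketch below is only a rough helper: it assumes the output contains lines of the form `addresses: [addr, addr, ...]`, and the sample text, the host name `my-host`, and the function names are illustrative assumptions rather than real captured output. Adjust the pattern to whatever the CLI actually prints.

```python
import ipaddress
import re

# Illustrative text only; the exact layout of the CLI output is an assumption.
SAMPLE_OUTPUT = """\
my-host.default.service.arpa.
    deleted: false
    addresses: [fdc1:8aa9:f419:2d05:ad5b:5510:ae8d:50b0]
Done
"""


def parse_addresses(cli_output):
    """Collect every IPv6 address found inside 'addresses: [...]' lines."""
    found = set()
    for match in re.finditer(r"addresses:\s*\[([^\]]*)\]", cli_output):
        for item in match.group(1).split(","):
            item = item.strip()
            if item:
                found.add(ipaddress.IPv6Address(item))
    return found


def is_registered(cli_output, expected):
    """True if the expected address appears among the registered addresses."""
    return ipaddress.IPv6Address(expected) in parse_addresses(cli_output)


old_addr = "fdc1:8aa9:f419:2d05:ad5b:5510:ae8d:50b0"
new_addr = "fd01:6e1d:b019:f25f:c9b:ecfd:f864:c5e8"
print("old address registered:", is_registered(SAMPLE_OUTPUT, old_addr))
print("new address registered:", is_registered(SAMPLE_OUTPUT, new_addr))
```

Running the same comparison against what the mDNS daemon advertises would show whether a stale entry lives on the SRP server or further downstream.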
@abtink , thank you for the guidance. I confirmed that the SRP server correctly updates the IP address (srp server service command). However, shouldn't the SRP server notify the SRP proxy to update the mDNS daemon? In my case, I am using Avahi. CHIP-SDK resolves services directly through the avahi-daemon, which is the one that keeps the old record.
@AlanLCollins , when you say you are factory resetting OpenThread, are you simply invoking the factoryreset OT CLI command? If so, I wonder if this is related to https://github.com/openthread/ot-br-posix/pull/1339. Is it possible for you to include this fix and see if it helps address your issue?
Closing stale issue.
My apologies, I failed to respond to this ticket. Back then, I tested https://github.com/openthread/ot-br-posix/pull/1339 and the newest OT versions from Q3/Q4 2022. That fixed the issue. Thank you!
@AlanLCollins , great to hear. Thanks for reporting back!