SRP server fails to update existing service from SRP client which got new IP addresses

Open AlanLCollins opened this issue 2 years ago • 5 comments

Describe the bug SRP server fails to update existing service from SRP client which got new IP addresses

To Reproduce
github/openthread (commit '686c0de5fa79d5afe5b3128fabdb61e05610607d') Dec-14
github/ot-br-posix (commit '72fa16e595af28d2f2a3fbd19c3da9f5c3d397de') Dec-15
Use avahi for DNS

  • Leader border router creates a network with credentials "ABC"
  • Thread end node joins the "ABC" network, gets IP "XYZ" and registers service "NNN_MMM"
  • Leader border router does a factory reset (e.g. ot factory reset) and goes offline for ~3min, long enough for the Thread ED to realize the leader/parent is gone.
  • Turn on Leader BR, and create the same "ABC" network.
  • Thread node will re-join and get a new IP address "OPQ". Then it will attempt to update service "NNN_MMM".
  • BR will respond that the service got updated, but it will continue to have the old IP address "XYZ" instead of "OPQ".

Expected behavior I would expect the IP address to get updated. Admittedly, this is a rare corner case that may not be applicable, because a real factory reset should also wipe out the SRP proxy / avahi service table. However, this example exposes a potential bug in the SRP server failing to update the important fields of a service.

Console/log output

Additional context I found this by using the Matter (CHIP) application framework (version TE8_rc3), with an Amazon Echo device as the Matter controller (chip-tool alike) and an nRF-DK board as the Matter light-bulb.

AlanLCollins avatar Apr 27 '22 16:04 AlanLCollins

@AlanLCollins could you share the BR and end device logs?

wgtdkp avatar Apr 28 '22 00:04 wgtdkp

Please refer to the attached files. 04-28_SRPupdateFail_nRF-serial.txt 02-28_SRPUpdateFail_Leader-log.txt

The (Matter light-bulb) nRF board is getting on/off cluster toggle commands:

    [2022-04-28 08:20:35] D: 410657 [DMG]Received command for Endpoint=1 Cluster=0x0000_0006 Command=0x0000_0002

At the moment, the light-bulb has [fdc1:8aa9:f419:2d05:ad5b:5510:ae8d:50b0]:5540

    [2022-04-28 08:22:05] fdc1:8aa9:f419:2d05:ad5b:5510:ae8d:50b0
    [2022-04-28 08:22:05] fdba:be11:2233:0:0:ff:fe00:a402
    [2022-04-28 08:22:05] fdba:be11:2233:0:c3ea:57df:ecdb:6c75
    [2022-04-28 08:22:05] fe80:0:0:0:80f6:50bf:9722:4c82

which is the IP advertised by the SRP proxy: (image)

Then I factory reset OpenThread and wait until the Matter light drops the network:

    [2022-04-28 08:25:37] D: 711900 [DL] Device Role: DETACHED

Then I create the same network, and the light-bulb gets a new IP address:

    [2022-04-28 08:26:08] fd01:6e1d:b019:f25f:c9b:ecfd:f864:c5e8
    [2022-04-28 08:26:08] fdba:be11:2233:0:0:ff:fe00:2401
    [2022-04-28 08:26:08] fdba:be11:2233:0:c3ea:57df:ecdb:6c75
    [2022-04-28 08:26:08] fe80:0:0:0:80f6:50bf:9722:4c82

but the SRP keeps the record with the old IP address, which makes the new toggle fail:

    04-28 15:26:32.114 2283 2286 E CHIP EM : Failed to Send CHIP MessageCounter:3363138151 on exchange 16741i sendCount: 3 max retries: 3
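
For readers comparing the two address dumps in this comment, here is a minimal Python sketch that diffs them. The addresses are copied from the logs above; the function name `diff_addresses` and the constants `BEFORE` and `AFTER` are illustrative names, not part of OpenThread or Matter.

```python
import ipaddress

# Addresses copied from the nRF serial log before the BR factory reset.
BEFORE = [
    "fdc1:8aa9:f419:2d05:ad5b:5510:ae8d:50b0",
    "fdba:be11:2233:0:0:ff:fe00:a402",
    "fdba:be11:2233:0:c3ea:57df:ecdb:6c75",
    "fe80:0:0:0:80f6:50bf:9722:4c82",
]

# Addresses copied from the same log after the device re-joined.
AFTER = [
    "fd01:6e1d:b019:f25f:c9b:ecfd:f864:c5e8",
    "fdba:be11:2233:0:0:ff:fe00:2401",
    "fdba:be11:2233:0:c3ea:57df:ecdb:6c75",
    "fe80:0:0:0:80f6:50bf:9722:4c82",
]


def diff_addresses(before, after):
    """Return (removed, added, kept) as sets of IPv6Address objects."""
    old = {ipaddress.IPv6Address(a) for a in before}
    new = {ipaddress.IPv6Address(a) for a in after}
    return old - new, new - old, old & new


if __name__ == "__main__":
    removed, added, kept = diff_addresses(BEFORE, AFTER)
    for label, group in (("removed", removed), ("added", added), ("kept", kept)):
        for addr in sorted(group):
            print(f"{label:8} {addr}")
```

Parsing with `ipaddress` rather than comparing strings avoids false mismatches when one tool prints `0:0:0` and another prints the compressed `::` form. The previously advertised fdc1:... address ends up in the removed set, so any record still pointing at it is stale.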

AlanLCollins avatar Apr 28 '22 15:04 AlanLCollins

@AlanLCollins , I can suggest a couple of things to investigate further:

  • The first may be to check whether the Matter code base on the device is behaving correctly and is updating the host address when the device reattaches and gets a new set of addresses.
  • The OT SRP client expects the user of the SRP Client API (e.g. matter code base) to set the host name and also select and set the address (or addresses) it wants to be registered (along with other info about the service(s) to register).
  • OT stack's SRP client will re-register after a new address is set (otSrpClientSetHostAddresses() is called).
    • These behaviors are covered by a set of tests in OT and I did quickly check this using CLI and it worked as expected.
  • My guess/theory is that perhaps the Matter code base somehow does not set/update the new address on the SRP client, and the SRP client is therefore registering the previous host name and address that were set on it (when it discovers the SRP server again).
  • You can check the OT APIs related to this here: https://github.com/openthread/openthread/blob/9a2d84a4b78413e20f269df96a855e573e04a616/include/openthread/srp_client.h#L403-L427

The other thing I can suggest is to check the service/address info on the BR SRP server after the reboot and device re-registering:

  • If you have access to OT CLI on BR you can use the cli commands: https://github.com/openthread/openthread/blob/main/src/cli/README_SRP_SERVER.md
  • The goal of this is to see if it is correct on the SRP server, and whether the old service entry is somehow being cached somewhere (and therefore the old address is still seen).
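
The comparison suggested above can be sketched in Python as follows. This is only an illustration under stated assumptions: the two dictionaries are filled in by hand from whatever the SRP server CLI and the mDNS daemon report, the host name `example-host` is hypothetical, and `find_stale_records` is not an OpenThread or Avahi API.

```python
import ipaddress


def find_stale_records(srp_view, mdns_view):
    """Compare per-host address sets and report the hosts that disagree.

    Both arguments map a host name to a list of IPv6 address strings.
    Returns {host: {"srp_only": set, "mdns_only": set}} for mismatches.
    """
    stale = {}
    for host, srp_addrs in srp_view.items():
        srp_set = {ipaddress.IPv6Address(a) for a in srp_addrs}
        mdns_set = {ipaddress.IPv6Address(a) for a in mdns_view.get(host, [])}
        if srp_set != mdns_set:
            stale[host] = {
                "srp_only": srp_set - mdns_set,   # known to the SRP server only
                "mdns_only": mdns_set - srp_set,  # still advertised over mDNS
            }
    return stale


if __name__ == "__main__":
    # Hypothetical host name; addresses taken from the logs in this thread.
    srp_view = {"example-host": ["fd01:6e1d:b019:f25f:c9b:ecfd:f864:c5e8"]}
    mdns_view = {"example-host": ["fdc1:8aa9:f419:2d05:ad5b:5510:ae8d:50b0"]}
    for host, diff in find_stale_records(srp_view, mdns_view).items():
        print(host, "SRP only:", sorted(diff["srp_only"]),
              "mDNS only:", sorted(diff["mdns_only"]))
```

A non-empty `mdns_only` set for a host would point at a cached record outside the SRP server, while an empty result would suggest the two views agree and the problem lies elsewhere.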

abtink avatar Apr 28 '22 18:04 abtink

@abtink , thank you for the guidance. I confirmed that the SRP server correctly updates the IP address (srp server service command). However, shouldn't the SRP server notify the SRP proxy to update the mDNS daemon? In my case, I am using Avahi. CHIP-SDK resolves services directly against the avahi-daemon, which is the one that keeps the old record.

AlanLCollins avatar Apr 29 '22 23:04 AlanLCollins

@AlanLCollins , when you say you are factory resetting OpenThread, are you simply invoking the factoryreset OT CLI command? If so, I wonder if this is related to https://github.com/openthread/ot-br-posix/pull/1339. Is it possible for you to include this fix and see if it helps address your issue?

jwhui avatar May 04 '22 03:05 jwhui

Closing stale issue.

jwhui avatar Oct 20 '22 02:10 jwhui

My apologies. I failed to respond to this ticket. Back then, I tested https://github.com/openthread/ot-br-posix/pull/1339 and the newest OT versions from Q3/Q4 2022. That fixed the issue. Thank you!

AlanLCollins avatar Feb 06 '23 18:02 AlanLCollins

@AlanLCollins , great to hear. Thanks for reporting back!

jwhui avatar Feb 06 '23 19:02 jwhui