connectedhomeip
connectedhomeip copied to clipboard
Avahi mDNS broken
Problem
I observe that commissioning with the Avahi-based mDNS is not possible, both on the TE9 commit as well as on current master (a777a801c6f0637ee54f83a63d02f662338e8b51).
This is observed on our DUT, but can be reproduced with the lighting-app on Linux over Ethernet.
For current master, I am using
# chip-tool pairing manualcode 1 749701123365521327694
For TE9:
# chip-tool pairing ethernet 1 20202021 3840 <address> 5540
chip-tool output:
[1652868848.569495][5730:5731] CHIP:DMG: Received Command Response Data, Endpoint=0 Cluster=0x0000_003E Command=0x0000_0008
[1652868848.569528][5730:5731] CHIP:CTL: Device returned status 0 on receiving the NOC
[1652868848.569538][5730:5731] CHIP:CTL: Operational credentials provisioned on device 0x7f73c8001970
[1652868848.569555][5730:5731] CHIP:TOO: Secure Pairing Success
[1652868848.569567][5730:5731] CHIP:CTL: Successfully finished commissioning step 'SendNOC'
[1652868848.569583][5730:5731] CHIP:CTL: Commissioning stage next step: 'SendNOC' -> 'FindOperational'
[1652868848.569601][5730:5731] CHIP:CTL: Performing next commissioning step 'FindOperational'
[1652868848.569617][5730:5731] CHIP:CSM: FindOrEstablishSession: PeerId = 2513E4427B7AFD23:0000000000000001
[1652868848.569627][5730:5731] CHIP:CSM: FindOrEstablishSession: No existing OperationalDeviceProxy instance found
[1652868848.569811][5730:5731] CHIP:DIS: Operational node lookup already in progress. Will NOT start a new one.
...
[1652868863.569483][5730:5731] CHIP:DIS: Operational discovery failed for 0x0000000000000001: ../../examples/chip-tool/third_party/connectedhomeip/src/lib/address_resolve/AddressResolve_DefaultImpl.cpp:174: CHIP Error 0x00000032: Timeout
[1652868863.569507][5730:5731] CHIP:CTL: Device connection failed. Error ../../examples/chip-tool/third_party/connectedhomeip/src/lib/address_resolve/AddressResolve_DefaultImpl.cpp:174: CHIP Error 0x00000032: Timeout
...
lighting-app output:
[1652868848.519004][387072:387072] CHIP:EVL: LogEvent event number: 0x0000000000000002 priority: 1, endpoint id: 0x0 cluster id: 0x0000_001F event id: 0x0 Sys timestamp: 0x0000000001143F44
[1652868848.519010][387072:387072] CHIP:ZCL: OpCreds: ACL entry created for Fabric 1 CASE Admin NodeId 0x000000000001B669
[1652868848.519147][387072:387072] CHIP:DL: Using wifi MAC for hostname
[1652868848.519153][387072:387072] CHIP:DIS: Advertise operational node 2513E4427B7AFD23-0000000000000001
[1652868848.519310][387072:387072] CHIP:DL: Cannot set hostname on this system, continue anyway...
[1652868848.519323][387072:387072] CHIP:DL: PublishService 2513E4427B7AFD23-0000000000000001._matter._tcp5540.-1
[1652868848.519502][387072:387072] CHIP:DIS: mDNS service published error: ../../third_party/connectedhomeip/src/platform/Linux/DnssdImpl.cpp:482: CHIP Error 0x000000AC: Internal error
Configured via:
gn gen out/host-avahi --args="target_os=\"linux\"
chip_inet_config_enable_ipv4=false
chip_enable_openthread=false
chip_enable_ota_requestor=false
chip_config_network_layer_ble=false
chip_mdns=\"platform\"
chip_enable_ble=false
chip_enable_wifi=false
treat_warnings_as_errors=false"
Diff against TE9/master branch in order to allow building:
diff --git a/src/platform/Linux/BUILD.gn b/src/platform/Linux/BUILD.gn
index 1095bcebb..64f7eb8cf 100644
--- a/src/platform/Linux/BUILD.gn
+++ b/src/platform/Linux/BUILD.gn
@@ -65,8 +65,6 @@ static_library("Linux") {
"Logging.cpp",
"NetworkCommissioningDriver.h",
"NetworkCommissioningEthernetDriver.cpp",
- "NetworkCommissioningThreadDriver.cpp",
- "NetworkCommissioningWiFiDriver.cpp",
"PlatformManagerImpl.cpp",
"PlatformManagerImpl.h",
"PosixConfig.cpp",
PublishService 2513E4427B7AFD23-0000000000000001._matter._tcp5540.-1
I am noting that interface is -1 here, i.e. AVAHI_IF_UNSPEC - don't think this is the problem though.
It seems only the first service can be successfully published:
[1653054944.716835][1482923:1482923] CHIP:DIS: Updating services using commissioning mode 0
-- ChipDnssdRemoveServices()
[1653054944.717095][1482923:1482923] CHIP:DL: Using wifi MAC for hostname
[1653054944.717103][1482923:1482923] CHIP:DIS: Advertise operational node 7C53B9AC1475C5D1-0000000000000001
[1653054944.717234][1482923:1482923] CHIP:DL: Cannot set hostname on this system, continue anyway...
[1653054944.717256][1482923:1482923] CHIP:DL: PublishService 7C53B9AC1475C5D1-0000000000000001._matter._tcp5540.-1
--- avahi_entry_group_add_service_strlst: OK (0)
[1653054944.717797][1482923:1482923] CHIP:DIS: mDNS service published: _matter._tcp
[1653054944.717945][1482923:1482923] CHIP:DL: Using wifi MAC for hostname
[1653054944.717952][1482923:1482923] CHIP:DIS: Advertise operational node 73BD5B13A3CDD446-0000000000000001
[1653054944.718070][1482923:1482923] CHIP:DL: Cannot set hostname on this system, continue anyway...
[1653054944.718080][1482923:1482923] CHIP:DL: PublishService 73BD5B13A3CDD446-0000000000000001._matter._tcp5540.-1
--- avahi_entry_group_add_service_strlst: Bad state (-2)
[1653054944.718275][1482923:1482923] CHIP:DIS: mDNS service published error: ../../third_party/connectedhomeip/src/platform/Linux/DnssdImpl.cpp:485: CHIP Error 0x000000AC: Internal error
[1653054944.718280][1482923:1482923] CHIP:DIS: Failed to advertise operational node: ../../third_party/connectedhomeip/src/platform/Linux/DnssdImpl.cpp:485: CHIP Error 0x000000AC: Internal error
[1653054944.718412][1482923:1482923] CHIP:DL: Using wifi MAC for hostname
[1653054944.718430][1482923:1482923] CHIP:DIS: Advertise commission parameter vendorID=65521 productID=32769 discriminator=3840/15
[1653054944.718526][1482923:1482923] CHIP:DL: Cannot set hostname on this system, continue anyway...
[1653054944.718534][1482923:1482923] CHIP:DL: PublishService 67B66F5129736161._matterc._udp5540.-1
--- avahi_entry_group_add_service_strlst: Bad state (-2)
[1653054944.718718][1482923:1482923] CHIP:DIS: mDNS service published error: ../../third_party/connectedhomeip/src/platform/Linux/DnssdImpl.cpp:485: CHIP Error 0x000000AC: Internal error
[1653054944.718725][1482923:1482923] CHIP:DIS: Failed to advertise extended commissionable node: ../../third_party/connectedhomeip/src/platform/Linux/DnssdImpl.cpp:485: CHIP Error 0x000000AC: Internal error
Debug logs from avahi-daemon:
message repeated 3 times: [ sendmsg() to ff02::fb failed: Network is unreachable]
dbus-protocol.c: interface=org.freedesktop.Avahi.Server, path=/, member=GetAPIVersion
dbus-protocol.c: interface=org.freedesktop.Avahi.Server, path=/, member=GetState
dbus-protocol.c: interface=org.freedesktop.Avahi.Server, path=/, member=EntryGroupNew
dbus-entry-group.c: interface=org.freedesktop.Avahi.EntryGroup, path=/Client1/EntryGroup1, member=GetState
dbus-entry-group.c: interface=org.freedesktop.Avahi.EntryGroup, path=/Client1/EntryGroup1, member=Reset
dbus-entry-group.c: interface=org.freedesktop.Avahi.EntryGroup, path=/Client1/EntryGroup1, member=AddService
dbus-entry-group.c: interface=org.freedesktop.Avahi.EntryGroup, path=/Client1/EntryGroup1, member=AddServiceSubtype
dbus-entry-group.c: interface=org.freedesktop.Avahi.EntryGroup, path=/Client1/EntryGroup1, member=Commit
dbus-entry-group.c: interface=org.freedesktop.Avahi.EntryGroup, path=/Client1/EntryGroup1, member=AddService
dbus-util.c: Responding error 'Bad state' (-2)
dbus-entry-group.c: interface=org.freedesktop.Avahi.EntryGroup, path=/Client1/EntryGroup1, member=AddService
dbus-util.c: Responding error 'Bad state' (-2)
dbus-entry-group.c: interface=org.freedesktop.Avahi.EntryGroup, path=/Client1/EntryGroup1, member=Reset
sendmsg() to ff02::fb failed: Network is unreachable
dbus-entry-group.c: interface=org.freedesktop.Avahi.EntryGroup, path=/Client1/EntryGroup1, member=AddService
dbus-entry-group.c: interface=org.freedesktop.Avahi.EntryGroup, path=/Client1/EntryGroup1, member=AddServiceSubtype
dbus-entry-group.c: interface=org.freedesktop.Avahi.EntryGroup, path=/Client1/EntryGroup1, member=Commit
dbus-entry-group.c: interface=org.freedesktop.Avahi.EntryGroup, path=/Client1/EntryGroup1, member=AddService
dbus-util.c: Responding error 'Bad state' (-2)
dbus-entry-group.c: interface=org.freedesktop.Avahi.EntryGroup, path=/Client1/EntryGroup1, member=AddService
dbus-util.c: Responding error 'Bad state' (-2)
dbus-entry-group.c: interface=org.freedesktop.Avahi.EntryGroup, path=/Client1/EntryGroup1, member=Free
dbus-protocol.c: client :1.1360 vanished.
sendmsg() to ff02::fb failed: Network is unreachable
I assume the API is used incorrectly here. AddService invoked on the same group after Commit.
@andy31415
is chip-tool running avahi or all-clusters or both?
@andy31415 I am testing with the lighting-app with Avahi.
chip-tools uses minimal mDNS, as far as I can say. But I can reproduce the issue by starting the lighting-app without even invoking chip-too.
I have seen same issue. Is there any update about this issue?
I am reading through the issue, am still unclear how to reproduce. I see the comment was "I can reproduce the issue by starting the lighting-app without even invoking chip-tool".
So I start lighting-app ... and then what? observe mdns service publish error?
On master ToT I tried to compile light examples with the given command, output seems ok:
[1657553787.296122][2935751:2935751] CHIP:DL: Manufacturing Date: (not set)
[1657553787.296124][2935751:2935751] CHIP:DL: Device Type: 257 (0x101)
[1657553787.296126][2935751:2935751] CHIP:SVR: SetupQRCode: [MT:-24J029Q00KA0648G00]
[1657553787.296129][2935751:2935751] CHIP:SVR: Copy/paste the below URL in a browser to see the QR Code:
[1657553787.296131][2935751:2935751] CHIP:SVR: https://project-chip.github.io/connectedhomeip/qrcode.html?data=MT%3A-24J029Q00KA0648G00
[1657553787.296136][2935751:2935751] CHIP:SVR: Manual pairing code: [749701123365521327694]
[1657553787.296146][2935751:2935751] CHIP:DIS: Updating services using commissioning mode 1
[1657553787.296479][2935751:2935751] CHIP:DL: Using wifi MAC for hostname
[1657553787.296483][2935751:2935751] CHIP:DIS: Advertise commission parameter vendorID=65521 productID=32769 discriminator=3840/15
[1657553787.296570][2935751:2935751] CHIP:DL: PublishService E31B286542F05A71._matterc._udp5540.-1
[1657553787.297151][2935751:2935751] CHIP:DIS: mDNS service published: _matterc._udp
[1657553787.297160][2935751:2935751] CHIP:DL: Avahi re-register required
[1657553787.297514][2935751:2935751] CHIP:DL: PublishService E31B286542F05A71._matterc._udp5540.-1
[1657553787.298095][2935751:2935751] CHIP:DIS: mDNS service published: _matterc._udp
[1657553788.230388][2935751:2935751] CHIP:DL: Avahi client registered
[1657553788.230775][2935751:2935751] CHIP:DIS: Updating services using commissioning mode 1
[1657553788.231200][2935751:2935751] CHIP:DL: Using wifi MAC for hostname
[1657553788.231210][2935751:2935751] CHIP:DIS: Advertise commission parameter vendorID=65521 productID=32769 discriminator=3840/15
[1657553788.231385][2935751:2935751] CHIP:DL: PublishService E31B286542F05A71._matterc._udp5540.-1
[1657553788.232029][2935751:2935751] CHIP:DIS: mDNS service published: _matterc._udp
[1657553788.234140][2935751:2935751] CHIP:DL: Avahi group established
I can reproduce it on branch sve, commit 9b2d738791d16650e09e5353d7bce299991f1bf9 by starting chip-lighting-app (uncommissioned) and commisisoning it via chip-tool pairing onnetwork 1 20202021.
Logs:
[1657604513.220448][1197590:1197590] CHIP:DMG: AccessControl: checking f=0 a=p s=0xFFFFFFFB00000000 t= c=0x0000_003E e=0 p=a
[1657604513.220453][1197590:1197590] CHIP:DMG: AccessControl: implicit admin (PASE)
[1657604513.220457][1197590:1197590] CHIP:DMG: Received command for Endpoint=0 Cluster=0x0000_003E Command=0x0000_0006
[1657604513.220465][1197590:1197590] CHIP:ZCL: OpCreds: Received an AddNOC command
[1657604513.220469][1197590:1197590] CHIP:EM: Sending Standalone Ack for MessageCounter:13776728 on exchange 24441r
[1657604513.220480][1197590:1197590] CHIP:IN: Prepared secure message 0x7ffc825c6aa0 to 0xFFFFFFFB00000000 (0) of type 0x10 and protocolId (0, 0) on exchange 24441r with MessageCounter:226524765.
[1657604513.220486][1197590:1197590] CHIP:IN: Sending encrypted msg 0x7ffc825c6aa0 with MessageCounter:226524765 to 0xFFFFFFFB00000000 (0) at monotonic time: 00000000027B416F msec
[1657604513.220504][1197590:1197590] CHIP:EM: Flushed pending ack for MessageCounter:13776728 on exchange 24441r
[1657604513.220588][1197590:1197590] CHIP:FP: Validating NOC chain
[1657604513.221193][1197590:1197590] CHIP:FP: NOC chain validation successful
[1657604513.221200][1197590:1197590] CHIP:FP: Added new fabric at index: 0x1
[1657604513.221204][1197590:1197590] CHIP:FP: Assigned compressed fabric ID: 0x3211F1EBC90FAFD0, node ID: 0x0000000000000001
[1657604513.221210][1197590:1197590] CHIP:TS: Last Known Good Time: 2022-07-12T07:39:36
[1657604513.221214][1197590:1197590] CHIP:TS: New proposed Last Known Good Time: 2021-01-01T00:00:00
[1657604513.221218][1197590:1197590] CHIP:TS: Retaining current Last Known Good Time
[1657604513.221258][1197590:1197590] CHIP:DMG: Endpoint 0, Cluster 0x0000_003E update version to 5be1b54b
[1657604513.221265][1197590:1197590] CHIP:DMG: Endpoint 0, Cluster 0x0000_003E update version to 5be1b54c
[1657604513.221410][1197590:1197590] CHIP:DL: writing settings to file (/tmp/chip_kvs-LVjr4s)
[1657604513.221546][1197590:1197590] CHIP:DL: renamed tmp file to file (/tmp/chip_kvs)
[1657604513.221581][1197590:1197590] CHIP:DL: writing settings to file (/tmp/chip_kvs-UBIDnU)
[1657604513.221657][1197590:1197590] CHIP:DL: renamed tmp file to file (/tmp/chip_kvs)
[1657604513.221686][1197590:1197590] CHIP:DL: writing settings to file (/tmp/chip_kvs-mdMFcx)
[1657604513.221759][1197590:1197590] CHIP:DL: renamed tmp file to file (/tmp/chip_kvs)
[1657604513.221770][1197590:1197590] CHIP:DMG: AccessControl: validating f=1 p=a a=c s=1 t=0
[1657604513.221775][1197590:1197590] CHIP:DMG: validating subject 0x000000000001B669
[1657604513.221817][1197590:1197590] CHIP:DL: writing settings to file (/tmp/chip_kvs-MfBjQv)
[1657604513.221893][1197590:1197590] CHIP:DL: renamed tmp file to file (/tmp/chip_kvs)
[1657604513.221917][1197590:1197590] CHIP:EVL: LogEvent event number: 0x0000000000010001 priority: 1, endpoint id: 0x0 cluster id: 0x0000_001F event id: 0x0 Sys timestamp: 0x00000000027B4171
[1657604513.221924][1197590:1197590] CHIP:ZCL: OpCreds: ACL entry created for Fabric index 0x1 CASE Admin Subject 0x000000000001B669
[1657604513.222090][1197590:1197590] CHIP:DL: Using wifi MAC for hostname
[1657604513.222097][1197590:1197590] CHIP:DIS: Advertise operational node 3211F1EBC90FAFD0-0000000000000001
[1657604513.222278][1197590:1197590] CHIP:DL: Cannot set hostname on this system, continue anyway...
[1657604513.222289][1197590:1197590] CHIP:DL: PublishService 3211F1EBC90FAFD0-0000000000000001._matter._tcp5540.-1
[1657604513.222511][1197590:1197590] CHIP:DIS: mDNS service published error: ../../third_party/connectedhomeip/src/platform/Linux/DnssdImpl.cpp:483: CHIP Error 0x000000AC: Internal error
[1657604513.222519][1197590:1197590] CHIP:DMG: Endpoint 0, Cluster 0x0000_003E update version to 5be1b54d
[1657604513.222523][1197590:1197590] CHIP:DMG: Endpoint 0, Cluster 0x0000_003E update version to 5be1b54e
[1657604513.222531][1197590:1197590] CHIP:DMG: ICR moving to [ Preparing]
[1657604513.222536][1197590:1197590] CHIP:DMG: ICR moving to [AddingComm]
[1657604513.222542][1197590:1197590] CHIP:DMG: ICR moving to [AddedComma]
[1657604513.222546][1197590:1197590] CHIP:ZCL: OpCreds: successfully created fabric index 0x1 via AddNOC
[1657604513.222559][1197590:1197590] CHIP:DMG: Decreasing reference count for CommandHandler, remaining 0
[1657604513.222564][1197590:1197590] CHIP:EM: Piggybacking Ack for MessageCounter:13776728 on exchange: 24441r
[1657604513.222580][1197590:1197590] CHIP:IN: Prepared secure message 0x55d023ce81f8 to 0xFFFFFFFB00000000 (1) of type 0x9 and protocolId (0, 1) on exchange 24441r with MessageCounter:226524766.
[1657604513.222589][1197590:1197590] CHIP:IN: Sending encrypted msg 0x55d023ce81f8 with MessageCounter:226524766 to 0xFFFFFFFB00000000 (1) at monotonic time: 00000000027B4171 msec
[1657604513.222620][1197590:1197590] CHIP:DMG: ICR moving to [CommandSen]
[1657604513.222625][1197590:1197590] CHIP:DMG: ICR moving to [AwaitingDe]
[1657604513.222651][1197590:1197590] CHIP:EM: Received message of type 0x10 with protocolId (0, 0) and MessageCounter:13776729 on exchange 24440r
[1657604513.222657][1197590:1197590] CHIP:EM: Found matching exchange: 24440r, Delegate: (nil)
[1657604513.222664][1197590:1197590] CHIP:EM: Rxd Ack; Removing MessageCounter:226524764 from Retrans Table on exchange 24440r
[1657604513.222669][1197590:1197590] CHIP:EM: Removed CHIP MessageCounter:226524764 from RetransTable on exchange 24440r
[1657604513.225310][1197590:1197590] CHIP:EM: Received message of type 0x10 with protocolId (0, 0) and MessageCounter:13776730 on exchange 24441r
[1657604513.225327][1197590:1197590] CHIP:EM: Found matching exchange: 24441r, Delegate: (nil)
[1657604513.225334][1197590:1197590] CHIP:EM: Rxd Ack; Removing MessageCounter:226524766 from Retrans Table on exchange 24441r
[1657604513.225338][1197590:1197590] CHIP:EM: Removed CHIP MessageCounter:226524766 from RetransTable on exchange 24441r
[1657604514.146345][1197590:1197590] CHIP:DL: Avahi group established
CHIP:DIS: mDNS service published error: ../../third_party/connectedhomeip/src/platform/Linux/DnssdImpl.cpp:483: CHIP Error 0x000000AC: Internal error
Commissioning fails as chip-tool cannot resolve the address
@sstruchtrup looks so far this issue still exist. For the Linux which run avahi-daemon, which Linux kernel version is it running?
@TE-N-ElvenWang Can't say which kernel version was running, but this is a problem of an incorrect API usage, not related to any particular kernel version.
@sstruchtrup looks this issue exist for sve-2.
This issue seems to exist in master as well. This seems to be due to the following code in src/platform/Linux/DnssdImpl.cpp:514
// Ideally the callback would be called from `HandleGroupState` when the `AVAHI_ENTRY_GROUP_ESTABLISHED` state
// is received. But the current code use the `userdata` field to pass a pointer to the current MdnsAvahi instance
// and this is all comes from MdnsAvahi::Init that does not have any clue about the `type` that *will* be published.
// The code needs to be updated to support that callback properly.
if (CHIP_NO_ERROR == error)
{
callback(context, type.c_str(), service.mName, CHIP_NO_ERROR);
}
else
{
callback(context, nullptr, nullptr, error);
}
I can reproduce this on my unit, and the call to avahi_entry_group_add_service_strlst at line 480 is failing due to the AvahiEntryGroup is in the state of AVAHI_ENTRY_GROUP_REGISTERING
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.