connectedhomeip icon indicating copy to clipboard operation
connectedhomeip copied to clipboard

Spring2024 TE2 - cannot commission 'onnetwork' within docker, but can outside

Open jamesharrow opened this issue 1 year ago • 2 comments

Describe the bug

Trying to run TC_ACE_2.1 apparently you can't run this in the TH UI, and we have to run it (according to slack by running in docker container).

chip-repl/python etc cannot discover the device over avahi, but I can discover it and commission it with chip-tool outside the container (and inside - see updated note below). We had similar issues in TE1 (with a different DUT) but that issue was eventually closed out because they thought it was related to chip-tool.

Steps to reproduce the behavior

  1. Launch container:

ubuntu@ubuntu:~/certification-tool/backend/test_collections/sdk_tests/sdk_checkout/python_testing/scripts/sdk$ docker run -v /home/ubuntu/certification-tool/backend/test_collections/sdk_tests/sdk_checkout/python_testing:/root/temp_python_testing -v /var/paa-root-certs:/paa_roots -v /var/run/dbus:/var/run/dbus -v $(pwd):/launch_dir --privileged --network host -it connectedhomeip/chip-cert-bins:0ff709f25f6a7af68314c1b0a147a76c30c234f6

2. check you can see the DUT in a commissioning mode: root@ubuntu:~# avahi-browse -r _matterc._udp

  • wlan0 IPv6 1376D5C8D19257C1 _matterc._udp local
  • wlan0 IPv6 024757C781D958B1 _matterc._udp local
  • wlan0 IPv6 17DEEB0E3A2F9AEE _matterc._udp local = wlan0 IPv6 1376D5C8D19257C1 _matterc._udp local hostname = [7284AEDA1ADB0672.local] address = [2a02:c7c:747a:1a00:be6e:76ff:fe6e:f142] port = [5540] txt = ["PI=" "PH=33" "CM=1" "D=1171" "T=1" "SAI=2000" "SII=800" "VP=4240+4098"] = wlan0 IPv6 024757C781D958B1 _matterc._udp local hostname = [7284AEDA1ADB0672.local] address = [2a02:c7c:747a:1a00:be6e:76ff:fe6e:f142] port = [5540] txt = ["PI=" "PH=33" "CM=1" "D=1171" "T=1" "SAI=2000" "SII=800" "VP=4240+4098"] = wlan0 IPv6 17DEEB0E3A2F9AEE _matterc._udp local hostname = [7284AEDA1ADB0672.local] address = [2a02:c7c:747a:1a00:be6e:76ff:fe6e:f142] port = [5540] txt = ["PI=" "PH=33" "CM=1" "D=1171" "T=1" "SAI=2000" "SII=800" "VP=4240+4098"]

NOTE IT HAS IPv6

3. Try to run the command to run TC_ACE_2.1 per slack comments:

root@ubuntu:~# cd ~/temp_python_testing/scripts/sdk/

root@ubuntu:~/temp_python_testing/scripts/sdk# python3 TC_AccessChecker.py --commissioning-method on-network --discriminator 1171 --passcode 64889390 --storage-path admin_storage.json --paa-trust-store-path /paa_roots --tests test_TC_ACE_2_1

[1707513696.130090][11:11] CHIP:CTL: Setting attestation nonce to random value

[1707513696.132616][11:11] CHIP:CTL: Setting CSR nonce to random value

[1707513696.134322][11:11] CHIP:DL: ChipLinuxStorage::Init: Using KVS config file: /tmp/chip_kvs

[1707513696.134763][11:11] CHIP:DL: writing settings to file (/tmp/chip_kvs-CmpzRz)

[1707513696.135031][11:11] CHIP:DL: renamed tmp file to file (/tmp/chip_kvs)

[1707513696.135345][11:11] CHIP:DL: ChipLinuxStorage::Init: Using KVS config file: /tmp/chip_factory.ini

[1707513696.135692][11:11] CHIP:DL: writing settings to file (/tmp/chip_factory.ini-uuA7V9)

[1707513696.135971][11:11] CHIP:DL: renamed tmp file to file (/tmp/chip_factory.ini)

[1707513696.136147][11:11] CHIP:DL: ChipLinuxStorage::Init: Using KVS config file: /tmp/chip_config.ini

[1707513696.136421][11:11] CHIP:DL: writing settings to file (/tmp/chip_config.ini-1sOTPy)

[1707513696.136703][11:11] CHIP:DL: renamed tmp file to file (/tmp/chip_config.ini)

[1707513696.137069][11:11] CHIP:DL: ChipLinuxStorage::Init: Using KVS config file: /tmp/chip_counters.ini

[1707513696.137335][11:11] CHIP:DL: writing settings to file (/tmp/chip_counters.ini-2dJ3QA)

[1707513696.137537][11:11] CHIP:DL: renamed tmp file to file (/tmp/chip_counters.ini)

[1707513696.138116][11:11] CHIP:DL: writing settings to file (/tmp/chip_factory.ini-XmQrFj)

[1707513696.138742][11:11] CHIP:DL: renamed tmp file to file (/tmp/chip_factory.ini)

[1707513696.138825][11:11] CHIP:DL: NVS set: chip-factory/unique-id = "D9F6C912C28F1B66"

[1707513696.139156][11:11] CHIP:DL: writing settings to file (/tmp/chip_factory.ini-1uS8GO)

[1707513696.139768][11:11] CHIP:DL: renamed tmp file to file (/tmp/chip_factory.ini)

[1707513696.139942][11:11] CHIP:DL: NVS set: chip-factory/vendor-id = 65521 (0xFFF1)

[1707513696.140335][11:11] CHIP:DL: writing settings to file (/tmp/chip_factory.ini-ikkplb)

[1707513696.140925][11:11] CHIP:DL: renamed tmp file to file (/tmp/chip_factory.ini)

[1707513696.141015][11:11] CHIP:DL: NVS set: chip-factory/product-id = 32769 (0x8001)

[1707513696.141360][11:11] CHIP:DL: writing settings to file (/tmp/chip_counters.ini-zbZ2XI)

[1707513696.141876][11:11] CHIP:DL: renamed tmp file to file (/tmp/chip_counters.ini)

[1707513696.141994][11:11] CHIP:DL: NVS set: chip-counters/reboot-count = 1 (0x1)

[1707513696.142437][11:11] CHIP:DL: writing settings to file (/tmp/chip_counters.ini-u38Bae)

[1707513696.143073][11:11] CHIP:DL: renamed tmp file to file (/tmp/chip_counters.ini)

[1707513696.143160][11:11] CHIP:DL: NVS set: chip-counters/total-operational-hours = 0 (0x0)

[1707513696.143569][11:11] CHIP:DL: writing settings to file (/tmp/chip_counters.ini-eLzM43)

[1707513696.144147][11:11] CHIP:DL: renamed tmp file to file (/tmp/chip_counters.ini)

[1707513696.144234][11:11] CHIP:DL: NVS set: chip-counters/boot-reason = 0 (0x0)

[1707513696.144563][11:11] CHIP:DL: writing settings to file (/tmp/chip_config.ini-6g4OFP)

[1707513696.145123][11:11] CHIP:DL: renamed tmp file to file (/tmp/chip_config.ini)

[1707513696.145202][11:11] CHIP:DL: NVS set: chip-config/regulatory-location = 0 (0x0)

[1707513696.145573][11:11] CHIP:DL: writing settings to file (/tmp/chip_config.ini-zK3jA3)

[1707513696.146161][11:11] CHIP:DL: renamed tmp file to file (/tmp/chip_config.ini)

[1707513696.146247][11:11] CHIP:DL: NVS set: chip-config/location-capability = 2 (0x2)

[1707513696.147128][11:11] CHIP:DL: Got Ethernet interface: eth0

[1707513696.147886][11:11] CHIP:DL: Found the primary Ethernet interface:eth0

[1707513696.148651][11:11] CHIP:DL: Got WiFi interface: wlan0

[1707513696.151096][11:11] CHIP:DL: Found the primary WiFi interface:wlan0

Initializing persistent storage from file: admin_storage.json

Loading configuration from admin_storage.json...

[1707513696.153380][11:11] CHIP:IN: UDP::Init bind&listen port=0

[1707513696.153541][11:11] CHIP:IN: UDP::Init bound to port=41355

[1707513696.153576][11:11] CHIP:IN: UDP::Init bind&listen port=0

[1707513696.153670][11:11] CHIP:IN: UDP::Init bound to port=37553

[1707513696.153694][11:11] CHIP:IN: BLEBase::Init - setting/overriding transport

[1707513696.153714][11:11] CHIP:IN: TransportMgr initialized

[1707513696.153803][11:11] CHIP:FP: Initializing FabricTable from persistent storage

[1707513696.153905][11:11] CHIP:CTL: StorageAdapter::GetKeyValue: Key = g/lkgt, Value = 0xfffffd7fc920 (18)

[1707513696.154030][11:11] CHIP:CTL: Key Found 8

[1707513696.154106][11:11] CHIP:TS: Last Known Good Time: 2023-10-14T01:16:48

[1707513696.154150][11:11] CHIP:CTL: StorageAdapter::GetKeyValue: Key = g/fidx, Value = 0xfffffd7fcb08 (44)

[1707513696.154251][11:11] CHIP:CTL: Key Found 10

[1707513696.154352][11:11] CHIP:CTL: StorageAdapter::GetKeyValue: Key = f/1/n, Value = 0xfffffd7fc608 (400)

[1707513696.154487][11:11] CHIP:CTL: Key Found 244

[1707513696.154527][11:11] CHIP:CTL: StorageAdapter::GetKeyValue: Key = f/1/r, Value = 0xfffffd7fc798 (400)

[1707513696.154646][11:11] CHIP:CTL: Key Found 231

[1707513696.155643][11:11] CHIP:CTL: StorageAdapter::GetKeyValue: Key = f/1/m, Value = 0xfffffd7fc4c8 (44)

[1707513696.155782][11:11] CHIP:CTL: Key Found 9

[1707513696.155882][11:11] CHIP:FP: Fabric index 0x1 was retrieved from storage. Compressed FabricId 0x02C528E284D9C235, FabricId 0x0000000000000001, NodeId 0x000000000001B669, VendorId 0xFFF1

[1707513696.155953][11:11] CHIP:CTL: StorageAdapter::GetKeyValue: Key = g/fs/c, Value = 0xfffffd7fc9b0 (36)

[1707513696.156001][11:11] CHIP:CTL: Key Not Found

[1707513696.156126][11:11] CHIP:CTL: StorageAdapter::GetKeyValue: Key = g/gcc, Value = 0xfffffd7fca3c (4)

[1707513696.156199][11:11] CHIP:CTL: Key Found 4

[1707513696.156232][11:11] CHIP:CTL: StorageAdapter::GetKeyValue: Key = g/gdc, Value = 0xfffffd7fca3c (4)

[1707513696.156270][11:11] CHIP:CTL: Key Found 4

[1707513696.156358][11:11] CHIP:CTL: StorageAdapter::SetKeyValue: Key = g/gcc, Value = 0xfffffd7fca3c (4)

[1707513696.161219][11:11] CHIP:CTL: StorageAdapter::SetKeyValue: Key = g/gdc, Value = 0xfffffd7fca3c (4)

[1707513696.161804][11:11] CHIP:ZCL: Using ZAP configuration...

[1707513696.166067][11:11] CHIP:DL: Avahi client registered

[1707513696.166208][11:11] CHIP:IN: CASE Server enabling CASE session setups

[1707513696.166346][11:11] CHIP:IN: SecureSession[0xaaaaf274cee0]: Allocated Type:2 LSID:2497

[1707513696.166414][11:11] CHIP:SC: Allocated SecureSession (0xaaaaf274cee0) - waiting for Sigma1 msg

[1707513696.166654][11:11] CHIP:CTL: System State Initialized...

[1707513696.167035][11:14] CHIP:DL: CHIP task running

[1707513696.168115][11:14] CHIP:DL: HandlePlatformSpecificBLEEvent 32786

Loading certificate authorities from storage...

New CertificateAuthority at index 1

Loading fabric admins from storage...

New FabricAdmin: FabricId: 0x0000000000000001, VendorId = 0xFFF1

Allocating new controller with CaIndex: 1, FabricId: 0x0000000000000001, NodeId: 0x000000000001B669, CatTags: []

Long dispatch time: 636 ms, for event type 2

[MatterTest] 02-09 21:21:37.013 INFO Starting test set, running 1 tests

[MatterTest] 02-09 21:21:37.018 INFO ==========> CommissionDeviceTest <==========

[MatterTest] 02-09 21:21:37.021 INFO [Test] test_run_commissioning

[MatterTest] 02-09 21:21:37.021 INFO Starting commissioning for root index 1, fabric ID 0x0000000000000001, node ID 0x0000000012344321

[MatterTest] 02-09 21:21:37.022 INFO Commissioning method: on-network

[MatterTest] 02-09 21:21:37.034 INFO Avahi browse: cache exhausted

[MatterTest] 02-09 21:21:37.880 INFO Avahi group established

[MatterTest] 02-09 21:21:37.895 INFO Avahi group established

[MatterTest] 02-09 21:21:38.035 INFO Avahi browse: all for now

[MatterTest] 02-09 21:21:39.049 INFO Avahi browse: cache exhausted

[MatterTest] 02-09 21:21:40.051 INFO Avahi browse: all for now

[MatterTest] 02-09 21:21:42.066 INFO Avahi browse: cache exhausted

[MatterTest] 02-09 21:21:43.069 INFO Avahi browse: all for now

[MatterTest] 02-09 21:21:47.089 INFO Avahi browse: cache exhausted

[MatterTest] 02-09 21:21:48.091 INFO Avahi browse: all for now

[MatterTest] 02-09 21:21:56.115 INFO Avahi browse: cache exhausted

[MatterTest] 02-09 21:21:57.116 INFO Avahi browse: all for now

[MatterTest] 02-09 21:22:07.032 ERROR Mdns discovery timed out

Failed to establish secure session to device: src/controller/python/ChipDeviceController-ScriptPairingDeviceDiscoveryDelegate.h:59: CHIP Error 0x00000032: Timeout

Failed to commission: src/controller/python/ChipDeviceController-ScriptPairingDeviceDiscoveryDelegate.h:59: CHIP Error 0x00000032: Timeout

[MatterTest] 02-09 21:22:07.043 INFO [Test] test_run_commissioning FAIL

[MatterTest] 02-09 21:22:07.059 INFO Summary for test class CommissionDeviceTest: Error 0, Executed 1, Failed 1, Passed 0, Requested 1, Skipped 0

[MatterTest] 02-09 21:22:07.060 WARNING Abort all subsequent test classes. Reason: Details=All remaining tests aborted due to: Failed to commission node, Extras=None

[MatterTest] 02-09 21:22:07.066 INFO Summary for test run MatterTest@02-09-2024_21-21-37-011:

Total time elapsed 30.052692557000228s

Artifacts are saved in "/tmp/matter_testing/logs/MatterTest/02-09-2024_21-21-37-011"

Test summary saved in "/tmp/matter_testing/logs/MatterTest/02-09-2024_21-21-37-011/test_summary.yaml"

Test results: Error 0, Executed 1, Failed 1, Passed 0, Requested 1, Skipped 0

INFO:root:Finished test set, ran for 30054856.0ms

DEBUG:chip.CTL:Shutting down the commissioner

DEBUG:chip.CTL:Shutting down the controller

DEBUG:chip.IN:Expiring all sessions for fabric 0x1!!

INFO:chip.FP:Forgetting fabric 0x1

INFO:chip.TS:Pending Last Known Good Time: 2023-10-14T01:16:48

DEBUG:chip.CTL:StorageAdapter::GetKeyValue: Key = g/lkgt, Value = 0xffffb319d820 (18)

DEBUG:chip.CTL:Key Found 8

INFO:chip.TS:Previous Last Known Good Time: 2023-10-14T01:16:48

INFO:chip.TS:Reverted Last Known Good Time to previous value

ERROR:chip.CTL:Shutting down the stack...

DEBUG:chip.CTL:Shutting down the System State, this will teardown the CHIP Stack

DEBUG:chip.IN:SecureSession[0xaaaaf274cee0]: Released - Type:2 LSID:2497

DEBUG:chip.DMG:All ReadHandler-s are clean, clear GlobalDirtySet

INFO:chip.FP:Shutting down FabricTable

INFO:chip.TS:Pending Last Known Good Time: 2023-10-14T01:16:48

DEBUG:chip.CTL:StorageAdapter::GetKeyValue: Key = g/lkgt, Value = 0xfffffd7fd550 (18)

DEBUG:chip.CTL:Key Found 8

INFO:chip.TS:Previous Last Known Good Time: 2023-10-14T01:16:48

INFO:chip.TS:Reverted Last Known Good Time to previous value

INFO:chip.DL:writing settings to file (/tmp/chip_counters.ini-jqRO9u)

INFO:chip.DL:renamed tmp file to file (/tmp/chip_counters.ini)

INFO:chip.DL:NVS set: chip-counters/total-operational-hours = 0 (0x0)

ERROR:chip.DL:Inet Layer shutdown

ERROR:chip.DL:BLE shutdown

ERROR:chip.DL:System Layer shutdown

ERROR:root:Final result: FAIL !

root@ubuntu:~/temp_python_testing/scripts/sdk#

4. exit from the container run chip-tool or chip-repl and the DUT commissions ok.

Expected behavior

No response

Log files

No response

PICS file

No response

Screenshots

No response

Environment

Raspberry PI 8GB Image updated using commandline instructions from previous TE

Additional Information

No response

jamesharrow avatar Feb 09 '24 21:02 jamesharrow

Follow up.

exited the container and commissioned the DUT 'onnetwork' and left it commissioned and the files on the host in /tmp/chip* I relaunched the container but added -v /tmp:/tmp

Then I can read basicinformation ok.

I then factory reset the DUT, and rm /tmp in the container. Once I did this I ran ./chip-tool pairing onnetwork 1 64889390 --paa-trust-store-path /paa_roots/ from within the container and it worked (it had the /tmp:/tmp volume mapped),

I tried it again without the volume mapping and it was still ok - so not related to that (other than if I had commissioned it manually outside the container then it had different RCAC / Fabric ID).

... I then went back to the python scripting and ran the commands above, and again it failed to discover it. It looks like the difference is the --discriminator (which is not needed in chip-tool) but seems to be needed to call the python code.

In chip-repl the devCtrl.CommissionOnNetwork() takes only 2 required positional arguments: 'nodeId' and 'setupPinCode' Normally running scripts in python_testing you can provide -m on-network -d 3840 -p 20202021

So I think onnetwork commissioning has an optional discriminator (and maybe this isn't being handled consistently?)

jamesharrow avatar Feb 09 '24 22:02 jamesharrow

This is not related to TH, this seems to be related to SDK team. @raju-apple Could you transfer to the proper team?

rquidute avatar Feb 15 '24 20:02 rquidute

This issue seems to be still present in SVE test harness - and will block testing. Can we escalate the priority of this - to me it seems fine on the SDK (I can run these commands outside of the test harness) but running under the test harness seems to still have issues.

jamesharrow avatar Feb 29 '24 18:02 jamesharrow

Hi @jamesharrow could you please provide the Logs, also could you run ./scripts/th/th-doctor.sh --complete and provide the output?

rquidute avatar Feb 29 '24 21:02 rquidute

I have changed my network setup (I reflashed the router with OpenWRT) and since then I haven't had this issue and was able to commission the DUT ok with the test harness.

So likely we can mark this as closed now.

jamesharrow avatar Mar 03 '24 13:03 jamesharrow