zigbee2mqtt icon indicating copy to clipboard operation
zigbee2mqtt copied to clipboard

Connects to MQTT but never starts the http listener

Open disconn3ct opened this issue 2 months ago • 10 comments

What happened?

After a restart, it failed to start. Debug shows it connecting to mqtt successfully (mqtts://mqtt.DOMAIN no errors, valid cert) and then it just sits until the health check restarts it. I raised that limit to 15 minutes with no change. It originally occurred with the 1.37.0 update, but rolling back didn't help. I suspect the restart was the cause, not the update.

What did you expect to happen?

HTTP listener starts eventually

How to reproduce it (minimal and precise)

No response

Zigbee2MQTT version

unknown, frontend isn't starting. container v1.37.0

Adapter firmware version

again I can't get to it but it is updated w/in the past year

Adapter

Sonoff zb-dongle-p

Setup

helm on k3s on rpi4

Debug log

The log lists literally every device and half of my configuration secrets. I'm happy to extract anything that would help. Here are some hopefully useful excerpts:

[2024-05-07 09:34:40] info:     z2m: Logging to console, file (filename: log.log)
[2024-05-07 09:34:44] debug:    z2m: Loaded state from file /data/state.json
[2024-05-07 09:34:45] info:     z2m: Starting Zigbee2MQTT version 1.37.0 (commit #46f34c8)
[2024-05-07 09:34:45] info:     z2m: Starting zigbee-herdsman (0.45.0)
[2024-05-07 09:34:45] debug:    z2m: Using zigbee-herdsman with settings: '{"adapter":{"concurrent":null,"delay":null,"disableLED":false},"backupPath":"/data/coordinator_backup.json","databaseBackupPath":"/data/database.db.backup","databasePath":"/data/database.db","network":{"channelList":[20],"extendedPanID":[.......],"networkKey":"HIDDEN","panID":....},"serialPort":{"adapter":"auto","path":"tcp://ser2sock-zigbee:10000"}}'
[2024-05-07 09:34:45] debug:    zh:controller: Starting with options '{"network":{"networkKeyDistribute":false,"networkKey":[...],"panID":...,"extendedPanID":[......],"channelList":[20]},"serialPort":{"path":"tcp://ser2sock-zigbee:10000","adap
ter":"auto"},"databasePath":"/data/database.db","databaseBackupPath":"/data/database.db.backup","backupPath":"/data/coordinator_backup.json","adapter":{"disableLED":false,"concurrent":null,"delay":null}}'
[2024-05-07 09:34:45] info:     zh:zstack:znp: Opening TCP socket with ser2sock-zigbee:10000
[2024-05-07 09:34:46] info:     zh:zstack:znp: Socket connected
[2024-05-07 09:34:46] info:     zh:zstack:znp: Socket ready
[2024-05-07 09:34:46] info:     zh:zstack:znp: Writing CC2530/CC2531 skip bootloader payload
[2024-05-07 09:34:46] debug:    zh:zstack:unpi:writer: --> buffer [239]
[2024-05-07 09:34:46] debug:    zh:zstack:unpi:parser: <-- [33,83,69,82,50,83,79,67,75,32,67,111,110,110,101,99,116,101,100,13,10,33,83,69,82,50,83,79,67,75,32,83,69,82,73,65,76,95,67,79,78,78,69,67,84,69,68,13,10]
[2024-05-07 09:34:46] debug:    zh:zstack:unpi:parser: --- parseNext [33,83,69,82,50,83,79,67,75,32,67,111,110,110,101,99,116,101,100,13,10,33,83,69,82,50,83,79,67,75,32,83,69,82,73,65,76,95,67,79,78,78,69,67,84,69,68,13,10]
[2024-05-07 09:34:47] info:     zh:zstack:znp: Skip bootloader for CC2652/CC1352
[2024-05-07 09:34:47] debug:    zh:zstack:znp: SREQ: --> SYS - ping - {"capabilities":1}
[2024-05-07 09:34:47] debug:    zh:zstack:unpi:writer: --> frame [254,0,33,1,32]
[2024-05-07 09:34:47] debug:    zh:zstack:unpi:parser: <-- [254,2,97,1,89,6,61]
[2024-05-07 09:34:47] debug:    zh:zstack:unpi:parser: --- parseNext [33,83,69,82,50,83,79,67,75,32,67,111,110,110,101,99,116,101,100,13,10,33,83,69,82,50,83,79,67,75,32,83,69,82,73,65,76,95,67,79,78,78,69,67,84,69,68,13,10,254,2,97,1,89,6,61]
[2024-05-07 09:34:47] debug:    zh:zstack:unpi:parser: --> parsed 2 - 3 - 1 - 1 - [89,6] - 61
[2024-05-07 09:34:47] debug:    zh:zstack:znp: SRSP: <-- SYS - ping - {"capabilities":1625}
[2024-05-07 09:34:47] debug:    zh:zstack:unpi:parser: --- parseNext []
[2024-05-07 09:34:47] debug:    zh:zstack:znp: SREQ: --> SYS - version - {}
[2024-05-07 09:34:47] debug:    zh:zstack:unpi:writer: --> frame [254,0,33,2,35]
[2024-05-07 09:34:47] debug:    zh:zstack:unpi:parser: <-- [254,10,97,2,2,1,2,7,1,107,177,52,1,0,129]
[2024-05-07 09:34:47] debug:    zh:zstack:unpi:parser: --- parseNext [254,10,97,2,2,1,2,7,1,107,177,52,1,0,129]
[2024-05-07 09:34:47] debug:    zh:zstack:unpi:parser: --> parsed 10 - 3 - 1 - 2 - [2,1,2,7,1,107,177,52,1,0] - 129
[2024-05-07 09:34:47] debug:    zh:zstack:znp: SRSP: <-- SYS - version - {"transportrev":2,"product":1,"majorrel":2,"minorrel":7,"maintrel":1,"revision":20230507}
[2024-05-07 09:34:47] debug:    zh:zstack:unpi:parser: --- parseNext []
[2024-05-07 09:34:47] debug:    zh:zstack: Adapter concurrent: 16
2024-05-07 09:34:47] debug:    zh:zstack: Detected znp version 'zStack3x0' ({"transportrev":2,"product":1,"majorrel":2,"minorrel":7,"maintrel":1,"revision":20230507})
[2024-05-07 09:34:47] debug:    zh:adapter:zstack:manager: beginning znp startup
...
[2024-05-07 09:34:47] debug:    zh:adapter:zstack:manager: endpoint '1' already registered
[2024-05-07 09:34:47] debug:    zh:adapter:zstack:manager: endpoint '2' already registered
[2024-05-07 09:34:47] debug:    zh:adapter:zstack:manager: endpoint '3' already registered
[2024-05-07 09:34:47] debug:    zh:adapter:zstack:manager: endpoint '4' already registered
[2024-05-07 09:34:47] debug:    zh:adapter:zstack:manager: endpoint '5' already registered
[2024-05-07 09:34:47] debug:    zh:adapter:zstack:manager: endpoint '6' already registered
[2024-05-07 09:34:47] debug:    zh:adapter:zstack:manager: endpoint '8' already registered
[2024-05-07 09:34:47] debug:    zh:adapter:zstack:manager: endpoint '10' already registered
[2024-05-07 09:34:47] debug:    zh:adapter:zstack:manager: endpoint '11' already registered
[2024-05-07 09:34:47] debug:    zh:adapter:zstack:manager: endpoint '110' already registered
[2024-05-07 09:34:47] debug:    zh:adapter:zstack:manager: endpoint '12' already registered
[2024-05-07 09:34:47] debug:    zh:adapter:zstack:manager: endpoint '13' already registered
[2024-05-07 09:34:47] debug:    zh:adapter:zstack:manager: endpoint '47' already registered
[2024-05-07 09:34:47] debug:    zh:adapter:zstack:manager: endpoint '242' already registered
...
[2024-05-07 09:34:47] debug:    zh:controller: Started with result 'resumed'
[2024-05-07 09:34:47] debug:    zh:controller: Injected database: true, adapter: true
...
[2024-05-07 09:34:48] debug:    zh:controller:device: Request Queue (0x00124b002a9a8400): default expiration timeout set to 0
[2024-05-07 09:34:48] debug:    zh:controller:device: Request Queue (0x680ae2fffef4596f): default expiration timeout set to 43200000
[2024-05-07 09:34:48] debug:    zh:controller:device: Request Queue (0x0017880110372268): default expiration timeout set to 0
...
[2024-05-07 09:34:48] info:     z2m: zigbee-herdsman started (resumed)
[2024-05-07 09:34:48] info:     z2m: Coordinator firmware version: '{"meta":{"maintrel":1,"majorrel":2,"minorrel":7,"product":1,"revision":20230507,"transportrev":2},"type":"zStack3x0"}'
...
[2024-05-07 09:34:48] debug:    z2m: Zigbee network parameters: {"channel":20,"extendedPanID":"0xZZZZZZZZZZZZZZZZ","panID":00000}
[2024-05-07 09:34:48] info:     z2m: Currently 38 devices are joined:
...
[2024-05-07 09:34:48] debug:    zh:controller: Disable joining
...
[2024-05-07 09:34:48] info:     z2m: Connecting to MQTT server at mqtts://mqtt.DOMAIN
[2024-05-07 09:34:48] debug:    z2m: Using MQTT keepalive: 60
[2024-05-07 09:34:48] debug:    z2m: Using MQTT login with username: zigbee2mqtt
[2024-05-07 09:34:48] debug:    z2m: Using MQTT client ID: 'zigbee2mqtt'
[2024-05-07 09:34:48] info:     z2m: Connected to MQTT server
[2024-05-07 09:34:48] debug:    z2m: MQTT publish: topic 'zigbee2mqtt/bridge/state', payload '{"state":"online"}'
[2024-05-07 09:34:49] debug:    z2m: Received MQTT message on 'zigbee2mqtt/Bay Lights/availability' with data '{"state":"offline"}'
...
[2024-05-07 09:34:49] debug:    z2m: Received MQTT message on 'zigbee2mqtt/bridge/extensions' with data '[]'                                                                                                                                                                                                                  
[2024-05-07 09:34:49] debug:    z2m: Received MQTT message on 'zigbee2mqtt/bridge/groups' with data '[]'                                                                                                                                                                                                                      
...
[2024-05-07 09:34:49] debug:    z2m: Received MQTT message on 'zigbee2mqtt/bridge/devices' with data .......
...
[2024-05-07 09:34:50] debug:    z2m: Received MQTT message on 'zigbee2mqtt/bridge/extensions' with data '[]'
[2024-05-07 09:34:50] debug:    z2m: Received MQTT message on 'zigbee2mqtt/bridge/groups' with data '[]'
...
[2024-05-07 09:34:56] debug:    z2m: Received Zigbee message from 'Living Room Window', type 'attributeReport', cluster 'manuSpecificSamsungAccelerometer', data '{"acceleration":0,"x_axis":0,"y_axis":0,"z_axis":0}' from endpoint 1 with groupID 0
... It continues receiving zigbee messages
[2024-05-07 09:39:08] debug:    z2m: Received Zigbee message from 'Another Device', type 'attributeReport', cluster 'genOnOff', data '{"onOff":1}' from endpoint 1 with groupID 0
...
[2024-05-07 09:45:00] debug:    zh:zstack:backup: fetched adapter network security material table (capacity=1, used=1)
[2024-05-07 09:45:00] debug:    zh:zstack:backup: Following devices with link key are missing from new backup but present in old backup and database, adding them back: 
[2024-05-07 09:45:00] info:     zh:controller: Wrote coordinator backup to '/data/coordinator_backup.json'

Lots of received messages but nothing is showing on MQTT under zigbee/# except HA commands.

Eventually the health check times out because nothing is listening on 8080. I raised it to 15 minutes but it still doesn't come ready in time. It is running on a node with minimal other responsibilities (just the associated ser2sock.)

disconn3ct avatar May 07 '24 14:05 disconn3ct

To rule out as much as possible, I've moved the radio from the original node to a new node and replaced the hub+cable with a normal extension cable. Nothing changed.

(Somewhat unrelated, but the zwave stack is still functioning on the original node and with the original USB hub. On the hub the radios are about 4 inches apart from each other, and the whole thing is 2 feet from the nearest hardware.)

disconn3ct avatar May 07 '24 15:05 disconn3ct

Just to rule out the obvious : can you make sure that the frontend is activated in your configuration ? :)

frontend: true

The frontend extension is firing a log message at startup when binding the port. Or maybe your log message is too short and it is starting after, as it is one of the last extension to be loaded.

LaurentChardin avatar May 14 '24 06:05 LaurentChardin

Advanced form, but yes:

frontend:
  port: 8080
  host: 0.0.0.0
  url: https://zigbee.XXXXXX

The only change was the version and the restarts. (I tried 1.37.1 today but it still hung. After about an hour I reverted to 1.36.1, but it hasn't recovered yet.) Along the way it has survived restarts without a version change, so I wonder if there is something getting stuck on the radio side in the 1.37 update.

There is plenty of available cpu and ram, and according to prometheus it is not using much of either (under 10% of a core and 500M, no limits/requests.) The nodes have ipv6 disabled, but if that matters I would expect 0.0.0.0 to resolve it. It acts normally, in that if I kick it from MQTT it reconnects, and it is bridging zigbee and MQTT. It just never opens the http listener.

Edit re logs: it logs the bind message if it recovers, but not when it is hung.

disconn3ct avatar May 15 '24 15:05 disconn3ct

Dumb question : are you using docker or a container ? Did you try by commenting the host property ?

LaurentChardin avatar May 15 '24 15:05 LaurentChardin

0.0.0.0 traditionally means 'every available IPV4 address'. I removed the host line, but so far (8 minutes) it is still just hanging. These are info, sorry. I will restart with debug for the next run. Nothing is logged after that last message. If I watch MQTT, it is publishing availability updates but incoming commands aren't being processed. (So I was wrong earlier, it is not behaving normally on the mqtt side either.)

Using '/data' as data directory
Zigbee2MQTT:info  2024-05-15 11:40:16: Logging to console and directory: '/data/log/2024-05-15.11-40-15' filename: log.txt
Zigbee2MQTT:info  2024-05-15 11:40:16: Starting Zigbee2MQTT version 1.36.1 (commit #ffc2ff1)
Zigbee2MQTT:info  2024-05-15 11:40:16: Starting zigbee-herdsman (0.40.3)
Zigbee2MQTT:info  2024-05-15 11:40:18: zigbee-herdsman started (resumed)
Zigbee2MQTT:info  2024-05-15 11:40:18: Coordinator firmware version: '{"meta":{"maintrel":1,"majorrel":2,"minorrel":7,"product":1,"revision":20230507,"transportrev":2},"type":"zStack3x0"}'
Zigbee2MQTT:info  2024-05-15 11:40:18: Currently 38 devices are joined:
...
Zigbee2MQTT:info  2024-05-15 11:40:18: Zigbee: disabling joining new devices.
Zigbee2MQTT:info  2024-05-15 11:40:19: Connecting to MQTT server at mqtts://mqtt.xxxxxxxxx
Zigbee2MQTT:info  2024-05-15 11:40:19: Connected to MQTT server
Zigbee2MQTT:info  2024-05-15 11:40:19: MQTT publish: topic 'zigbee2mqtt/bridge/state', payload '{"state":"online"}'
Zigbee2MQTT:error 2024-05-15 11:40:19: MQTT error: write EPIPE
Zigbee2MQTT:info  2024-05-15 11:40:20: Connected to MQTT server
Zigbee2MQTT:info  2024-05-15 11:40:20: MQTT publish: topic 'zigbee2mqtt/bridge/state', payload '{"state":"online"}'

disconn3ct avatar May 15 '24 15:05 disconn3ct

0.0.0.0 traditionally means 'every available IPV4 address'. I removed the host line, but so far (8 minutes) it is still just hanging. These are info, sorry. I will restart with debug for the next run. Nothing is logged after that last message. If I watch MQTT, it is publishing availability updates but incoming commands aren't being processed. (So I was wrong earlier, it is not behaving normally on the mqtt side either.)

Using '/data' as data directory
Zigbee2MQTT:info  2024-05-15 11:40:16: Logging to console and directory: '/data/log/2024-05-15.11-40-15' filename: log.txt
Zigbee2MQTT:info  2024-05-15 11:40:16: Starting Zigbee2MQTT version 1.36.1 (commit #ffc2ff1)
Zigbee2MQTT:info  2024-05-15 11:40:16: Starting zigbee-herdsman (0.40.3)
Zigbee2MQTT:info  2024-05-15 11:40:18: zigbee-herdsman started (resumed)
Zigbee2MQTT:info  2024-05-15 11:40:18: Coordinator firmware version: '{"meta":{"maintrel":1,"majorrel":2,"minorrel":7,"product":1,"revision":20230507,"transportrev":2},"type":"zStack3x0"}'
Zigbee2MQTT:info  2024-05-15 11:40:18: Currently 38 devices are joined:
...
Zigbee2MQTT:info  2024-05-15 11:40:18: Zigbee: disabling joining new devices.
Zigbee2MQTT:info  2024-05-15 11:40:19: Connecting to MQTT server at mqtts://mqtt.xxxxxxxxx
Zigbee2MQTT:info  2024-05-15 11:40:19: Connected to MQTT server
Zigbee2MQTT:info  2024-05-15 11:40:19: MQTT publish: topic 'zigbee2mqtt/bridge/state', payload '{"state":"online"}'
Zigbee2MQTT:error 2024-05-15 11:40:19: MQTT error: write EPIPE
Zigbee2MQTT:info  2024-05-15 11:40:20: Connected to MQTT server
Zigbee2MQTT:info  2024-05-15 11:40:20: MQTT publish: topic 'zigbee2mqtt/bridge/state', payload '{"state":"online"}'
Zigbee2MQTT:info  2024-05-15 11:49:50: MQTT publish: topic 'zigbee2mqtt/bridge/state', payload '{"state":"offline"}'
Zigbee2MQTT:info  2024-05-15 11:49:50: Disconnecting from MQTT server

(The offline is me restarting it for debug logging)

disconn3ct avatar May 15 '24 15:05 disconn3ct

Debug includes it receiving zigbee messages, and those debug logs - but not the actual data - are sent to mqtt as well. "message": "Received Zigbee message from 'Humidor', type 'attributeReport', cluster 'haElectricalMeasurement', data '{\"activePower\":59}' from endpoint 1 with groupID 0"

disconn3ct avatar May 15 '24 15:05 disconn3ct

When sending commands from HA: Zigbee2MQTT:debug 2024-05-15 12:03:54: Received MQTT message on 'zigbee2mqtt/Device/set' with data '{"state":"OFF"}' but so far (12:07) it hasn't logged any followup. Just more incoming messages.

disconn3ct avatar May 15 '24 16:05 disconn3ct

1.37.1 has not come online yet, running overnight with about an hour between restarts.

I'm downgrading to 1.36.1 to see if it recovers. (If it doesn't recover, I'll shut it all down and unplug it for an hour.)

disconn3ct avatar May 16 '24 14:05 disconn3ct

This is still completely down for me. I have tried unplugging everything for a full 24 hours, and it still behaves exactly the same. Downgrading did not help.

What can I do to help diagnose this?

disconn3ct avatar May 23 '24 11:05 disconn3ct