darwinia
UnboundedChannelPersistentlyLarge
I'm getting these alerts on both Darwinia & Crab nodes:
- alertname = UnboundedChannelPersistentlyLarge
- chain = crab2
- entity = mpsc_import_notification_stream
- instance = localhost:19615
- job = crab-collator
- monitor = CMN02
- severity = warning
Annotations:
- message = Channel mpsc_import_notification_stream on node localhost:19615 contains more than 200 items for more than 5 minutes. Node might be frozen.
Ubuntu 20.04 & 22.04
Binary: 6.3.4-e9430a36653
ExecStart=/darwinia \
  --collator \
  --chain=crab \
  --base-path /base-path/ \
  --name 'StakeWorks | Crab | CMN02' \
  --execution wasm \
  --prometheus-port 19615 \
  --prometheus-external \
  --listen-addr /ip4/xx.xx.xx.xx/tcp/30313/ws \
  --listen-addr /ip6/xx:xx:xx:xx::1/tcp/30313/ws \
  -- \
  --execution wasm \
  --chain=kusama \
  --base-path /base-path/ \
  --sync=warp \
  --state-pruning 1000 \
  --blocks-pruning 1000 \
  --out-peers 15 \
  --in-peers 35
Can you try v6.4.0?
I've lowered the threshold of the UnboundedChannelPersistentlyLarge alert from 750 to 200 (the normal value). The nodes are already running v6.4.0. I'll let you know if the alert is triggered again.
The alert is also triggered on v6.4.0, but so far only on Crab2. This is the alert rule:
- alert: UnboundedChannelPersistentlyLarge
  expr: '(
      (substrate_unbounded_channel_len{action = "send"} -
          ignoring(action) substrate_unbounded_channel_len{action = "received"})
      or on(instance) substrate_unbounded_channel_len{action = "send"}
    ) >= 200'
  for: 5m
  labels:
    severity: warning
  annotations:
    message: 'Channel {{ $labels.entity }} on node {{ $labels.instance }} contains more than 200 items for more than 5 minutes. Node might be frozen.'
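To make the expression in the rule above easier to check by hand, here is a minimal Python sketch that computes the same "sent minus received" backlog from a dump of the node's Prometheus metrics. It assumes the metrics are in the usual Prometheus text format and that `send` and `received` are running counts per `entity`. The function names, the `mpsc_other` entity, and all sample values are hypothetical and only for illustration.

```python
import re
from collections import defaultdict

# Matches samples such as:
#   substrate_unbounded_channel_len{action="send",entity="x"} 123
SAMPLE_RE = re.compile(
    r'^substrate_unbounded_channel_len\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')

# Illustrative input only; these numbers are made up.
SAMPLE = """
substrate_unbounded_channel_len{action="send",entity="mpsc_import_notification_stream"} 1250
substrate_unbounded_channel_len{action="received",entity="mpsc_import_notification_stream"} 1000
substrate_unbounded_channel_len{action="send",entity="mpsc_other"} 40
substrate_unbounded_channel_len{action="received",entity="mpsc_other"} 40
"""


def channel_backlog(metrics_text):
    """Return {entity: sent - received} from Prometheus text-format metrics."""
    counts = defaultdict(dict)
    for line in metrics_text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if not match:
            continue  # skip comments, blank lines, and other metrics
        labels = dict(LABEL_RE.findall(match.group("labels")))
        counts[labels.get("entity")][labels.get("action")] = float(match.group("value"))
    # Simplification: fall back to "send" alone per entity when "received" is
    # missing. The rule's `or on(instance)` applies its fallback per instance.
    return {
        entity: actions["send"] - actions.get("received", 0.0)
        for entity, actions in counts.items()
        if "send" in actions
    }


def over_threshold(backlog, threshold=200):
    """Return only the channels whose backlog is at or above the threshold."""
    return {entity: n for entity, n in backlog.items() if n >= threshold}


if __name__ == "__main__":
    print(over_threshold(channel_backlog(SAMPLE)))
```

Feeding it the output of the node's metrics endpoint instead of `SAMPLE` would show which channel is actually backing up and by how much, which helps when picking a threshold.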
I will raise the time from 5 to 10 minutes and monitor what happens.
Update 19-9: The alert was still triggered with 10 minutes. I changed the time back to 5 minutes and raised the number of items from 200 to 500. No alerts for a few days now.
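One possible reading of these results is that the backlog sits steadily somewhere between 200 and 500 items, so a longer wait does not help while a higher threshold does. The Python sketch below illustrates that interaction between the threshold and the `for:` duration. It is a simplified model of how a `for:` clause holds an alert in a pending state, not Prometheus itself, and the backlog value of 300 is an assumption, since the thread does not report the actual numbers.

```python
def would_fire(samples, threshold, hold_seconds):
    """Return True if the backlog stays >= threshold for hold_seconds in a row.

    samples is a time-ordered list of (timestamp_seconds, backlog) pairs.
    """
    pending_since = None
    for timestamp, backlog in samples:
        if backlog >= threshold:
            if pending_since is None:
                pending_since = timestamp  # condition just became true
            if timestamp - pending_since >= hold_seconds:
                return True
        else:
            pending_since = None  # any dip below the threshold resets the wait
    return False


# Hypothetical series: one sample per minute for 12 minutes, backlog fixed at 300.
steady = [(60 * i, 300) for i in range(12)]

if __name__ == "__main__":
    print(would_fire(steady, threshold=200, hold_seconds=300))
    print(would_fire(steady, threshold=200, hold_seconds=600))
    print(would_fire(steady, threshold=500, hold_seconds=300))
```

Under this assumption the first two settings fire and the third does not, which matches the behaviour reported here, but it does not say whether a steady backlog of that size is healthy for the node.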