runq_overload alert not cleared sometimes
What happened?
Filed for tracking only.
For details, see:
https://github.com/emqx/emqx/issues/13308#issuecomment-2191320311
What did you expect to happen?
The runq_overload alarm should be cleared when the node is no longer overloaded.
How can we reproduce it (as minimally and precisely as possible)?
Don't know....
Anything else we need to know?
No response
EMQX version
EMQX 5.5
Log files
Not reproduced; no log files are provided.
I just read the code. It looks like alarm recording and deletion are dirty operations via mria. The RPC may get retried, but if a node is restarted during an upgrade, the RPC call will be missed.
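To illustrate the hypothesis above, here is a toy Python model of a fire-and-forget replicated delete. It is not EMQX or mria code: `Node`, `dirty_broadcast`, and the two-node setup are hypothetical names for illustration, and it assumes the restarted node keeps its local copy of the alarm table across the restart.

```python
class Node:
    """Hypothetical cluster node holding a local copy of the activated-alarm table."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.activated = set()


def dirty_broadcast(nodes, op, alarm):
    """Fire-and-forget replication: apply `op` on reachable nodes only.

    Nodes that are down are skipped silently, with no retry and no error,
    which is the failure mode the comment above suspects.
    """
    for node in nodes:
        if not node.up:
            continue  # the write is lost for this node
        if op == "activate":
            node.activated.add(alarm)
        elif op == "deactivate":
            node.activated.discard(alarm)


nodes = [Node("a"), Node("b")]

dirty_broadcast(nodes, "activate", "runq_overload")    # alarm raised everywhere
nodes[1].up = False                                    # node b restarts during an upgrade
dirty_broadcast(nodes, "deactivate", "runq_overload")  # b misses the delete
nodes[1].up = True                                     # b returns (assumed: table survives)

stale = [n.name for n in nodes if "runq_overload" in n.activated]
print(stale)  # nodes still reporting the alarm
```

In this model, node `b` keeps reporting `runq_overload` after the overload has ended, because nothing replays the missed delete. Whether the real tables behave this way depends on how mria resynchronizes a node after restart, which is why dumping the tables on the affected nodes is the useful next step.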
@chaymankala could you dump the tables on the trouble nodes?
emqx eval 'ets:tab2list(emqx_activated_alarm)'
and
emqx eval 'ets:tab2list(emqx_deactivated_alarm)'
On a second look, deactivating an alarm has a timeout of 5 seconds.
https://github.com/emqx/emqx/blob/359bc38aa4564675a6b0d20d02b3a1ba876b5c54/apps/emqx/src/emqx_alarm.erl#L158
We recently upgraded to 5.7.0, so we had to restart the cluster. Once the cluster was restarted, the alarms were cleared. The next time the alarms get stuck, I will run these commands and share the output. Thank you for your response.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.