emqx

runq_overload alert not cleared sometimes

Open qzhuyan opened this issue 1 year ago • 3 comments

What happened?

Just for tracking

For details, see:

https://github.com/emqx/emqx/issues/13308#issuecomment-2191320311

What did you expect to happen?

The runq_overload alarm should be cleared once the node is no longer overloaded.

How can we reproduce it (as minimally and precisely as possible)?

Don't know.

Anything else we need to know?

No response

EMQX version

EMQX 5.5

OS version

No response

Log files

Not reproduced; no log files are provided.

qzhuyan, Jul 22 '24 09:07

I just read the code. It looks like alarm recording and deletion are dirty operations via mria. Although the RPC may be retried, if a node is restarted during an upgrade, the RPC call will be missed.

@chaymankala could you dump the tables on the affected nodes?

emqx eval 'ets:tab2list(emqx_activated_alarm)'

and

emqx eval 'ets:tab2list(emqx_deactivated_alarm)'

qzhuyan, Jul 30 '24 14:07

On second look, deactivating an alarm has a timeout of 5 seconds:

https://github.com/emqx/emqx/blob/359bc38aa4564675a6b0d20d02b3a1ba876b5c54/apps/emqx/src/emqx_alarm.erl#L158

qzhuyan, Jul 30 '24 15:07

We recently upgraded to 5.7.0, which required restarting the cluster. After the restart, the alarms were cleared, so the next time the alarms get stuck I will run these commands and share the output. Thank you for your response.

chaymankala, Jul 31 '24 08:07

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot], Feb 24 '25 08:02