datasophon

[Bug] YARN ResourceManagerGC alarm fires, but the RM GC curve does not exceed the threshold

Open mengbaba3316 opened this issue 1 year ago • 2 comments

Search before asking

  • [X] I had searched in the issues and found no similar issues.

What happened

Version: ddp1.2.1. The YARN ResourceManagerGC alarm fires, but the RM GC curve does not exceed the threshold (screenshot: b12d06b87586bf22c357c51cec4f8d0). I suspect that the alarm status on the UI page is not updated in time, and that other components may have the same problem.

What you expected to happen

Hopefully we can resolve this issue and check whether the other components are affected.

How to reproduce

After the threshold has been exceeded and the service has been restarted so that the GC time drops again, the alarm is still occasionally displayed.

Anything else

No response

Version

dev

Are you willing to submit PR?

  • [X] Yes I am willing to submit a PR!

Code of Conduct

mengbaba3316, May 21 '24 02:05

The ResourceManagerGC indicator of the ResourceManager is incorrect; you can turn it off.

datasophon, May 22 '24 01:05

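This looks like an alarm timeliness problem: the alarm record was created when the alert fired, but its status was never updated afterwards. After restarting the YARN service, the alarm disappeared. The alarm calculation logic itself should be fine.
I found that the alert messages sent from alertmanager all have status firing and never resolved, so the status of the alarm record is never updated. (screenshot) @datasophon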
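
To illustrate the point about firing and resolved, here is a minimal sketch of how a receiver could update alarm records from an Alertmanager-style webhook payload. It is not datasophon's actual code: the `apply_webhook` function and the in-memory `records` dict are hypothetical, and only the payload fields (`status`, `alerts`, `fingerprint`, `labels`) follow the Alertmanager webhook format. If only firing messages ever arrive, the record never leaves the firing state, which would match what is seen in the UI.

```python
def apply_webhook(records, payload):
    """Update alarm records (hypothetical dict: alert key -> status)
    from an Alertmanager-style webhook payload."""
    for alert in payload.get("alerts", []):
        # Prefer the fingerprint; fall back to the sorted label set as the key.
        key = alert.get("fingerprint") or tuple(sorted(alert.get("labels", {}).items()))
        # Each alert carries its own status; fall back to the group status.
        status = alert.get("status", payload.get("status"))
        if status == "firing":
            records[key] = "firing"
        elif status == "resolved" and key in records:
            # Without a resolved message this branch never runs,
            # so the record stays "firing" indefinitely.
            records[key] = "resolved"
    return records


# Sample payloads (values are made up for illustration).
firing = {
    "status": "firing",
    "alerts": [{"status": "firing", "fingerprint": "abc",
                "labels": {"alertname": "ResourceManagerGC"}}],
}
resolved = {
    "status": "resolved",
    "alerts": [{"status": "resolved", "fingerprint": "abc",
                "labels": {"alertname": "ResourceManagerGC"}}],
}

records = {}
apply_webhook(records, firing)
apply_webhook(records, resolved)
print(records)
```

With both messages delivered, the record ends up as resolved; with only the first one, it stays firing until something else clears it, for example a service restart.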

hawk9821, May 27 '24 09:05