
Problem with old downtimes

Open duylong opened this issue 5 years ago • 6 comments

Hi,

I have a problem with the Upcoming downtimes view. My old downtimes are still listed with the status "Downtime currently running" and a "Cancel" button.

[Screenshot: upcoming-downtimes]

Do you know why?

duylong avatar May 11 '20 12:05 duylong

It seems that when the worker is down, it does not detect the end of a downtime. Doesn't the worker restart automatically when something goes wrong?

statusengine-worker[13]: Elasticsearch error!
statusengine-worker[13]: Elasticsearch error!
statusengine-worker[13]: No alive nodes found in your cluster
statusengine-worker[13]: No alive nodes found in your cluster
...
statusengine-worker[13]: Elasticsearch error!
statusengine-worker[13]: Elasticsearch error!
statusengine-worker[13]: No alive nodes found in your cluster
statusengine-worker[13]: No alive nodes found in your cluster

There are no more errors after restarting the service, but the obsolete downtimes are still there.

Even if it's a worker problem, in any case the interface should still not display expired downtimes.
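
To illustrate what I mean, here is a minimal sketch of a display-side filter. It assumes each downtime record carries a scheduled end time and a cancelled flag; the `Downtime` class, its field names and `visible_downtimes` are hypothetical and are not Statusengine's actual schema or code.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Downtime:
    # Hypothetical record shape, not Statusengine's actual schema.
    hostname: str
    scheduled_start: datetime
    scheduled_end: datetime
    was_cancelled: bool = False


def visible_downtimes(downtimes, now=None):
    """Return only downtimes that are neither cancelled nor already over."""
    now = now or datetime.now(timezone.utc)
    return [
        d for d in downtimes
        if not d.was_cancelled and d.scheduled_end > now
    ]
```

With a filter like this, a downtime whose scheduled end lies in the past would be hidden even if the worker never recorded that it ended.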

duylong avatar May 12 '20 00:05 duylong

This sounds interesting. Basically, this is a worker-related issue. However, the worker forks a MiscChild, which handles Notifications, Downtimes and Acknowledgements, and a separate PerfdataChild, which processes performance data.

Normally whatever happens to the perfdata child should not cause any side effects to other processes / workers.
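
As a rough illustration of that isolation (not the actual worker code), the sketch below starts two children as separate OS processes and lets one of them fail. It uses Python's `subprocess` module instead of a real fork, and the child names and their one-line bodies are hypothetical stand-ins.

```python
import subprocess
import sys

# Hypothetical stand-ins for the two children; each runs as its own process.
CHILDREN = {
    "misc": "print('downtimes handled')",
    "perfdata": "raise SystemExit('Elasticsearch error!')",
}


def run_children(children=CHILDREN):
    """Start every child as a separate OS process and collect the exit codes."""
    procs = {
        name: subprocess.Popen(
            [sys.executable, "-c", code],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        for name, code in children.items()
    }
    # Wait for each child independently; one failing does not stop the other.
    return {name: proc.wait() for name, proc in procs.items()}
```

Here the failing `perfdata` stand-in exits with a non-zero code while the `misc` stand-in still finishes normally, which is the behaviour expected from separate processes.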

> In any case the interface should still not display expired downtimes.

I will do some investigation into this.

nook24 avatar May 19 '20 18:05 nook24

Yes, I noticed that the worker errors had no side effects. My problem is still present, and I can't find its source in order to reproduce it. Currently I clean up the MySQL database manually, which is not a very clean solution.

duylong avatar May 20 '20 12:05 duylong

Is this still a thing?

nook24 avatar May 04 '21 17:05 nook24

I recently updated to the latest version, and I am watching to see whether the problem comes back ;-)

duylong avatar May 05 '21 07:05 duylong

Shouldn't the command "/opt/statusengine/worker/bin/Console.php cleanup" clean up old ACK / DOWNTIME records?

I still have a DOWNTIME from "23:59 12.13.2020"...

Startusengine Cleanup started at: 2021-05-05 09:23:40
Delete old host records
Delete old host check records... done
Delete old host acknowledgements records... done
Delete old host notification records... done
Delete old host state history records... done
Delete old host downtime history records... done
Delete old service records
Delete old service check records... done
Delete old service acknowledgements records... done
Delete old service notification records... done
Delete old service state history records... done
Delete old service downtime history records... done
Delete old misc records
Delete old log entry records... done
Delete old task records... done
Delete old perfdata records for backend elasticsearch done
Cleanup took: 5 seconds...
Startusengine Cleanup finished at: 2021-05-05 09:23:45

duylong avatar May 05 '21 07:05 duylong