
After a Shinken restart, acknowledgement history gets erased

Open • shilpa-karri opened this issue 7 years ago • 11 comments

Hello,

I am using Shinken with MongoDB as the backend. For the past few days I have observed that, whenever Shinken is restarted or reloaded, critical or warning alerts that were already acknowledged reappear on the Shinken console as new ones. Downtimes that were created are getting erased too.

I already have the MongoDB retention module and the MongoDB logstore module installed. Please guide me in resolving this issue.

shilpa-karri avatar Jun 07 '17 07:06 shilpa-karri

Hello,

How did you enable the retention module for the scheduler?

Please share your config file.

Regards
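
For reference, a minimal sketch of what enabling a retention module for a scheduler usually looks like in Shinken: the module must be listed in the modules directive of the scheduler definition. The scheduler name, address, and port below are illustrative:

    define scheduler {
        scheduler_name  scheduler-1
        address         localhost
        port            7768
        # Defining the module in its own .cfg file is not enough;
        # it must also be attached to the scheduler here
        modules         retention-mongodb
    }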


olivierHa avatar Jun 07 '17 08:06 olivierHa

Hello,

Please find the retention config file below:

]# cat retention-mongodb.cfg

## Module:      retention-mongodb
## Loaded by:   Scheduler
# Retention file to keep state between process restarts.
define module {
    module_name     retention-mongodb
    module_type     mongodb_retention

    uri             10.xx.xx.xx,10.xx.xx.xx
    database        shinkenPROD

    # Advanced option if you are running a cluster environment
    replica_set     ShinkenPROD
}

shilpa-karri avatar Jun 14 '17 07:06 shilpa-karri

Your mongo URI is not correctly configured. It should look something like:

    uri             mongodb://10.xx.xx.xx,10.xx.xx.xx/

Could you test with this syntax?
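
For illustration, here is the same module definition with the corrected URI scheme, assuming the hosts, database, and replica set from the config posted above:

    define module {
        module_name     retention-mongodb
        module_type     mongodb_retention
        # The mongodb:// scheme lets the driver parse multiple hosts
        uri             mongodb://10.xx.xx.xx,10.xx.xx.xx/
        database        shinkenPROD
        replica_set     ShinkenPROD
    }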

geektophe avatar Jun 15 '17 13:06 geektophe

We also get this behavior. Sometimes an acknowledgement won't go away, and sometimes they get erased. We use MongoDB (mongo-logs) for logs and Redis (RedisRetention) for retention. We're not sure which module (if any) saves the acknowledgement information.

cleonn avatar Jun 26 '17 12:06 cleonn

Acknowledgements are saved by all of the xxxRetention modules.

The issue that @shilpa-karri mentions seems to be a configuration mistake, as it occurs on every configuration reload. Yours seems more transient. Did you notice any errors related to retention dumps or loads in the scheduler logs?
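
A quick way to check is to grep the scheduler log for retention messages; a sketch, assuming a default-style log location (the actual path varies by installation):

    # Look for retention load/save messages or errors
    grep -i retention /var/log/shinken/schedulerd.log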

Out of curiosity, why did you decide to store logs in Mongo and retention in Redis? Mongo is safer than Redis and replicates better, and you now have two different technologies to maintain.

geektophe avatar Jun 27 '17 08:06 geektophe

No problems with loads or saves in the logs. It could be that we don't have full Redis persistence as we thought we had. Yes, ours is transient, and we had a service we couldn't get out of maintenance no matter how hard we tried :/ Then the problem somehow sorted itself out after a week.
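
One way to verify whether Redis persistence is actually enabled is to inspect the live server config; a sketch using standard redis-cli commands, assuming a default host and port:

    # RDB snapshot rules; an empty "save" value means snapshots are disabled
    redis-cli CONFIG GET save
    # Whether the append-only file is enabled
    redis-cli CONFIG GET appendonly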

We did use Mongo for retention too, but it got too bloated, filled the disk, used up too much memory, and bogged down a good server. Redis is much more efficient. We have around 650 servers and almost 10,000 services to check.

cleonn avatar Jun 27 '17 12:06 cleonn

We have pinned this down to a scheduler restart, which seems to reset the broker's held acknowledgement state. We haven't yet begun to go through the code.

cleonn avatar Jul 10 '17 08:07 cleonn

Hah, ignore that. We had default_ack_sticky set to 2 in our webui2.cfg, which means that when the scheduler restarts and the service changes state, the acknowledgement is lost. Setting default_ack_sticky to 1, and also making sure the server running the schedulers could actually write to the Redis store, solved our problems :)
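
For anyone hitting the same thing, a sketch of the relevant webui2.cfg fragment; the surrounding module definition is abbreviated and illustrative:

    define module {
        module_name         webui2
        module_type         webui2
        # Per this thread: with 2, acknowledgements were dropped when the
        # service changed state after a scheduler restart; 1 kept them
        default_ack_sticky  1
    }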

cleonn avatar Jul 11 '17 09:07 cleonn

Is Shinken dead????

trazomtg avatar Jul 18 '17 10:07 trazomtg

@trazomtg No, it's not. The people working on it have simply had less time to spend on it these days, but it's definitely still alive.

geektophe avatar Aug 07 '17 08:08 geektophe

We use it, but it would be good if someone could merge bug fixes into the master branch. Right now we have to keep a separate repository holding Shinken with our own fixes.

cleonn avatar Aug 22 '17 13:08 cleonn