icinga2
icinga2 copied to clipboard
Icinga DB: on every check result update state only 1x, not 3x in a row
Before (time: vertical, stack: horizontal):
- Checkable::ExecuteCheck
- Checkable::UpdateNextCheck
- IcingaDB::NextCheckChangedHandler
- HSET icinga:host:state
- HSET icinga:checksum:host:state
- ZADD icinga:nextupdate:host
- IcingaDB::NextCheckChangedHandler
- RandomCheckTask::ScriptFunc
- Checkable::ProcessCheckResult
- Checkable::UpdateNextCheck
- IcingaDB::NextCheckChangedHandler
- HSET icinga:host:state
- HSET icinga:checksum:host:state
- ZADD icinga:nextupdate:host
- IcingaDB::NextCheckChangedHandler
- IcingaDB::NewCheckResultHandler
- HSET icinga:host:state
- HSET icinga:checksum:host:state
- ZADD icinga:nextupdate:host
- Checkable::UpdateNextCheck
- Checkable::ProcessCheckResult
- IcingaDB::StateChangeHandler
- XADD icinga:runtime:state
- IcingaDB::ForwardHistoryEntries
- XADD icinga:history:stream:state
- Checkable::UpdateNextCheck
After:
- Checkable::ExecuteCheck
- Checkable::UpdateNextCheck
- IcingaDB::NextCheckChangedHandler
- RandomCheckTask::ScriptFunc
- Checkable::ProcessCheckResult
- Checkable::UpdateNextCheck
- IcingaDB::NextCheckChangedHandler
- IcingaDB::NewCheckResultHandler
- HSET icinga:host:state
- HSET icinga:checksum:host:state
- ZADD icinga:nextupdate:host
- Checkable::UpdateNextCheck
- Checkable::ProcessCheckResult
- IcingaDB::StateChangeHandler
- XADD icinga:runtime:state
- IcingaDB::ForwardHistoryEntries
- XADD icinga:history:stream:state
- Checkable::UpdateNextCheck
The first state + nextupdate (for overdue) update comes from next_check being set to now + interval immediately before doing the actual check (not to trigger it twice). This update is not only not important for the end user, but even inappropriate. The end user SHALL see next_check being e.g. in -4s, not 5m, as the check is running at the moment.
The second one is just redundant as IcingaDB::NewCheckResultHandler (the third one) is called anyway and will update state + nextupdate as well.
fixes #9501
Awesome
1662388048.691010 [0 127.0.0.1:50579] "HSET" "icinga:host:state" "9b5bb23ddee640aeb6f2e71be4336655ae3bd036" "{\"check_attempt\":1,\"check_commandline\":\"'random'\",\"check_source\":\"alexandersmbp2.int.netways.de\",\"check_timeout\":60000,\"environment_id\":\"dd1de01a2760747f4e82bf7fec93477dcb13e09c\",\"execution_time\":2,\"hard_state\":1,\"host_id\":\"9b5bb23ddee640aeb6f2e71be4336655ae3bd036\",\"id\":\"9b5bb23ddee640aeb6f2e71be4336655ae3bd036\",\"in_downtime\":false,\"is_acknowledged\":0,\"is_active\":true,\"is_flapping\":false,\"is_handled\":false,\"is_problem\":true,\"is_reachable\":true,\"last_state_change\":1662388039211,\"last_update\":1662388048686,\"latency\":3,\"next_check\":1662388058157,\"next_update\":1662388068168,\"normalized_performance_data\":\"time=1662388048.686628 value=590 value_1m=531 value_5m=472 uptime=9.477065\",\"output\":\"Hello from alexandersmbp2.int.netways.de. Icinga 2 has been running for 9 seconds. Version: v2.13.0-425-g60eb41ce9\",\"performance_data\":\"time=1662388048.686628 value=590 value_1m=531 value_5m=472 uptime=9.477065\",\"previous_hard_state\":0,\"previous_soft_state\":0,\"scheduling_source\":\"alexandersmbp2.int.netways.de\",\"severity\":2112,\"soft_state\":1,\"state_type\":1}"
1662388048.691232 [0 127.0.0.1:50579] "HSET" "icinga:checksum:host:state" "9b5bb23ddee640aeb6f2e71be4336655ae3bd036" "{\"checksum\":\"5dcfed0c38d806df8f97353e24998fe5cefbe782\"}"
1662388048.691344 [0 127.0.0.1:50579] "ZADD" "icinga:nextupdate:host" "1662388068.168299" "9b5bb23ddee640aeb6f2e71be4336655ae3bd036"
Even more awesome
{"results":[{"code":200,"status":"Successfully rescheduled check for object '0'."}]}
1662388202.253110 [0 127.0.0.1:50579] "HSET" "icinga:host:state" "9b5bb23ddee640aeb6f2e71be4336655ae3bd036" "{\"check_attempt\":1,\"check_commandline\":\"'random'\",\"check_source\":\"alexandersmbp2.int.netways.de\",\"check_timeout\":60000,\"environment_id\":\"dd1de01a2760747f4e82bf7fec93477dcb13e09c\",\"execution_time\":1,\"hard_state\":1,\"host_id\":\"9b5bb23ddee640aeb6f2e71be4336655ae3bd036\",\"id\":\"9b5bb23ddee640aeb6f2e71be4336655ae3bd036\",\"in_downtime\":false,\"is_acknowledged\":0,\"is_active\":true,\"is_flapping\":false,\"is_handled\":false,\"is_problem\":true,\"is_reachable\":true,\"last_state_change\":1662388190821,\"last_update\":1662388200298,\"latency\":4,\"next_check\":2000000000000,\"next_update\":2000000010013,\"normalized_performance_data\":\"time=1662388200.298245 value=498 value_1m=448.200000 value_5m=398.400000 uptime=161.088683\",\"output\":\"Hello from alexandersmbp2.int.netways.de. Icinga 2 has been running for 2 minutes and 41 seconds. Version: v2.13.0-425-g60eb41ce9\",\"performance_data\":\"time=1662388200.298245 value=498 value_1m=448.200000 value_5m=398.400000 uptime=161.088683\",\"previous_hard_state\":0,\"previous_soft_state\":0,\"scheduling_source\":\"alexandersmbp2.int.netways.de\",\"severity\":2112,\"soft_state\":1,\"state_type\":1}"
1662388202.253286 [0 127.0.0.1:50579] "HSET" "icinga:checksum:host:state" "9b5bb23ddee640aeb6f2e71be4336655ae3bd036" "{\"checksum\":\"a9e5fe2332862ce9b5d077cc1b6100a443bffda1\"}"
1662388202.253410 [0 127.0.0.1:50579] "ZADD" "icinga:nextupdate:host" "2000000010.013584" "9b5bb23ddee640aeb6f2e71be4336655ae3bd036"
Looks like the IDO had the same issue in the past (#3870) and the fix there was to introduce Checkable::OnNextCheckUpdated
additional to Checkable::OnNextCheckChanged
: b960850ce353aaf16a7f1da574b955817b8fd1bd
So just reusing that signal should also do the job for Icinga DB.
🐘.
@lippserd You should promote me to the Chief Wheel Reinvention Officer.