What if a system fails with a kernel panic, or is dead or partially dead?

Open rjsuresh opened this issue 5 years ago • 0 comments

Since Bynar runs as a binary (agent) on the system, what happens in the following scenarios?

  • Kernel panic
  • System rebooted and did not come back up?
  • Someone stopped the agent and did not restart it?
  • Partial failure due to hardware (memory, CPU, RAID...)

When the system goes down, the agent goes down with it, because the agent runs on the very system it monitors, and that system must be healthy for the monitoring to run.

Possible solutions:

  • Client/server architecture?
  • Peer-to-peer monitoring (e.g. Ceph OSDs)?
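
To make the client/server option concrete, here is a minimal sketch of the server side, assuming each agent periodically reports a heartbeat. The class name `HeartbeatTracker`, the `timeout_s` threshold, and the host names are all hypothetical; none of this is existing Bynar code or configuration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HeartbeatTracker:
    """Server-side record of the last heartbeat seen from each agent (hypothetical)."""

    timeout_s: float  # assumed threshold, not a Bynar setting
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record(self, host: str, now: float) -> None:
        """Store the time at which `host` last reported in."""
        self.last_seen[host] = now

    def silent_hosts(self, now: float) -> List[str]:
        """Return hosts whose last heartbeat is older than the timeout."""
        return sorted(
            host
            for host, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


# Usage: node-b stops reporting, node-a keeps going.
tracker = HeartbeatTracker(timeout_s=30.0)
tracker.record("node-a", now=0.0)
tracker.record("node-b", now=0.0)
tracker.record("node-a", now=40.0)
silent = tracker.silent_hosts(now=45.0)
```

A silent host could be any of the first three scenarios above (panic, failed reboot, stopped agent); the server cannot tell them apart from a missing heartbeat alone. A partially failed host may well keep sending heartbeats, so this sketch would not catch that case.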

Possible issues with these solutions:

  • Client/server architecture adds administrative overhead: failover, firewall rules, DR, certs, LB and redundancy.
  • Peer-to-peer: message broadcasting, or a streamlined, narrowed-down approach? For example, should a failed system be monitored only by its neighbors, i.e. the systems before and after it in the sequence?
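
The "neighbors only" idea can be sketched as a ring: order the hosts, and have each host watched only by the one before it and the one after it. This is just an illustration under the assumption that every peer knows the full host list; the function name `ring_neighbors` and the sorted-by-name ordering are hypothetical choices, not something Bynar or Ceph prescribes.

```python
from typing import Iterable, List


def ring_neighbors(hosts: Iterable[str], host: str) -> List[str]:
    """Return the hosts directly before and after `host` in a sorted ring.

    The ring wraps around, so the first host's predecessor is the last host.
    """
    ordered = sorted(set(hosts))
    if host not in ordered:
        raise ValueError(f"unknown host: {host}")
    if len(ordered) < 2:
        return []  # a lone host has nobody to watch it
    i = ordered.index(host)
    before = ordered[i - 1]  # index -1 wraps to the last host
    after = ordered[(i + 1) % len(ordered)]
    # With exactly two hosts, both neighbors are the same peer.
    return [before] if before == after else [before, after]


# Usage: who should monitor each host in a four-node cluster?
cluster = ["node-c", "node-a", "node-b", "node-d"]
watchers = {h: ring_neighbors(cluster, h) for h in cluster}
```

Each host is watched by at most two peers, which avoids broadcasting to everyone, but two adjacent hosts failing together leaves a gap that only their outer neighbors can notice.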

Just throwing out my thoughts so they don't get missed. :)

rjsuresh, Mar 05 '19 19:03