Feature Request: Hierarchical Monitor Management with Intelligent Notification Suppression

Open Smallinger opened this issue 6 months ago • 1 comment

Is your feature request related to a problem? Please describe.

I'm always frustrated when my Proxmox homelab infrastructure fails and I get bombarded with notifications from every VM and service running on it. When my Proxmox host goes down for maintenance or crashes, I receive dozens of alerts from all the VMs, containers, and services hosted on it, making it impossible to quickly identify that the root cause is just the hypervisor being offline. This notification spam is especially annoying during planned maintenance or when troubleshooting hardware issues.

Describe the solution you'd like

I would like a hierarchical monitor management system with intelligent notification suppression for homelab environments. The solution should include a three-tier hierarchy (Infrastructure > Platform > Service) in which monitors are organized by their dependencies. When a parent monitor fails, the system should automatically suppress notifications for all of its child monitors, so that only the root cause generates an alert. When the parent recovers, notifications for the children should be restored automatically.

Current Behavior (Notification Storm):
Proxmox-Host [DOWN] → Alert sent ✉️
├── pfSense-VM [DOWN] → Alert sent ✉️
│   ├── Home-Website [DOWN] → Alert sent ✉️
│   ├── Nextcloud [DOWN] → Alert sent ✉️
│   └── Plex-Server [DOWN] → Alert sent ✉️
├── Docker-Host-VM [DOWN] → Alert sent ✉️
│   ├── Portainer [DOWN] → Alert sent ✉️
│   ├── Grafana [DOWN] → Alert sent ✉️
│   └── Home-Assistant [DOWN] → Alert sent ✉️
└── NAS-VM [DOWN] → Alert sent ✉️

Result: 10 notifications for 1 root cause (Proxmox maintenance)

Proposed Behavior (Smart Suppression):
Proxmox-Host [DOWN] → Alert sent ✉️
├── pfSense-VM [SUPPRESSED] → No alert 🔕
│   ├── Home-Website [SUPPRESSED] → No alert 🔕
│   ├── Nextcloud [SUPPRESSED] → No alert 🔕
│   └── Plex-Server [SUPPRESSED] → No alert 🔕
├── Docker-Host-VM [SUPPRESSED] → No alert 🔕
│   ├── Portainer [SUPPRESSED] → No alert 🔕
│   ├── Grafana [SUPPRESSED] → No alert 🔕
│   └── Home-Assistant [SUPPRESSED] → No alert 🔕
└── NAS-VM [SUPPRESSED] → No alert 🔕

Result: 1 notification for 1 root cause
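
The suppression rule above could be sketched roughly as follows. This is only an illustration in Python, not code from any existing monitoring tool: the `Monitor` class, its `parent` field, and the functions `should_notify` and `pending_alerts` are hypothetical names, and the sketch assumes each monitor depends on at most one parent.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Monitor:
    name: str
    parent: Optional[str] = None  # hypothetical field: the monitor this one depends on
    up: bool = True


def should_notify(name: str, monitors: Dict[str, Monitor]) -> bool:
    """Return True if a DOWN monitor should alert, i.e. no ancestor is also DOWN."""
    monitor = monitors[name]
    if monitor.up:
        return False
    seen = {name}  # guards against accidental dependency cycles
    parent = monitor.parent
    while parent is not None and parent not in seen:
        if not monitors[parent].up:
            return False  # an ancestor is already down, so it is the root cause
        seen.add(parent)
        parent = monitors[parent].parent
    return True


def pending_alerts(monitors: Dict[str, Monitor]) -> List[str]:
    """Names of all monitors that would send an alert right now."""
    return [name for name in monitors if should_notify(name, monitors)]


# The tree from the example above, with every monitor DOWN.
tree = {
    "Proxmox-Host": None,
    "pfSense-VM": "Proxmox-Host",
    "Home-Website": "pfSense-VM",
    "Nextcloud": "pfSense-VM",
    "Plex-Server": "pfSense-VM",
    "Docker-Host-VM": "Proxmox-Host",
    "Portainer": "Docker-Host-VM",
    "Grafana": "Docker-Host-VM",
    "Home-Assistant": "Docker-Host-VM",
    "NAS-VM": "Proxmox-Host",
}
monitors = {name: Monitor(name, parent, up=False) for name, parent in tree.items()}
print(pending_alerts(monitors))  # ['Proxmox-Host']
```

Walking up the parent chain on every check keeps the rule stateless: nothing has to be "un-suppressed" explicitly, because a child that is still down starts alerting as soon as all of its ancestors are up again.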

The UI should provide both list and hierarchy tree views, with advanced filtering capabilities for hierarchy levels, status, and monitor types. This would be perfect for organizing homelab infrastructure where you have physical hosts, VMs, and containerized services. Additionally, certificate handling should be improved to display expiration information even when TLS validation is disabled (common in homelabs with self-signed certificates), showing remaining days in a user-friendly format like "365d" or "30d (TLS ignored)" with appropriate color warnings.
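
As a rough illustration of the certificate display described above, the label could be derived like this. The function name `cert_label`, the color names, and the 30-day and 7-day thresholds are assumptions made for the sketch; the request itself only specifies the "365d" and "30d (TLS ignored)" formats.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def cert_label(
    not_after: datetime,
    now: datetime,
    tls_ignored: bool = False,
    warn_days: int = 30,      # assumed threshold for a yellow warning
    critical_days: int = 7,   # assumed threshold for a red warning
) -> Tuple[str, str]:
    """Return (label, color) for a certificate that expires at `not_after`."""
    days = (not_after - now).days
    text = "expired" if days < 0 else f"{days}d"
    if tls_ignored:
        # Expiry is still shown even though validation is disabled.
        text += " (TLS ignored)"
    if days <= critical_days:
        color = "red"
    elif days <= warn_days:
        color = "yellow"
    else:
        color = "green"
    return text, color


now = datetime(2025, 6, 4, tzinfo=timezone.utc)
print(cert_label(now + timedelta(days=365), now))                  # ('365d', 'green')
print(cert_label(now + timedelta(days=30), now, tls_ignored=True))  # ('30d (TLS ignored)', 'yellow')
```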

Describe alternatives you've considered

I've tried setting up notification schedules to avoid alerts during maintenance windows, but this requires manual planning and doesn't help with unexpected outages. I've also considered using separate monitoring tools for different infrastructure layers, but this creates more complexity in a homelab environment. Some people suggest just turning off notifications entirely during maintenance, but then you risk missing genuine issues with other independent systems.

Additional context

This feature would be incredibly valuable for homelab enthusiasts running Proxmox, ESXi, or other virtualization platforms. Most homelabs follow a natural hierarchy that this system would perfectly represent.

Typical Homelab Hierarchy:

Infrastructure Level (Physical):
├── Proxmox-Host-01
├── Network-Switch
├── UPS-System
└── Internet-Router

Platform Level (VMs/Core Services):
├── pfSense-Firewall (depends on: Proxmox-Host-01)
├── Docker-Host-VM (depends on: Proxmox-Host-01)
├── TrueNAS-VM (depends on: Proxmox-Host-01)
└── Pi-hole-VM (depends on: Proxmox-Host-01)

Service Level (Applications):
├── Nextcloud (depends on: Docker-Host-VM)
├── Plex-Media-Server (depends on: TrueNAS-VM)
├── Home-Assistant (depends on: Docker-Host-VM)
└── Personal-Website (depends on: pfSense-Firewall)

The system should be backward compatible with existing monitors and allow optional hierarchy assignment. This would be especially useful for homelabs where you might have mixed environments (some VMs on Proxmox, some services on bare metal, some in containers).

Recovery scenario: When I bring my Proxmox host back online after maintenance, all dependent service notifications would automatically resume, ensuring I don't miss any real issues that might have occurred during the maintenance window. This would make homelab monitoring much more manageable and reduce the notification fatigue that currently makes me want to disable monitoring altogether during maintenance periods.
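
One possible reading of the recovery scenario, sketched in Python under stated assumptions: when a parent comes back, every monitor below it that is still down, and whose whole ancestor chain is now up, should start alerting. The function `resume_alerts` and the `parent_of` / `is_up` mappings are hypothetical. Monitors with no entry in `parent_of` (for example existing monitors that were never assigned to a hierarchy) are left alone, which is how the sketch models backward compatibility.

```python
from typing import Dict, Iterator, List, Optional


def resume_alerts(
    recovered: str,
    parent_of: Dict[str, Optional[str]],
    is_up: Dict[str, bool],
) -> List[str]:
    """Monitors under `recovered` that are still DOWN and no longer suppressed."""

    def ancestors(name: str) -> Iterator[str]:
        seen = set()  # guards against accidental dependency cycles
        parent = parent_of.get(name)
        while parent is not None and parent not in seen:
            seen.add(parent)
            yield parent
            parent = parent_of.get(parent)

    result = []
    for name, up in is_up.items():
        if up:
            continue
        chain = list(ancestors(name))
        # Only monitors below the recovered parent are affected, and only
        # if no other ancestor is still down.
        if recovered in chain and all(is_up.get(a, True) for a in chain):
            result.append(name)
    return result


parent_of = {
    "Docker-Host-VM": "Proxmox-Host-01",
    "TrueNAS-VM": "Proxmox-Host-01",
    "Nextcloud": "Docker-Host-VM",
    "Home-Assistant": "Docker-Host-VM",
    "Plex-Media-Server": "TrueNAS-VM",
}
is_up = {
    "Proxmox-Host-01": True,     # just recovered
    "Docker-Host-VM": True,
    "Nextcloud": False,          # real issue after maintenance
    "Home-Assistant": True,
    "TrueNAS-VM": False,         # did not come back up
    "Plex-Media-Server": False,  # still suppressed by TrueNAS-VM
    "Legacy-Monitor": False,     # no hierarchy assigned, handled as before
}
print(resume_alerts("Proxmox-Host-01", parent_of, is_up))  # ['Nextcloud', 'TrueNAS-VM']
```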

Jun 04 '25 08:06 Smallinger

Adding this from the wishlist:

When a service is down, I'd like to see its related components, both to give context during failure investigation and to stop alert spam.

Use-case 1: if I have a healthcheck for www.mydomain.local and I know the www service runs on server docker-host-1 and needs a database on db-host-1, I'd like to see at a glance how the components that make up that service are doing.

Use-case 2: If service-A has a dependency on service-B, I'd like to suppress alerts for service-A if service-B is down. This would help with alert fatigue and reduce unwanted noise when something goes down.

Use-case 3: If common-service is down, I'd like to show that service-A, service-B and service-C are impacted in the incident that's generated. This would improve incident impact visibility and help focus on root cause instead of chasing symptoms.
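
Use-case 3 amounts to a reverse lookup over the dependency graph. A minimal Python sketch, assuming dependencies are stored as a mapping from each service to the list of things it needs (the `depends_on` mapping and the `impacted_by` function are hypothetical names, and the sample data reuses the names from the use-cases above):

```python
from collections import deque
from typing import Dict, List


def impacted_by(failed: str, depends_on: Dict[str, List[str]]) -> List[str]:
    """All services that depend on `failed`, directly or transitively."""
    # Invert the graph: component -> services that need it.
    dependents: Dict[str, List[str]] = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    impacted = []
    seen = {failed}
    queue = deque([failed])
    while queue:  # breadth-first walk over the inverted graph
        current = queue.popleft()
        for service in dependents.get(current, []):
            if service not in seen:
                seen.add(service)
                impacted.append(service)
                queue.append(service)
    return sorted(impacted)


depends_on = {
    "service-A": ["common-service"],
    "service-B": ["common-service"],
    "service-C": ["common-service"],
    "www.mydomain.local": ["docker-host-1", "db-host-1"],
}
print(impacted_by("common-service", depends_on))  # ['service-A', 'service-B', 'service-C']
print(impacted_by("db-host-1", depends_on))       # ['www.mydomain.local']
```

Unlike the three-tier tree in the original request, this sketch allows a service to have several dependencies, which is what use-case 1 (a web service needing both a Docker host and a database host) seems to call for.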

Jul 26 '25 00:07 InputObject2