
I WISH CHECKMATE HAD...

Open gorkem-bwl opened this issue 6 months ago • 46 comments

This is a ticket to track a wishlist of items you wish Checkmate had.

COMMENT BELOW 👇

Respond with ❤️ to any request you would also like to see.

P.S.: Come say hi 👋 on the Discord

gorkem-bwl avatar Jun 03 '25 15:06 gorkem-bwl

Wishlist

  1. Notifications: Notifications are one of the most important parts of any monitoring stack. It doesn't matter how well the actual monitoring works if you can't be properly notified. So I would love to see the following in a proper notification module:
  • Central notification channel management. I shouldn't need to go to every single monitor to change a notification channel. I shouldn't have to make the same update multiple times (i.e. changing a discord webhook to a new discord webhook for example)
  • Pluggable notification channels. By this I mean that while the Bluewave team may include Slack, Discord, Telegram, and email out of the box, some users may want a more obscure notification channel (NTFY, for example). Make it simple to add a new notification channel module without having to modify core code. This allows the system to be adaptable and expandable without the Bluewave team being the choke point. The team can choose to bring any quality, useful channels the community builds into the core product if they so desire.
  • Ability to add multiple notification channels of the same type. I should be able to have multiple discord channels, maybe one for prod, one for non-prod.
  • Ability to send multiple notifications for the same monitor. For example, when disk runs low on serverA, I want to notify the support Discord channel so that an engineer who has a minute can grab it. At the same time, or maybe with a configurable delay, I want to notify the on-call pager via another means (PagerDuty, Opsgenie, Grafana OnCall, etc.) so they are paged and can act on it.
  • Notification escalations. If a notification has been sent but not acknowledged, and the monitor is still alerting after a configurable period (minutes, hours, etc.), it escalates to a different notification channel.
  • Custom service monitoring. The ability to monitor a service on a server is paramount to a monitoring solution. Sometimes a service doesn't expose a port or endpoint, or that endpoint is restricted. So being able to monitor for a process on a server where Capture is already running would be useful.
  2. Other
  • Tags on monitors that allow me to search and filter based on them. This would include notifications. For example, I would like to be able to notify a customer when their service is having an issue. So if I could set up the monitor that's monitoring their service with a tag, I could then add a filter on the monitor for tag = XXX that sends to this channel, but also sends to this other channel (for a production support team, for example). An example of this concept is Grafana Alerting: https://grafana.com/docs/grafana/latest/alerting/configure-notifications/
  • Searchable monitors. Currently with ~150 monitors in Checkmate, I have to browse through all the pages to find the monitor I want to look at.
  • Page size preference remembered across sessions (at the user level would be ideal, along with a globally configurable default value). Related to the above, if I change it to 25 per page and then refresh, it defaults back to 5. Super annoying.
  • Custom script execution. Similar to custom service monitoring, being able to create scripts or plugins that do something more complex than an up/down check and report back healthy or not makes a monitoring system extremely expandable and adaptable. A good example is the Icinga/Nagios/Checkmk ecosystem. Parsing is simplified because the plugin's return code has to be 0, 1, 2, or 3 for OK, Warning, Critical, or Unknown, so there is no need to parse all sorts of different outputs. https://www.monitoring-plugins.org/doc/guidelines.html#AEN74
  • Websocket and gRPC monitoring. Many of the endpoints I need to monitor are one of the two. Being able to query them and check for successful replies is crucial to a complete monitoring solution.
  • Event handlers. Being able to trigger event handlers based on certain criteria. For example, if a service dies, restart it, however if that service dies more than once in a given time period, send a notification.
  • Ability to export metrics in Prometheus format. Monitoring and observability are two different things, so even though I have monitoring in place I'm still going to have an observability platform. Since the main metrics are being gathered by Capture/Checkmate, it would be nice if at a minimum those could be exposed as Prometheus metrics, and even better if there was a way to provide a Prometheus-compatible URL and have it do the remote write.

All of these are features in other monitoring systems. In all the cases where the ability to use a custom plugin, module, etc. is proposed, I would find it perfectly acceptable to require that the plugin be written in a certain language and meet other requirements (for example, the return code of the custom script/plugin) to make it easier to plug in to Checkmate/Capture and to keep the system running efficiently.
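
To illustrate the return-code convention mentioned for custom scripts, here is a minimal sketch of how a runner could turn a plugin's exit code into a status without parsing its output. This is not Checkmate or Capture code; `Status`, `status_from_exit_code`, and `run_check` are hypothetical names, and treating any code outside 0-3 (or a timeout) as Unknown is an assumption.

```python
import subprocess
import sys
from enum import Enum


class Status(Enum):
    """Exit codes used by the monitoring-plugins convention."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def status_from_exit_code(code: int) -> Status:
    """Map an exit code to a status; anything outside 0-3 is treated as UNKNOWN."""
    try:
        return Status(code)
    except ValueError:
        return Status.UNKNOWN


def run_check(argv: list[str], timeout_s: float = 10.0) -> tuple[Status, str]:
    """Run a check command and return (status, first line of its stdout)."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    except (subprocess.TimeoutExpired, OSError) as exc:
        # Assumption: a check that cannot run or times out is reported as UNKNOWN.
        return Status.UNKNOWN, str(exc)
    lines = proc.stdout.strip().splitlines()
    return status_from_exit_code(proc.returncode), lines[0] if lines else ""


if __name__ == "__main__":
    # Stand-in "plugin": prints one line and exits with code 1 (Warning).
    demo = [sys.executable, "-c", "import sys; print('DISK WARNING - 91% used'); sys.exit(1)"]
    print(run_check(demo))
```

The runner only needs the exit code to decide the state; the first output line is kept purely as a human-readable message.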

calebcall avatar Jun 03 '25 16:06 calebcall

I think that we need the ability to group uptime and infrastructure monitors together, i.e., all of the VMs in one section, all of the websites in a different section, external sites in a third section, etc.

ljhardy avatar Jun 06 '25 16:06 ljhardy

Ability to have agents in remote sites that push data, so no port forwarding is needed at those sites. The agents send a heartbeat, so as long as the server is reachable from the agent it will work. (A plus if the agent can also be used as a probe to reach other devices in that network.)

00o-sh avatar Jun 08 '25 01:06 00o-sh

I think that we need the ability to group uptime and infrastructure monitors together, i.e., all of the VMs in one section, all of the websites in a different section, external sites in a third section, etc.

I think tagging monitors won't work in this case - you want them virtually grouped on the dashboard, right? For me to understand your use-case better, is there an example of a user interface from another application?

gorkem-bwl avatar Jun 08 '25 05:06 gorkem-bwl

It would be fantastic to have the ability to show infrastructure information on the public status pages. (as an aside, the use of the word "server" in the app can be confusing - my assumption of "server" was that it was one of my servers (hardware) and not one of my services that I monitor for uptime and such, so adding "servers" to the status page was confusing to me as the servers (infrastructure in the app) I had added were not available to be added)

I'd love to be able to show my users some basics about the hardware behind the scenes. Storage capacity/usage over time, CPU/RAM usage. Likely not nearly as much data as on a proper private dashboard, but I don't see why not allow the user to pick and choose which components they have being monitored infrastructure-wise and display them publicly.

My personal example would be two on-prem servers and one VPS and giving some high level data to those looking at the status page. Bonus points if the monitored services can be displayed in such a way that they tie back to the relevant server hardware being shown.

formless63 avatar Jun 08 '25 16:06 formless63

I think that we need the ability to group uptime and infrastructure monitors together, i.e., all of the VMs in one section, all of the websites in a different section, external sites in a third section, etc.

I think tagging monitors won't work in this case - you want them virtually grouped on the dashboard, right? For me to understand your use-case better, is there an example of a user interface from another application?

Uptime Kuma.

Image

ljhardy avatar Jun 08 '25 16:06 ljhardy

I think that we need the ability to group uptime and infrastructure monitors together, i.e., all of the VMs in one section, all of the websites in a different section, external sites in a third section, etc.

I think tagging monitors won't work in this case - you want them virtually grouped on the dashboard, right? For me to understand your use-case better, is there an example of a user interface from another application?

Uptime Kuma.

Image

Got it. We can tag monitors and also have another option on the dashboard to additionally group them on the dashboard 👍 It'll be a bit tricky using tags and groupings at the same time but we can figure it out :)

gorkem-bwl avatar Jun 08 '25 16:06 gorkem-bwl

I think that we need the ability to group uptime and infrastructure monitors together, i.e., all of the VMs in one section, all of the websites in a different section, external sites in a third section, etc.

I think tagging monitors won't work in this case - you want them virtually grouped on the dashboard, right? For me to understand your use-case better, is there an example of a user interface from another application?

Uptime Kuma. Image

Got it. We can tag monitors and also have another option on the dashboard to additionally group them on the dashboard 👍 It'll be a bit tricky using tags and groupings at the same time but we can figure it out :)

In the Uptime Kuma example, tags and groupings are separate. In the above the group was set up as "Docker Containers", and each container was tagged separately with "Docker". You can have one, both, or none. (groupings and tags)

ljhardy avatar Jun 08 '25 17:06 ljhardy

In the Uptime Kuma example, tags and groupings are separate. In the above the group was set up as "Docker Containers", and each container was tagged separately with "Docker". You can have one, both, or none. (groupings and tags)

Yes. Having just the tags and no groups will be limiting in that case. Thanks for the heads up!

gorkem-bwl avatar Jun 08 '25 17:06 gorkem-bwl

I wish there was an option to use Postgres or another kind of database. Dealing with Mongo deployments is shit, and it's not even open source licensed 😑

Hanibachi avatar Jun 11 '25 11:06 Hanibachi

I wish there was an option to use Postgres or another kind of database. Dealing with Mongo deployments is shit, and it's not even open source licensed 😑

Thanks for this. It'll be quite cumbersome to strip away Mongo and bring in another DB. What issue(s) did you have installing Mongo? Maybe we can address them in our docs?

gorkem-bwl avatar Jun 13 '25 01:06 gorkem-bwl

It would be awesome to have a Wake-on-LAN tool and a ping tool. Even better would be a cron-based scheduler for them. The lack of these is what's preventing us from adopting Checkmate.

SaadBazaz avatar Jun 13 '25 05:06 SaadBazaz

It would be awesome to have a Wake-on-LAN tool and a ping tool. Even better would be a cron-based scheduler for them. The lack of these is what's preventing us from adopting Checkmate.

Thanks @SaadBazaz - do you mind giving me more information about how it works, and what your use cases are? As a plus, you can mention similar apps you use for this purpose as well. It will give us a lot of information when we start with this feature. The more detail the more it's helpful :)

Many thanks!

gorkem-bwl avatar Jun 13 '25 05:06 gorkem-bwl

It would be awesome to have a Wake-on-LAN tool and a ping tool. Even better would be a cron-based scheduler for them. The lack of these is what's preventing us from adopting Checkmate.

Thanks @SaadBazaz - do you mind giving me more information about how it works, and what your use cases are? As a plus, you can mention similar apps you use for this purpose as well. It will give us a lot of information when we start with this feature. The more detail the more it's helpful :)

Many thanks!

@gorkem-bwl

Thank you for such a prompt response! Bravo.

Basically when we are managing servers, we often want to have scheduled sleep time and scheduled wake-times. (this is often done in small setups, for power saving). We also want to be able to manually wake / sleep devices.

So if we can have a simple CLI cron runner in Checkmate (i.e., the backend server runs the cron job on its host computer), we can have:

  • Wake on LAN tool
  • Ping tool
  • Literally any custom CLI call that the admin wants to set (maybe they want to run sudo systemctl suspend at 5 PM every day, and then Wake-on-LAN at 9 AM every day)

These can be presented as buttons in the Three Dot Menu, along with a few modals here-and-there for inputs (e.g. Cron scheduler, custom command call, etc)

Our current solution: We run a custom container in Docker which makes Wake-on-LAN calls for us. Previously, we tried to set up a Wake-on-LAN web client (https://github.com/sameerdhoot/wolweb) but it didn't work so well for us.
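
For context on what the Wake-on-LAN tool would have to do, here is a minimal sketch of building and sending a magic packet: 6 bytes of 0xFF followed by the target MAC address repeated 16 times, sent as a UDP broadcast. This is an illustration only, not Checkmate code; the function names are hypothetical, and the broadcast address and UDP port 9 are common defaults assumed here.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet for a MAC like 'AA:BB:CC:DD:EE:FF'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"invalid MAC address: {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    # Six 0xFF bytes, then the 6-byte MAC repeated 16 times (102 bytes total).
    return b"\xff" * 6 + mac_bytes * 16


def send_wake_on_lan(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send the magic packet as a UDP broadcast (address and port are assumed defaults)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

A scheduler would then only need to call `send_wake_on_lan` at the configured wake time; the sending host has to be on the same broadcast domain as the target machine.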

SaadBazaz avatar Jun 13 '25 06:06 SaadBazaz

Hello, I would also like to add something, although I am unsure if this is outside of scope...

It would be really awesome to expand the Docker monitoring feature to include Docker container updates. Let's say my awesome-container is on an old latest tag. We already have tools like Watchtower for automatically updating containers, but you may not want to auto-update your containers; maybe you only want to get notified when a new update is available and then do the updating yourself. So I propose it would be nice to have Docker image update notifications in Checkmate.

Is this inside of scope? Does this make sense? Would be really awesome to see this implemented!

CodeShellDev avatar Jun 14 '25 11:06 CodeShellDev

There is an issue about an announcement panel, I don't know if this would be a complement to that one, or a completely new request... but here it goes:

Within the status page, it would be nice if there was a history of incidents at the end of the page. Unlike the administrative history, this would be visible to the public. Administrators could add tags to incidents such as investigating, working on a fix, or resolved, and also attach messages explaining how the incident is being resolved.

Image

Image

NiceATC avatar Jun 16 '25 11:06 NiceATC

  1. Allow Maintenance inputs to be tied to a notification and have an option for them to be displayed on the public status page. If multiple maintenances are input, it should stack them on top of the page. This provides downtime windows to be shown to those checking the status page and through the notifications.
  2. Hide the "Administrator? Login Here" on the bottom left of the status page. Please make it a toggle option for it to show.
  3. Introduce options to allow infrastructure to be added to the status page with basic up/down metrics. You'll earn cool points if you can also show usage stats, but make this toggleable.
  4. Allow a customizable scaling option for the public status page, as bars are a bit big. Custom CSS options would be amazing.

InfraCharm avatar Jun 20 '25 18:06 InfraCharm

@InfraCharm Created an issue for (2).

For (4), is it ok to define a size and based on this size, stretch the bars? Example:

Image

gorkem-bwl avatar Jun 20 '25 20:06 gorkem-bwl

@InfraCharm Created an issue for (2).

For (4), is it ok to define a size and based on this size, stretch the bars? Example:

Image

I can't give you an exact measurement as different clients of mine would have different size status pages.

I think having a custom CSS box like other major status pages would be great and allow for the maximum amount of customizations on the status page itself.

InfraCharm avatar Jun 20 '25 22:06 InfraCharm

Hello, I would also like to add something, although I am unsure if this is outside of scope...

It would be really awesome to expand the Docker monitoring feature to include Docker container updates. Let's say my awesome-container is on an old latest tag. We already have tools like Watchtower for automatically updating containers, but you may not want to auto-update your containers; maybe you only want to get notified when a new update is available and then do the updating yourself. So I propose it would be nice to have Docker image update notifications in Checkmate.

Is this inside of scope? Does this make sense? Would be really awesome to see this implemented!

Curious why not just run watchtower in monitor mode. Already does exactly what you're asking about.

calebcall avatar Jul 04 '25 19:07 calebcall

@calebcall Yes, you would be right if Watchtower weren't abandoned. I can't get simple notifications to work, and as a bonus Checkmate would have an interface for said feature.

CodeShellDev avatar Jul 05 '25 20:07 CodeShellDev

How about supporting dedicated, persistent API tokens with more or less granular permissions? The current workflow would be to log in via the API, retrieve the token, and use that for further API requests. This is rather bad for scripting or interfacing the API with an automation.

There's the Checkmate CLI, but this also seems to require a username and password to acquire a token in the first place.

My use case for this is basically having a script/daemon that reads information from a Netbox and adds/updates/deletes monitors in Checkmate accordingly.

jonasjelonek avatar Jul 16 '25 10:07 jonasjelonek

How about supporting dedicated, persistent API tokens with more or less granular permissions? The current workflow would be to log in via the API, retrieve the token, and use that for further API requests. This is rather bad for scripting or interfacing the API with an automation.

There's the Checkmate CLI, but this also seems to require a username and password to acquire a token in the first place.

My use case for this is basically having a script/daemon that reads information from a Netbox and adds/updates/deletes monitors in Checkmate accordingly.

Hi @jonasjelonek ,

This is a feature that I'd like to see as well. I'll discuss this with the team but I think it's something we can add to our list.

Thanks for the suggestion!

ajhollid avatar Jul 16 '25 13:07 ajhollid

Thanks for taking this input. Would be great if this lands in Checkmate, but no hurry :)

jonasjelonek avatar Jul 16 '25 14:07 jonasjelonek

It would be great if we could monitor docker containers by name and not just ID - given that the ID changes when containers are taken down for any reason.

This is achievable with Uptime Kuma

wilcochris avatar Jul 20 '25 16:07 wilcochris

It would be great if we could make our app even more accessible for everyone! We’ve already got some accessibility basics covered, and we partially meet the WCAG Level A and some Level AA guidelines, but we know there’s still room to grow. Our goal is to fully support both Level A and Level AA standards. We’re aiming for things like smoother keyboard navigation, clearer focus outlines, a handy “skip to content” link, better form labels, alt text for all images and icons, more semantic HTML, ARIA support for custom components, announcing dynamic updates, strong color contrast, accessible tables and lists, and clearer error messages. Some of this is already in place, but we want to make it even better!

shanikauwu1 avatar Jul 23 '25 05:07 shanikauwu1

From deploying it on Kubernetes (and more kubernetes-related enhancements), here's my wishlist:

  • Support creating a "registration token" that could be used to let Capture auto-register to it.
  • Upgrade the current helm chart to something more full-fledged using sub-charts: I've used a combination of bjw-s-labs/app-template, bitnami/redis, bitnami/mongodb to have a more versatile deployment.
  • Upgrade the current helm chart to define a Capture daemonset to monitor the Kubernetes nodes
  • Allow passing variables to assemble the mongodb connection string (DB_HOST, DB_PORT, DB_USER, DB_PASS, etc.) instead of forcing DB_CONNECTION_STRING
  • Support for scanning and discovering ingresses inside the Kubernetes cluster to monitor them automatically.

Not really Kubernetes-related but also interesting:

  • Add compatibility with other agents, such as node-exporter or telegraf.
  • Allow setting dependencies or links between services and infrastructure machines
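
The request for separate connection variables could look roughly like the sketch below, which assembles a MongoDB URI from individual settings and falls back to DB_CONNECTION_STRING when it is set. This is not how Checkmate currently works; `build_mongo_uri`, the DB_NAME variable, and the default host and port are assumptions made for illustration.

```python
from urllib.parse import quote_plus


def build_mongo_uri(env: dict[str, str]) -> str:
    """Assemble a MongoDB URI from separate settings (hypothetical helper)."""
    # Keep supporting a full connection string when one is provided.
    if env.get("DB_CONNECTION_STRING"):
        return env["DB_CONNECTION_STRING"]

    host = env.get("DB_HOST", "localhost")  # assumed default
    port = env.get("DB_PORT", "27017")      # MongoDB's default port
    name = env.get("DB_NAME", "")           # DB_NAME is an assumed variable name
    user = env.get("DB_USER")
    password = env.get("DB_PASS")

    credentials = ""
    if user and password:
        # Percent-encode credentials so characters like '@' or '/' stay valid in the URI.
        credentials = f"{quote_plus(user)}:{quote_plus(password)}@"
    return f"mongodb://{credentials}{host}:{port}/{name}"
```

In a deployment this would typically be called with `dict(os.environ)`, which lets a Helm chart inject the password from a secret while the other values come from plain configuration.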

InputObject2 avatar Jul 24 '25 18:07 InputObject2

Thank you @InputObject2 - can you briefly describe this one and possibly give a use-case?

  • Allow setting dependencies or links between services and infrastructure machines

gorkem-bwl avatar Jul 24 '25 20:07 gorkem-bwl

Thank you @InputObject2 - can you briefly describe this one and possibly give a use-case?

  • Allow setting dependencies or links between services and infrastructure machines

I'd like to see if a service is down -> here are the related components to help give context in failure investigation or to stop spam alerts.

Use-case 1: If I have a healthcheck for www.mydomain.local and I know the www service runs on server docker-host-1 and needs a database on db-host-1, I'd like to see at a glance how the components that make up that service are doing.

Use-case 2: If service-A has a dependency on service-B, I'd like to suppress alerts for service-A if service-B is down. This would help with alert fatigue and reduce unwanted noise when something goes down.

Use-case 3: If common-service is down, I'd like to show that service-A, service-B and service-C are impacted in the incident that's generated. This would improve incident impact visibility and help focus on root cause instead of chasing symptoms.
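
Use-cases 2 and 3 can be sketched with a simple dependency map. The snippet below is only an illustration of the idea, not a proposed Checkmate design; the function names and the map shape (service name to the set of services it depends on) are assumptions, and suppression here looks at direct dependencies only.

```python
def classify_down_services(
    down: set[str], depends_on: dict[str, set[str]]
) -> tuple[set[str], set[str]]:
    """Split down services into (alert, suppressed).

    A down service is suppressed when at least one of its direct
    dependencies is also down (use-case 2).
    """
    suppressed = {svc for svc in down if depends_on.get(svc, set()) & down}
    return down - suppressed, suppressed


def impacted_by(service: str, depends_on: dict[str, set[str]]) -> set[str]:
    """Return every service that directly or transitively depends on `service` (use-case 3)."""
    impacted: set[str] = set()
    frontier = [service]
    while frontier:
        current = frontier.pop()
        for svc, deps in depends_on.items():
            if current in deps and svc not in impacted:
                impacted.add(svc)
                frontier.append(svc)
    return impacted


if __name__ == "__main__":
    deps = {
        "service-A": {"service-B"},
        "service-B": {"common-service"},
        "service-C": {"common-service"},
    }
    print(classify_down_services({"service-A", "service-B"}, deps))
    print(impacted_by("common-service", deps))
```

With this shape, an incident for common-service could list the impacted services, while alerts for service-A stay quiet as long as service-B is already known to be down.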

InputObject2 avatar Jul 25 '25 13:07 InputObject2

Thank you @InputObject2 - can you briefly describe this one and possibly give a use-case?

  • Allow setting dependencies or links between services and infrastructure machines

I'd like to see if a service is down -> here are the related components to help give context in failure investigation or to stop spam alerts.

Use-case 1: If I have a healthcheck for www.mydomain.local and I know the www service runs on server docker-host-1 and needs a database on db-host-1, I'd like to see at a glance how the components that make up that service are doing.

Use-case 2: If service-A has a dependency on service-B, I'd like to suppress alerts for service-A if service-B is down. This would help with alert fatigue and reduce unwanted noise when something goes down.

Use-case 3: If common-service is down, I'd like to show that service-A, service-B and service-C are impacted in the incident that's generated. This would improve incident impact visibility and help focus on root cause instead of chasing symptoms.

Those are great suggestions and a great problem to solve. Do you mind creating an issue for this so we can implement it?

gorkem-bwl avatar Jul 25 '25 14:07 gorkem-bwl