Add machines heartbeat as a prometheus metric

Open LuminatiHD opened this issue 1 year ago • 7 comments

What would you like to be added?

I think it would be useful for the Prometheus exporter to also export a metric for the heartbeat of each registered machine, for example cs_machines_heartbeat_seconds{instance="example.com"} 46, similar to how the CLI already displays this information.
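A minimal sketch of what such a metric could look like in the Prometheus text exposition format. The metric name and the `instance` label come from the request above; the function name, the input mapping, and the HELP text are hypothetical, and this is not how CrowdSec's exporter is implemented.

```python
import time


def escape_label_value(value):
    # Escape backslash, double quote and newline, as the text format requires.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_heartbeat_metric(last_heartbeats, now=None):
    """Render seconds since each machine's last heartbeat as a gauge.

    last_heartbeats maps a machine name to the Unix timestamp of its last
    heartbeat (hypothetical input; a real exporter would read it from LAPI).
    """
    if now is None:
        now = time.time()
    lines = [
        "# HELP cs_machines_heartbeat_seconds Seconds since the last heartbeat.",
        "# TYPE cs_machines_heartbeat_seconds gauge",
    ]
    for machine in sorted(last_heartbeats):
        # Clamp at zero in case of small clock differences.
        age = max(0, int(now - last_heartbeats[machine]))
        label = escape_label_value(machine)
        lines.append(f'cs_machines_heartbeat_seconds{{instance="{label}"}} {age}')
    return "\n".join(lines) + "\n"
```

Calling `render_heartbeat_metric({"example.com": 1000}, now=1046)` yields a sample line matching the example in the request. A gauge is used here because the value can go down again whenever a new heartbeat arrives.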

/kind enhancement

Why is this needed?

It would help to see whether the infrastructure is still intact when changes happen. For example, we have a setup where clients have to communicate across firewalls. If a change to such a firewall prevents client instances from connecting back to the LAPI, nobody would notice unless they went looking for it.

LuminatiHD avatar Jan 19 '24 10:01 LuminatiHD

@LuminatiHD: Thanks for opening an issue, it is currently awaiting triage.

In the meantime, you can:

  1. Check Crowdsec Documentation to see if your issue can be self resolved.
  2. You can also join our Discord.
  3. Check Releases to make sure your agent is on the latest version.

I am a bot created to help the crowdsecurity developers manage community feedback and contributions. You can check out my manifest file to understand my behavior and what I can do. If you want to use this for your project, you can check out the BirthdayResearch/oss-governance-bot repository.

github-actions[bot] avatar Jan 19 '24 10:01 github-actions[bot]

@LuminatiHD: There is no 'kind' label on this issue. You need a 'kind' label to start the triage process.

  • /kind feature
  • /kind enhancement
  • /kind bug
  • /kind packaging

github-actions[bot] avatar Jan 19 '24 10:01 github-actions[bot]

Hey 👋🏻

Just to ask a few more questions: each machine has a local Prometheus metrics port which can be scraped by Prometheus. If you set up collection for all of these instances, wouldn't monitoring already be covered?

However, I do see a point that this does NOT cover if the instance itself cannot connect back to the main instance.

I just wanted to get more information since the feature request was 5 words.

LaurenceJJones avatar Jan 19 '24 10:01 LaurenceJJones

> Hey 👋🏻
>
> Just to ask a few more questions: each machine has a local Prometheus metrics port which can be scraped by Prometheus. If you set up collection for all of these instances, wouldn't monitoring already be covered?
>
> However, I do see a point that this does NOT cover if the instance itself cannot connect back to the main instance.
>
> I just wanted to get more information since the feature request was 5 words.

No problem. I updated the comment, is this more helpful?

LuminatiHD avatar Jan 19 '24 21:01 LuminatiHD

/kind enhancement

LuminatiHD avatar Jan 23 '24 08:01 LuminatiHD

Thank you for updating the request with a lot more details.

As stated in your other enhancement request, with the release of v1.6.0 the team has its hands full with other projects.

I have added the "good first issue" tag to indicate pull requests from everyone are welcome to resolve this.

LaurenceJJones avatar Jan 25 '24 11:01 LaurenceJJones

I've just noticed that cs_lapi_machine_requests_total{route="/v1/heartbeat"} already exists. IMO it is not as nice a solution as a dedicated metric, but I understand if that is enough to close this issue.
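As a rough sketch of that workaround, one could compare the heartbeat counter between two scrapes and flag machines whose count did not move. The metric name and the `route` label come from the comment above; the `machine` label, the sample input, and both function names are assumptions made for illustration.

```python
import re

# Matches one sample line of the counter, capturing the label set and value.
SAMPLE_RE = re.compile(
    r'^cs_lapi_machine_requests_total\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def heartbeat_counts(exposition_text):
    """Return {machine: heartbeat request count} from one scrape's text."""
    counts = {}
    for line in exposition_text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if not match:
            continue  # comments, blank lines, other metrics
        labels = dict(LABEL_RE.findall(match.group("labels")))
        if labels.get("route") != "/v1/heartbeat":
            continue
        # The "machine" label name is an assumption about the label set.
        machine = labels.get("machine", "")
        counts[machine] = counts.get(machine, 0.0) + float(match.group("value"))
    return counts


def stalled_machines(previous, current):
    """Machines seen earlier whose counter is missing or unchanged now.

    A lower value than before is treated as a counter reset (for example
    after a restart), not as a stall, since requests arrived after the reset.
    """
    return sorted(
        machine
        for machine, count in previous.items()
        if machine not in current or current[machine] == count
    )
```

In Prometheus itself, the same idea is usually expressed with the `increase()` function over a time window rather than by comparing scrapes by hand. Either way the liveness signal is indirect, which is the drawback compared with a dedicated heartbeat metric.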

LuminatiHD avatar Jan 30 '24 13:01 LuminatiHD