node_exporter

Add systemd socket listener activation

Open DaAwesomeP opened this issue 2 years ago • 12 comments

This adds the ability for systemd to open the port listener and activate the exporter. This provides a few advantages:

  • Systemd can keep the port open even if the exporter is not running (prevent other processes from grabbing it).
  • Systemd can open the port at boot before any user process has a chance to (since 9100 is not a privileged port).
  • Systemd can allow the exporter to use a privileged port without annoying or additional setup and without requiring the exporter to run as root. This is very useful if your corporate or application firewall only allows common ports such as 80, or if you do not want to use unprivileged ports.
  • Systemd will start (a.k.a. activate) the exporter service on first connection to the listening port if it is not already running (assuming the socket is enabled with systemctl enable node_exporter.socket).
  • If the exporter stops or fails+restarts, systemd will queue TCP requests (by default not too many), attempt to start the exporter, and then hand it the requests. I tested this and it works seamlessly (especially since the exporter starts so quickly).
  • I believe you can use the socket unit file to change where it is listening without having to restart the exporter, but this isn't really a useful feature here.
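For readers unfamiliar with the mechanism, here is a minimal sketch of the receiving side of socket activation using only the Go standard library. It is not the code from this PR: the function names are hypothetical, and it assumes only the documented systemd protocol, in which the service manager sets LISTEN_PID and LISTEN_FDS and passes the sockets starting at file descriptor 3.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/http"
	"os"
	"strconv"
)

// listenFDsStart is the first inherited descriptor in the systemd
// socket activation protocol (SD_LISTEN_FDS_START).
const listenFDsStart = 3

// inheritedFDCount (hypothetical helper) validates the LISTEN_PID and
// LISTEN_FDS values and returns how many sockets were passed.
func inheritedFDCount(listenPID, listenFDs string, selfPID int) (int, error) {
	pid, err := strconv.Atoi(listenPID)
	if err != nil {
		return 0, fmt.Errorf("invalid LISTEN_PID %q: %w", listenPID, err)
	}
	if pid != selfPID {
		return 0, errors.New("LISTEN_PID does not match this process")
	}
	n, err := strconv.Atoi(listenFDs)
	if err != nil {
		return 0, fmt.Errorf("invalid LISTEN_FDS %q: %w", listenFDs, err)
	}
	if n < 1 {
		return 0, errors.New("no sockets were passed")
	}
	return n, nil
}

// systemdListeners (hypothetical helper) wraps each inherited
// descriptor in a net.Listener.
func systemdListeners() ([]net.Listener, error) {
	n, err := inheritedFDCount(os.Getenv("LISTEN_PID"), os.Getenv("LISTEN_FDS"), os.Getpid())
	if err != nil {
		return nil, err
	}
	listeners := make([]net.Listener, 0, n)
	for fd := listenFDsStart; fd < listenFDsStart+n; fd++ {
		f := os.NewFile(uintptr(fd), "systemd-socket-"+strconv.Itoa(fd))
		if f == nil {
			return nil, fmt.Errorf("descriptor %d is not valid", fd)
		}
		l, err := net.FileListener(f)
		f.Close() // FileListener works on a duplicate, so the original can be closed.
		if err != nil {
			return nil, err
		}
		listeners = append(listeners, l)
	}
	return listeners, nil
}

func main() {
	listeners, err := systemdListeners()
	if err != nil {
		fmt.Println("not socket-activated:", err)
		return
	}
	// Placeholder handler; a real exporter would serve its metrics here.
	http.HandleFunc("/metrics", func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprintln(w, "# placeholder metrics endpoint")
	})
	// Serve on the first inherited socket only, for brevity.
	if err := http.Serve(listeners[0], nil); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Because systemd owns the listening socket, connections that arrive while the process is down wait in the kernel backlog and are served once the process accepts on the inherited descriptor.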

Since you probably don't want this to happen unexpectedly, I added a command line flag to enable it. When enabled, it does not fall back to Go opening the port, because that is probably unintentional and you may end up with systemd and Go fighting to open the port.

Per CONTRIBUTING.MD it seems I am supposed to @SuperQ @discordianfish

Signed-off-by: Perry Naseck [email protected]

DaAwesomeP avatar Jun 04 '22 18:06 DaAwesomeP

This should probably be a feature of the exporter toolkit if we decide we want it

roidelapluie avatar Jun 04 '22 19:06 roidelapluie

This has been proposed in the past and we have not accepted it.

SuperQ avatar Jun 04 '22 20:06 SuperQ

Ah, I just found #1622. I suppose there is this note:

If somebody sends a PR that is so trivial that there is no expected ongoing support we might consider it, but for now I'm closing this.

Hopefully this PR is small enough for that. I can definitely remove the option and socket unit from the systemd example if that makes it more trivial.

I definitely agree that your monitor should just always be running so the activation is not useful in that regard. It is useful if it fails/restarts, but that is less common and means there is a bigger problem to solve somewhere.

The main thing this solves for our system of machines is that it allows us to use privileged ports without running the exporter as root or ensures that nothing can steal the unprivileged port. The unprivileged port reserved by systemd is definitely preferred since then we don't have to stray from default.

We maintain user-facing desktop and shell machines (think computer labs and remote Linux machines for the purpose of using a shell, doing programming homework, performing research, or running a long task on a long-lived more powerful machine). Users are authenticated (and of course if something happened we would come after them), but we cannot assume that unprivileged or non-systemd-reserved ports are entirely safe or available. A user could DDoS or DoS or discover an exploit in an exporter and cause the exporter to restart or fail. During this time they could grab the unprivileged port and either run a playback or their own fake exporter or simply keep it from running to hide what they are doing or how many resources they are using on the machine.

In general, on accessible machines like this it is not good practice to run critical system processes on unprivileged or unreserved ports because there is no guarantee of high availability. At the application level (the exporter), simply starting it at boot so the port can't be taken by another process is not a good enough security guarantee for shared machines. Sure, we want to prevent the application from ever failing, but it can still happen because software has bugs, and in that situation there should not be any risk. Systemd is able to reserve the port as part of system initialization and then keep that port, which is a guarantee for high availability on a system where we treat systemd as the last line of defense.

This is something we have to worry about because we need to monitor resource usage to ensure nobody is taking advantage of the machines. It's a much better policy for us to not outright block resource-intensive processes and instead to raise an alert and check in on them when someone complains or something has suspiciously been running for a long time. This way we do not inhibit legitimate research or tasks. These are shell machines, and they are designed to be open, but high availability for system processes is definitely a security requirement in this scenario.

I know most use-cases for the Linux exporter are on server machines where there is rigid control of what is running and you can containerize or chroot or nspawn whatever is not assumed to be controlled, but this is not one of those scenarios. This particular scenario of providing computer lab machines is also definitely not that niche, especially at similar organizations. We are attempting to transition to Prometheus from an older monitoring system that communicates in the other direction (where the monitored machine connects to the monitor server).

DaAwesomeP avatar Jun 05 '22 04:06 DaAwesomeP

You don't need to run the exporter as root to gain privileged ports. A normal setcap CAP_NET_BIND_SERVICE on the node_exporter binary should be sufficient. But running on the default port with a systemd socket reservation is

From another security perspective, you might want to use the setcap and sysctl net.ipv4.ip_local_reserved_ports.

I think the change is simple enough, but I agree with @roidelapluie, we should do this in the exporter-toolkit web package. Would you mind opening a PR there?

SuperQ avatar Jun 05 '22 05:06 SuperQ

Also note the new Go module update in https://github.com/prometheus/node_exporter/pull/2368.

SuperQ avatar Jun 05 '22 09:06 SuperQ

Created pull at https://github.com/prometheus/exporter-toolkit/pull/95.

Also rebased and updated this one to use v22 and verified activation and checks still function.

DaAwesomeP avatar Jun 05 '22 16:06 DaAwesomeP

I deployed this to a small fleet and it works perfectly. The main advantage for us, should this get merged, is being able to use a privileged port without requiring ANY OF root uid, Linux capabilities, or sysctl modifications to the port settings in net.ipv{4,6}. Additionally, apart from the early port binding which OP already mentioned, this also reduces initial load spikes directly after boot, which can be vital when spawning new VMs to reduce the startup time of other processes. Adding this to the exporter-toolkit also seems like a great idea!

septatrix avatar Jul 06 '22 23:07 septatrix

@septatrix Wow, thank you for the feedback! I was not yet able to deploy this across our systems to test so completely.

The same method for socket activation is implemented in the pending exporter-toolkit pull request.

DaAwesomeP avatar Jul 06 '22 23:07 DaAwesomeP

@SuperQ wdyt nowadays? I'd be fine merging this.

discordianfish avatar Jul 25 '22 17:07 discordianfish

@discordianfish I am still working on https://github.com/prometheus/exporter-toolkit/pull/95, which will more broadly apply to all exporters and provide this functionality to all exporters using the toolkit. When that is merged I will update this pull to make use of it.

DaAwesomeP avatar Jul 25 '22 17:07 DaAwesomeP

This pull has been updated to contain an example for new changes to prometheus/exporter-toolkit#95 (fde4bf0). Actual code is functional, but dependencies are obviously broken until the exporter-toolkit changes are merged/released.

DaAwesomeP avatar Aug 13 '22 19:08 DaAwesomeP

Nice, I like it. Hopefully we'll get the toolkit updated and merged soon.

SuperQ avatar Aug 24 '22 16:08 SuperQ

@SuperQ passing CI and ready for review!

DaAwesomeP avatar Oct 18 '22 01:10 DaAwesomeP

Fixed dependency conflict

DaAwesomeP avatar Oct 19 '22 00:10 DaAwesomeP

@SuperQ Updated exporter-toolkit to v0.8.0!

DaAwesomeP avatar Oct 21 '22 04:10 DaAwesomeP

Force-pushed because I forgot the DCO in my excitement!

DaAwesomeP avatar Oct 21 '22 04:10 DaAwesomeP

Bumped exporter-toolkit to v0.8.1!

DaAwesomeP avatar Oct 22 '22 16:10 DaAwesomeP