exporter-toolkit icon indicating copy to clipboard operation
exporter-toolkit copied to clipboard

Add systemd socket listener activation

Open DaAwesomeP opened this issue 2 years ago • 29 comments

I called the function UseActivatedSocketAndServe. Let me know if I should change that. Uses the latest version of the systemd module (v22).

Discussion at node_exporter: https://github.com/prometheus/node_exporter/pull/2393

Per CONTRIBUTING.MD, @SuperQ @roidelapluie

Signed-off-by: Perry Naseck [email protected]

DaAwesomeP avatar Jun 05 '22 16:06 DaAwesomeP

It sounds like we might need to change the tls config in kingpin flags to be a struct and add a flag for this, then pass it to ListenAndServe and do the right thing here.

The flag should be optional as we don't want it e.g. in Prometheus and non-linux systems.

roidelapluie avatar Jun 09 '22 09:06 roidelapluie

It would be nice if this was integrated into the web ListenAndServe method. The idea of this toolkit is to minimize the boilerplate needed in exporters by having opinionated setup.

One thing I was thinking about here is we currently only support one listenAddress. There have been some requests to allow multiple listen addresses to be specified. We could do a bit of a breaking change to support both socket activation and multiple addresses.

SuperQ avatar Jun 09 '22 10:06 SuperQ

I see, so allow for multiple listenAddress and one address is a reserved keyword to enable socket activation?

DaAwesomeP avatar Jun 09 '22 14:06 DaAwesomeP

Systemd can also provide multiple activated sockets (of which I am currently erroring if there are multiple), so that would play very nicely into this.

DaAwesomeP avatar Jun 09 '22 14:06 DaAwesomeP

I took a try at it, but I think I need some guidance with how you would like this to work to support multiple sockets. Please forgive any incorrect assumptions; I am still quite new to Go.

It seems that Serve() can only take one listener, which means this will require multiple Serve() instances (unless there is some other method). Should I return an array of servers instead of a single one? That is definitely a big breaking change.

DaAwesomeP avatar Jun 17 '22 02:06 DaAwesomeP

Ah, ok this will require proper concurrency. I will keep reading and experimenting and hopefully end up at a pull, but this is definitely outside my current skill-level. Anyone who stumbles upon this pull in the meantime should please feel free to contribute it themselves and I can rebase and incorporate the systemd activation part.

DaAwesomeP avatar Jun 17 '22 02:06 DaAwesomeP

Could a simple unix socker approach work too or do we need to be systemd specific?

roidelapluie avatar Jun 22 '22 08:06 roidelapluie

The difficulty seems to be with having multiple HTTP server instances, not the listeners. I am making progress and should hopefully have something to test next week.

DaAwesomeP avatar Jun 22 '22 12:06 DaAwesomeP

OK! This is now working for multiple systemd activated sockets. It uses the existing Serve() function, so it is not a breaking change. It can theoretically support multiple non-systemd listeners as well, but I still need to add that function.

But adding that function doesn't really fit well into the current way exporters use the toolkit. The current way seems to be:

  1. Exporters set up kingpin flags like --web.listen-address
  2. Exporters set up &http.Server{} with the specific address
  3. Exporters call web.ListenAndServ()

My proposal would be to:

  1. Move these flags to this module
  2. Allow multiple instances of --web.listen-address for listening on multiple ports without systemd activation
  3. Provide a function web.ListenAndServAll() (or some other name) that parses these flags and begins serving on all the necessary listeners and ports.

This would be a breaking change, but it would ensure that all exporters start this part the same way and that any future changes or features would not introduce breaking changes to the individual exporter code.

DaAwesomeP avatar Jun 26 '22 18:06 DaAwesomeP

@roidelapluie @SuperQ ping, please provide feedback with what you prefer for implementing the required options.

DaAwesomeP avatar Jul 06 '22 23:07 DaAwesomeP

if we have a ListenAndServAll, then we don't need to break existing setups.

It is acceptable to me to change what https://github.com/prometheus/exporter-toolkit/blob/master/web/kingpinflag/flag.go returns. It can return a pointer to a struct.

roidelapluie avatar Jul 08 '22 12:07 roidelapluie

Great, I will try this out this coming week.

DaAwesomeP avatar Jul 11 '22 03:07 DaAwesomeP

Hello! I have completed the changes. Please let me know if this looks good or if I should make any adjustements.

I have implemented:

  • A struct for exporter-toolkit flags. This removes the need for the exporter implementation to worry about flags like web.listen-address and instead they only need to do something like:
    toolkitFlags = kingpinflag.AddFlags(kingpin.CommandLine)
    
  • ListenAndServe now requires much less boilerplate in the exporter implementation. Exporters no longer need to worry about specifying a port to listen on, etc. It is simply:
    server := &http.Server{}
      if err := web.ListenAndServe(server, toolkitFlags, logger); err != nil {
      	level.Error(logger).Log("err", err)
      	os.Exit(1)
      }
    
    This is a major breaking change. However, this method should be more resilient to breaking changes going forward.
  • ListenAndServe now uses toolkitFlags (FlagStruct) to determine ports to listen on, systemd socket activation, and TLS config file path.
  • Implementing new flags and features are now much less likely to induce breaking changes in exporter implementations, as new flags may be added to FlagStruct without changing the exporter implementation. Help messages for exporter-toolkit flags will now also be unified across all exporters (for flags like web.listen-address).
  • A user may specify multiple --web.listen-address to listen on multiple port listeners.
  • A user may optionally specify --web.systemd-socket to listen with systemd activation instead of port listeners.

I will push a working example of node_exporter to prometheus/node_exporter#2393 shortly.

DaAwesomeP avatar Aug 13 '22 19:08 DaAwesomeP

Requesting Review from @discordianfish, @roidelapluie, or @SuperQ

DaAwesomeP avatar Aug 20 '22 02:08 DaAwesomeP

@SuperQ Updated and pushed minor doc change!

DaAwesomeP avatar Aug 20 '22 14:08 DaAwesomeP

@discordianfish @SuperQ Yup, the any was the issue, thanks! Please re-run CI.

DaAwesomeP avatar Aug 23 '22 01:08 DaAwesomeP

Passed CI!

DaAwesomeP avatar Aug 23 '22 12:08 DaAwesomeP

@roidelapluie Any additional comments?

SuperQ avatar Aug 23 '22 14:08 SuperQ

I just realized one issue. How will this work/react on non-Linux platforms?

I'm wondering if we need to split the implementation so that the systemd flags do not show up on other platforms.

SuperQ avatar Aug 25 '22 08:08 SuperQ

Can we use a struct and not enforce the usage of kingpin to everyone?

roidelapluie avatar Aug 25 '22 10:08 roidelapluie

I just realized one issue. How will this work/react on non-Linux platforms?

While systemd is only available for linux the mechanism itself can still be used on any UNIX like platform. Systemd opens the sockets which are then inherited by the child process. Information about the file descriptors are passed as environment variables which can be read by the service to get e.g. the amount of passed descriptors. See sd_listen_fds

I am not sure about windows though as it uses file handles instead of file descriptors which function a bit differently. However in that case go-systemd/activate simply says that no file descriptors have been passed (i.e., returns nil)

I'm wondering if we need to split the implementation so that the systemd flags do not show up on other platforms.

Maybe it would be even better to make this automatic and always use passed file descriptors if go-systemd/activate find them (i.e., the respective environment variables are set). This is also what e.g. Pythons tornado and gunicorn do. Then we would not even have to worry users about flags which do not work/exist on some platforms. In that case I would simply print a warning if an additional listener address is manually passed even though sockets have been discovered but not perform a hard exit with code 1.

septatrix avatar Aug 25 '22 10:08 septatrix

Can we use a struct and not enforce the usage of kingpin to everyone?

@roidelapluie The struct is defined with simple types as follows:

type FlagStruct struct {
	WebListenAddresses *[]string
	WebSystemdSocket   *bool
	WebConfigFile      *string
}

https://github.com/prometheus/exporter-toolkit/pull/95/files#diff-c470146e12b2067a20ebbe4e1404a905e8fcf83b5a5d5b7f19b3e5c0d3fb5a0fR19-R24

If someone does not want to use kingpin then they can simply populate and pass this struct themselves instead of calling AddFlags(). The function AddFlags() simply creates and returns this struct. The exporter implementation then stores and passes this struct to ListenAndServe(), so there is an opportunity for the exporter to pass whatever they want.

Here is an example of populating the struct without kingpin as seen in the tests:

flags := kingpinflag.FlagStruct{
	WebListenAddresses: &([]string{port}),
	WebSystemdSocket:   OfBool(false),
	WebConfigFile:      OfString("testdata/web_config_users_noTLS.good.yml"),
}
ListenAndServe(server, &flags, testlogger)

https://github.com/prometheus/exporter-toolkit/pull/95/files#diff-9d2a81b4017ee890112cacf0496039bb3da02d159a34cb7b318ceede8ea7f3f2R43-R48

The reason this file/folder that is getting included is named kingpinflag is because it already existed in this repo prior to this PR. The AddFlags() function already depends on kingpin for web.config.file prior to this PR. Let me know if I should rename it to simply flag.

Providing the (optional) AddFlags() increases consistency across exporters using kingpin (which seems to be most of them). For exporters using kingpin, their flag descriptions will all be consistent and when the toolkit adds new features it will not break most exporter implementations in any way. It also means that adding or changing descriptions (i.e. eventually removing [EXPERIMENTAL] from web.config.file) would otherwise take a very long time to trickle down to all of the exporters.

I just realized one issue. How will this work/react on non-Linux platforms?

@SuperQ All that the systemd socket listener finder does initially is search for an environment variable that lists the file descriptors. If the environment variable doesn't exist, then it will simply say no systemd sockets found (which is true) and exit cleanly. This should definitely fail cleanly on macOS, and @septatrix says it will fail cleanly on Windows. I will try this out on Windows and confirm later today.

I do think it would be a good to add a "(Linux only)" to the end of the flag description; I will add this later today.

We could also have the AddFlags() function set this param in the struct to false and not add the flag to kingpin if it detects a non-Linux platform. Exporters not using kingpin will have to implement this themselves, but they would already be casing on each flag themselves and per above it should fail cleanly on Windows and macOS if it is attempted. So I don't think we need to have separate struct types. I will also work on this today.

Maybe it would be even better to make this automatic and always use passed file descriptors if go-systemd/activate find them (i.e., the respective environment variables are set).

@septatrix Good idea! I'm trying to this of a caveat where it would activate unintentionally, but I cannot think of one! If a user did manage to somehow accidentally pass a file descriptor then they are doing something very wrong and it would be right to notify the user.

The only thing that I really don't like is that if you intended to use a socket listener but failed to provide one, then the exporter would either start listening on an unintended port or it would fail to listen on the unintended port and not give you as specific of an error message. I think it would be much better to fail to start outright with a descriptive error message than to cause unintended behavior. Simply always printing "no systemd listener found" every time will confuse users who do not know what this is and makes it seem like a feature they are missing. Other than the frameworks you mentioned, I have never encountered anything else that automatically chooses (probably for this reason).

@SuperQ and others, what do you think?

DaAwesomeP avatar Aug 25 '22 12:08 DaAwesomeP

@septatrix Good idea! I'm trying to this of a caveat where it would activate unintentionally, but I cannot think of one! If a user did manage to somehow accidentally pass a file descriptor then they are doing something very wrong and it would be right to notify the user.

The only thing that I really don't like is that if you intended to use a socket listener but failed to provide one, then the exporter would either start listening on an unintended port or it would fail to listen on the unintended port and not give you as specific of an error message. I think it would be much better to fail to start outright with a descriptive error message than to cause unintended behavior. Simply always printing "no systemd listener found" every time will confuse users who do not know what this is and makes it seem like a feature they are missing. Other than the frameworks you mentioned, I have never encountered anything else that automatically chooses (probably for this reason).

Socket activation is usually used when to process otherwise would not have the capabilities/permissions to bind to a port. So in case no systemd socket is found we just fallback to the default port and if binding fails than we also log, that neither a socket was found nor port binding was successful.

septatrix avatar Sep 14 '22 10:09 septatrix

@roidelapluie I see what you mean now! I moved the struct, please take a look at 888e3dd.

@SuperQ I added "(Linux only)" to the flag (see 379c4b4) and made the flag only appear on Linux platforms (see 8ce2a7b).

@septatrix Since the default port is not necessarily a normally privileged port in this case, falling back would likely unintentionally succeed in binding the port and would cause unintended operation for the user. Let's see what the maintainers prefer.

DaAwesomeP avatar Sep 14 '22 13:09 DaAwesomeP

ping @roidelapluie @SuperQ are we good to go?

DaAwesomeP avatar Sep 20 '22 03:09 DaAwesomeP

Going to close and re-open to try and trigger CI.

SuperQ avatar Sep 20 '22 06:09 SuperQ

@SuperQ Now passing tests! :tada:

DaAwesomeP avatar Sep 21 '22 02:09 DaAwesomeP

Nice, I think this is ready to go. Thanks for all of the work here, I'm really looking forward to having this done.

But I want to give @roidelapluie one more chance to look at it.

SuperQ avatar Sep 21 '22 06:09 SuperQ

@roidelapluie @SuperQ let me know if any further changes are needed! Thanks!

DaAwesomeP avatar Oct 02 '22 23:10 DaAwesomeP

Thanks!

roidelapluie avatar Oct 17 '22 22:10 roidelapluie