exporter-toolkit
exporter-toolkit copied to clipboard
Add systemd socket listener activation
I called the function UseActivatedSocketAndServe
. Let me know if I should change that. Uses the latest version of the systemd module (v22
).
Discussion at node_exporter: https://github.com/prometheus/node_exporter/pull/2393
Per CONTRIBUTING.MD, @SuperQ @roidelapluie
Signed-off-by: Perry Naseck [email protected]
It sounds like we might need to change the tls config in kingpin flags to be a struct and add a flag for this, then pass it to ListenAndServe and do the right thing here.
The flag should be optional as we don't want it e.g. in Prometheus and non-linux systems.
It would be nice if this was integrated into the web ListenAndServe
method. The idea of this toolkit is to minimize the boilerplate needed in exporters by having opinionated setup.
One thing I was thinking about here is we currently only support one listenAddress
. There have been some requests to allow multiple listen addresses to be specified. We could do a bit of a breaking change to support both socket activation and multiple addresses.
I see, so allow for multiple listenAddress
and one address is a reserved keyword to enable socket activation?
Systemd can also provide multiple activated sockets (of which I am currently erroring if there are multiple), so that would play very nicely into this.
I took a try at it, but I think I need some guidance with how you would like this to work to support multiple sockets. Please forgive any incorrect assumptions; I am still quite new to Go.
It seems that Serve()
can only take one listener, which means this will require multiple Serve()
instances (unless there is some other method). Should I return an array of servers instead of a single one? That is definitely a big breaking change.
Ah, ok this will require proper concurrency. I will keep reading and experimenting and hopefully end up at a pull, but this is definitely outside my current skill-level. Anyone who stumbles upon this pull in the meantime should please feel free to contribute it themselves and I can rebase and incorporate the systemd activation part.
Could a simple unix socker approach work too or do we need to be systemd specific?
The difficulty seems to be with having multiple HTTP server instances, not the listeners. I am making progress and should hopefully have something to test next week.
OK! This is now working for multiple systemd activated sockets. It uses the existing Serve()
function, so it is not a breaking change. It can theoretically support multiple non-systemd listeners as well, but I still need to add that function.
But adding that function doesn't really fit well into the current way exporters use the toolkit. The current way seems to be:
- Exporters set up kingpin flags like
--web.listen-address
- Exporters set up
&http.Server{}
with the specific address - Exporters call
web.ListenAndServ()
My proposal would be to:
- Move these flags to this module
- Allow multiple instances of
--web.listen-address
for listening on multiple ports without systemd activation - Provide a function
web.ListenAndServAll()
(or some other name) that parses these flags and begins serving on all the necessary listeners and ports.
This would be a breaking change, but it would ensure that all exporters start this part the same way and that any future changes or features would not introduce breaking changes to the individual exporter code.
@roidelapluie @SuperQ ping, please provide feedback with what you prefer for implementing the required options.
if we have a ListenAndServAll, then we don't need to break existing setups.
It is acceptable to me to change what https://github.com/prometheus/exporter-toolkit/blob/master/web/kingpinflag/flag.go returns. It can return a pointer to a struct.
Great, I will try this out this coming week.
Hello! I have completed the changes. Please let me know if this looks good or if I should make any adjustements.
I have implemented:
- A struct for exporter-toolkit flags. This removes the need for the exporter implementation to worry about flags like
web.listen-address
and instead they only need to do something like:toolkitFlags = kingpinflag.AddFlags(kingpin.CommandLine)
-
ListenAndServe
now requires much less boilerplate in the exporter implementation. Exporters no longer need to worry about specifying a port to listen on, etc. It is simply:
This is a major breaking change. However, this method should be more resilient to breaking changes going forward.server := &http.Server{} if err := web.ListenAndServe(server, toolkitFlags, logger); err != nil { level.Error(logger).Log("err", err) os.Exit(1) }
-
ListenAndServe
now usestoolkitFlags
(FlagStruct
) to determine ports to listen on, systemd socket activation, and TLS config file path. - Implementing new flags and features are now much less likely to induce breaking changes in exporter implementations, as new flags may be added to
FlagStruct
without changing the exporter implementation. Help messages for exporter-toolkit flags will now also be unified across all exporters (for flags likeweb.listen-address
). - A user may specify multiple
--web.listen-address
to listen on multiple port listeners. - A user may optionally specify
--web.systemd-socket
to listen with systemd activation instead of port listeners.
I will push a working example of node_exporter
to prometheus/node_exporter#2393 shortly.
Requesting Review from @discordianfish, @roidelapluie, or @SuperQ
@SuperQ Updated and pushed minor doc change!
@discordianfish @SuperQ Yup, the any
was the issue, thanks! Please re-run CI.
Passed CI!
@roidelapluie Any additional comments?
I just realized one issue. How will this work/react on non-Linux platforms?
I'm wondering if we need to split the implementation so that the systemd flags do not show up on other platforms.
Can we use a struct and not enforce the usage of kingpin to everyone?
I just realized one issue. How will this work/react on non-Linux platforms?
While systemd is only available for linux the mechanism itself can still be used on any UNIX like platform. Systemd opens the sockets which are then inherited by the child process. Information about the file descriptors are passed as environment variables which can be read by the service to get e.g. the amount of passed descriptors. See sd_listen_fds
I am not sure about windows though as it uses file handles instead of file descriptors which function a bit differently. However in that case go-systemd/activate
simply says that no file descriptors have been passed (i.e., returns nil
)
I'm wondering if we need to split the implementation so that the systemd flags do not show up on other platforms.
Maybe it would be even better to make this automatic and always use passed file descriptors if go-systemd/activate
find them (i.e., the respective environment variables are set). This is also what e.g. Pythons tornado and gunicorn do. Then we would not even have to worry users about flags which do not work/exist on some platforms. In that case I would simply print a warning if an additional listener address is manually passed even though sockets have been discovered but not perform a hard exit with code 1.
Can we use a struct and not enforce the usage of kingpin to everyone?
@roidelapluie The struct is defined with simple types as follows:
type FlagStruct struct {
WebListenAddresses *[]string
WebSystemdSocket *bool
WebConfigFile *string
}
https://github.com/prometheus/exporter-toolkit/pull/95/files#diff-c470146e12b2067a20ebbe4e1404a905e8fcf83b5a5d5b7f19b3e5c0d3fb5a0fR19-R24
If someone does not want to use kingpin then they can simply populate and pass this struct themselves instead of calling AddFlags()
. The function AddFlags()
simply creates and returns this struct. The exporter implementation then stores and passes this struct to ListenAndServe()
, so there is an opportunity for the exporter to pass whatever they want.
Here is an example of populating the struct without kingpin as seen in the tests:
flags := kingpinflag.FlagStruct{
WebListenAddresses: &([]string{port}),
WebSystemdSocket: OfBool(false),
WebConfigFile: OfString("testdata/web_config_users_noTLS.good.yml"),
}
ListenAndServe(server, &flags, testlogger)
https://github.com/prometheus/exporter-toolkit/pull/95/files#diff-9d2a81b4017ee890112cacf0496039bb3da02d159a34cb7b318ceede8ea7f3f2R43-R48
The reason this file/folder that is getting included is named kingpinflag
is because it already existed in this repo prior to this PR. The AddFlags()
function already depends on kingpin for web.config.file
prior to this PR. Let me know if I should rename it to simply flag
.
Providing the (optional) AddFlags()
increases consistency across exporters using kingpin (which seems to be most of them). For exporters using kingpin, their flag descriptions will all be consistent and when the toolkit adds new features it will not break most exporter implementations in any way. It also means that adding or changing descriptions (i.e. eventually removing [EXPERIMENTAL]
from web.config.file
) would otherwise take a very long time to trickle down to all of the exporters.
I just realized one issue. How will this work/react on non-Linux platforms?
@SuperQ All that the systemd socket listener finder does initially is search for an environment variable that lists the file descriptors. If the environment variable doesn't exist, then it will simply say no systemd sockets found (which is true) and exit cleanly. This should definitely fail cleanly on macOS, and @septatrix says it will fail cleanly on Windows. I will try this out on Windows and confirm later today.
I do think it would be a good to add a "(Linux only)" to the end of the flag description; I will add this later today.
We could also have the AddFlags()
function set this param in the struct to false
and not add the flag to kingpin if it detects a non-Linux platform. Exporters not using kingpin will have to implement this themselves, but they would already be casing on each flag themselves and per above it should fail cleanly on Windows and macOS if it is attempted. So I don't think we need to have separate struct types. I will also work on this today.
Maybe it would be even better to make this automatic and always use passed file descriptors if
go-systemd/activate
find them (i.e., the respective environment variables are set).
@septatrix Good idea! I'm trying to this of a caveat where it would activate unintentionally, but I cannot think of one! If a user did manage to somehow accidentally pass a file descriptor then they are doing something very wrong and it would be right to notify the user.
The only thing that I really don't like is that if you intended to use a socket listener but failed to provide one, then the exporter would either start listening on an unintended port or it would fail to listen on the unintended port and not give you as specific of an error message. I think it would be much better to fail to start outright with a descriptive error message than to cause unintended behavior. Simply always printing "no systemd listener found" every time will confuse users who do not know what this is and makes it seem like a feature they are missing. Other than the frameworks you mentioned, I have never encountered anything else that automatically chooses (probably for this reason).
@SuperQ and others, what do you think?
@septatrix Good idea! I'm trying to this of a caveat where it would activate unintentionally, but I cannot think of one! If a user did manage to somehow accidentally pass a file descriptor then they are doing something very wrong and it would be right to notify the user.
The only thing that I really don't like is that if you intended to use a socket listener but failed to provide one, then the exporter would either start listening on an unintended port or it would fail to listen on the unintended port and not give you as specific of an error message. I think it would be much better to fail to start outright with a descriptive error message than to cause unintended behavior. Simply always printing "no systemd listener found" every time will confuse users who do not know what this is and makes it seem like a feature they are missing. Other than the frameworks you mentioned, I have never encountered anything else that automatically chooses (probably for this reason).
Socket activation is usually used when to process otherwise would not have the capabilities/permissions to bind to a port. So in case no systemd socket is found we just fallback to the default port and if binding fails than we also log, that neither a socket was found nor port binding was successful.
@roidelapluie I see what you mean now! I moved the struct, please take a look at 888e3dd.
@SuperQ I added "(Linux only)" to the flag (see 379c4b4) and made the flag only appear on Linux platforms (see 8ce2a7b).
@septatrix Since the default port is not necessarily a normally privileged port in this case, falling back would likely unintentionally succeed in binding the port and would cause unintended operation for the user. Let's see what the maintainers prefer.
ping @roidelapluie @SuperQ are we good to go?
Going to close and re-open to try and trigger CI.
@SuperQ Now passing tests! :tada:
Nice, I think this is ready to go. Thanks for all of the work here, I'm really looking forward to having this done.
But I want to give @roidelapluie one more chance to look at it.
@roidelapluie @SuperQ let me know if any further changes are needed! Thanks!
Thanks!