Document mapping of pod data structure to systemd service files
We need to define which pod properties we initially want to support and how we want to map them to our systemd unit files.
I'll start with my initial ideas in this ticket. If it becomes too unwieldy, we'll move the discussion somewhere else.
Once we agree, I'll pull the result out into an ADR.
The following shows the fields I've looked at so far and where in the systemd unit file I'd put them, but I am sure the list is far from complete.
pod:
  metadata:
    name: -> name
  spec:
    containers: <we currently only allow one container per pod>
      image: <used by virtual kubelet for downloading package>
      env: Environment=...
      command: ExecStart
      args: ExecStart
      name: <unused, taken from metadata.name>
      volumeMounts: <used by virtual kubelet to set up machine>
      workingDir: WorkingDirectory
    initContainers: <we could either implement these as extra services linked by a Before= statement, or build ExecStartPre commands from them, not sure which makes more sense>
    restartPolicy: Restart
    terminationGracePeriodSeconds: TimeoutStopSec <systemd also offers TimeoutStartSec, but I could not find a matching pod field; should we reuse this one for both?>
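To make the proposed mapping easier to discuss, here is a minimal sketch in Python of how a single-container pod could be rendered into a `[Service]` section. The function names, the `RESTART_POLICY` table and the quoting helper are all assumptions made for illustration, not existing code. It only handles literal `env` values, and it does not escape `%` or `$`, which systemd treats specially.

```python
# Hypothetical mapping from pod restartPolicy values to systemd Restart= values.
RESTART_POLICY = {"Always": "always", "OnFailure": "on-failure", "Never": "no"}


def quote(arg: str) -> str:
    """Double-quote an argument if it is empty or contains whitespace, quotes or backslashes."""
    if arg and not any(ch.isspace() or ch in "\"'\\" for ch in arg):
        return arg
    return '"' + arg.replace("\\", "\\\\").replace('"', '\\"') + '"'


def pod_to_unit(pod: dict) -> tuple:
    """Return (unit file name, unit file text) for a single-container pod."""
    spec = pod["spec"]
    containers = spec["containers"]
    if len(containers) != 1:
        raise ValueError("only one container per pod is supported")
    container = containers[0]

    lines = ["[Service]"]
    # env -> Environment= (assumes literal values, no valueFrom)
    for env in container.get("env", []):
        lines.append("Environment=" + quote(f"{env['name']}={env['value']}"))
    # command and args are concatenated into a single ExecStart= line
    argv = container.get("command", []) + container.get("args", [])
    if argv:
        lines.append("ExecStart=" + " ".join(quote(a) for a in argv))
    if "workingDir" in container:
        lines.append("WorkingDirectory=" + container["workingDir"])
    if "restartPolicy" in spec:
        lines.append("Restart=" + RESTART_POLICY[spec["restartPolicy"]])
    if "terminationGracePeriodSeconds" in spec:
        lines.append(f"TimeoutStopSec={spec['terminationGracePeriodSeconds']}")
    return pod["metadata"]["name"] + ".service", "\n".join(lines) + "\n"
```

The one-container restriction is enforced explicitly here, so lifting it later would be a visible change in one place.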
There are a few assumptions in there that might be worth discussing. We currently restrict pods to only have one container, as the idea was to just create multiple pods if you need more. Do we want to change that and allow multiple containers? What would be the benefits and downsides?
I am unsure how to treat init containers. I don't think it is relevant at this stage, but it might be worth a quick look to make sure we don't burn any bridges. There are two options: we can implement these as one-shot services that are required before our main service starts. That way systemd should run them once, before trying to start our main service (need to investigate the full implications of this). Alternatively, we can create ExecStartPre commands from these fields, which systemd would run once, before starting the main service.
Both have pros and cons, I guess. Does anyone have a preference off the top of their head?
Also, we should probably at least take the user to run the service as from the PodSecurityContext, but that opens up a can of worms that I am not sure we are ready to deal with just yet. Thoughts?
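To compare the two options side by side, here is a rough Python sketch of what each could generate. The unit naming scheme and the function names are hypothetical, and command quoting is left out. The `After=` chaining between init units reflects that init containers run one at a time, in order.

```python
def argv(container: dict) -> str:
    """Join command and args; quoting is deliberately left out of this sketch."""
    return " ".join(container.get("command", []) + container.get("args", []))


def init_as_oneshot_units(pod_name: str, init_containers: list) -> dict:
    """Option 1: one Type=oneshot unit per init container.

    Returns a mapping of unit file name to unit file text. Each unit is ordered
    before the main service and after the previous init unit.
    """
    units = {}
    previous = None
    for container in init_containers:
        # Hypothetical naming scheme: <pod>-init-<container>.service
        name = f"{pod_name}-init-{container['name']}.service"
        lines = ["[Unit]", f"Before={pod_name}.service"]
        if previous:
            lines.append(f"After={previous}")
        lines += ["", "[Service]", "Type=oneshot", f"ExecStart={argv(container)}"]
        units[name] = "\n".join(lines) + "\n"
        previous = name
    return units


def main_unit_dependencies(init_unit_names: list) -> list:
    """Lines for the main unit's [Unit] section so that it pulls in the init units."""
    if not init_unit_names:
        return []
    joined = " ".join(init_unit_names)
    return [f"Requires={joined}", f"After={joined}"]


def init_as_exec_start_pre(init_containers: list) -> list:
    """Option 2: one ExecStartPre= line per init container for the main [Service] section."""
    return [f"ExecStartPre={argv(container)}" for container in init_containers]
```

Option 1 produces extra unit files whose status can be inspected separately. Option 2 keeps everything in one unit, but the init commands then run with the main service's settings.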
Happy to dump thoughts here:
> We currently restrict pods to only have one container
Won't work, and you need init containers anyway. Multiple containers in a pod share disk space. You could work around it by having some designated directory on the file system, but this clashes with multiple pods running at the same time and writing to the same directory. You also want RemainAfterExit to be able to get the status of the unit after it has stopped; otherwise I think you lose insight into that. You have to wait for the control plane to delete the pod anyway.
> There are two options: we can implement these as one-shot services that are required before our main service starts.
I'm doing the same for init containers. I've got a WIP branch that does something, but I haven't really got it working well yet. I think you definitely want init containers to work; otherwise, how do you inject per-node info into the pod?
Regarding the user from the PodSecurityContext: yes, this is difficult. Systemd also allows usernames, whereas the securityContext uses uid/gid. I would prefer usernames; my thinking is to take the user from the unit that comes with the installed package and give the volume mounts the uid of that user. Of course, a securityContext per container also exists. I think it's best to ignore that for now.
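The idea of taking the user from the packaged unit and giving the volume mounts that user's uid could look roughly like this in Python. Both function names are made up for illustration, and the unit file parsing is a simple line scan, not a full systemd parser.

```python
import os
import pwd


def unit_user(unit_text: str):
    """Return the value of the last User= line in the [Service] section, or None."""
    section, user = None, None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            section = line
        elif section == "[Service]" and line.startswith("User="):
            user = line[len("User="):].strip()
    return user


def chown_mount(path: str, username: str) -> None:
    """Give a volume mount directory to the named user (needs sufficient privileges)."""
    entry = pwd.getpwnam(username)  # raises KeyError if the user does not exist
    os.chown(path, entry.pw_uid, entry.pw_gid)
```

Resolving the name through `pwd.getpwnam` keeps the unit file username-based while still producing the numeric uid/gid needed for the mount.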
There are also a bunch of environment variables that are set by default.
init containers are just unit files, not much difference there: https://github.com/miekg/vks/pull/98