nomad-driver-containerd
Fix: hostname not populated for /etc/hosts in bridge networks.
Signed-off-by: Shishir Mahajan [email protected]
@notnoop @tgross
I am trying to address Issue #103 in containerd-driver. This is the approach I am taking, based on https://github.com/hashicorp/nomad/pull/10766 (let me know if I missed anything):
- There is already a check in containerd-driver which doesn't allow the user to run a job in both host network mode (host_network=true) and bridge network mode (network { mode = "bridge" }). [Already exists in current codebase]
- If the user sets host_network=true in the containerd-driver config: I use the host's /etc/hosts, add any extra_hosts to it, and mount that into the container. [This exists in the codebase today]
- If the user sets network mode bridge (cfg.NetworkIsolation != nil), I am using hostnames.GenerateEtcHostsMount to generate the /etc/hosts mount. However, this is not working as expected. I did some debugging, and it looks like cfg.NetworkIsolation.HostConfig is coming back as nil. I am not sure why, but this might be causing /etc/hosts to not get populated correctly. [New addition]
- Default case: the user doesn't set anything. containerd-driver builds a default /etc/hosts file, adds any extra_hosts the user asks for, and mounts that into the container. [This exists in the codebase today]
Any pointers on how to integrate (2) [https://github.com/hashicorp/nomad/pull/10766] in containerd-driver?
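To make that branching concrete, here is a minimal, self-contained Go sketch of the decision flow. The hostsMount struct and buildEtcHostsMount function are hypothetical stand-ins for the driver's real mount config and setup code (not the actual containerd-driver or Nomad API); the hostname, IP, and extra_hosts values in main are illustrative only, and the bridge branch only approximates what hostnames.GenerateEtcHostsMount is expected to produce.

package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// hostsMount is a hypothetical stand-in for the driver's mount config
// (drivers.MountConfig in Nomad).
type hostsMount struct {
	HostPath string // file on the host to bind-mount
	TaskPath string // always /etc/hosts inside the container
	Readonly bool
}

// buildEtcHostsMount mirrors the three cases listed above.
func buildEtcHostsMount(taskDir string, hostNetwork, bridgeMode bool,
	allocHostname, allocIP string, extraHosts []string) (*hostsMount, error) {

	switch {
	case hostNetwork && bridgeMode:
		// Already rejected by the existing check in the driver.
		return nil, fmt.Errorf("host_network and bridge network mode are mutually exclusive")

	case hostNetwork:
		// Reuse the host's /etc/hosts (extra_hosts handling omitted for brevity).
		return &hostsMount{HostPath: "/etc/hosts", TaskPath: "/etc/hosts", Readonly: true}, nil

	case bridgeMode:
		// Write a hosts file with the allocation's IP and hostname, roughly
		// what hostnames.GenerateEtcHostsMount produces in Nomad.
		content := fmt.Sprintf("127.0.0.1 localhost\n%s %s\n", allocIP, allocHostname)
		for _, h := range extraHosts {
			content += h + "\n"
		}
		path := filepath.Join(taskDir, "hosts")
		if err := os.WriteFile(path, []byte(content), 0o644); err != nil {
			return nil, err
		}
		return &hostsMount{HostPath: path, TaskPath: "/etc/hosts"}, nil

	default:
		// Default case: the driver already builds its own default hosts file;
		// returning nil here means "fall back to that existing behavior".
		return nil, nil
	}
}

func main() {
	m, err := buildEtcHostsMount(os.TempDir(), false, true,
		"6f3c7ffa7b44", "172.26.64.10", []string{"10.0.0.5 example.internal"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("bind-mount %s -> %s\n", m.HostPath, m.TaskPath)
}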
@tgross @notnoop
So I did a little more debugging. With the docker driver (where everything is working correctly) on Nomad 1.1.3, which includes this patch (where you moved the hosts file from the task dir to the alloc dir so that it can be shared by all the tasks), the hosts file is created correctly under the alloc directory.
root@vagrant:/tmp/NomadClient203041423/aa5e1bc3-4b5b-0c53-9f7e-6cb8b3237408# ls -lt
total 12
-rw-r--r-- 1 root root 346 Jul 29 23:35 hosts
drwxrwxrwx 5 nobody nogroup 4096 Jul 29 23:35 redis-task
drwxrwxrwx 5 nobody nogroup 4096 Jul 29 23:35 alloc
root@vagrant:/tmp/NomadClient203041423/aa5e1bc3-4b5b-0c53-9f7e-6cb8b3237408# cat hosts
# this file was generated by Nomad
127.0.0.1 localhost
::1 localhost
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
# this entry is the IP address and hostname of the allocation
# shared with tasks in the task group's network
<ip_address> 6f3c7ffa7b44
However, if I switch to containerd-driver, the hosts file is not even created.
root@vagrant:/tmp/NomadClient203041423/2b33c69f-c13a-2f0b-4f75-71941425065b# ls
alloc redis-task
This is the job I am using:
job "redis" {
datacenters = ["dc1"]
group "redis-group" {
network {
mode = "bridge"
}
task "redis-task" {
driver = "containerd-driver"
config {
image = "redis:alpine"
}
resources {
cpu = 500
memory = 256
memory_max = 512
}
}
}
}
Thanks @shishir-a412ed for the ping. I'm investigating the expected network invariants here and will follow up with concrete suggestions (or proposed changes).
So /etc/hosts should be populated with the hostname for each container. My understanding is that Docker required some custom handling that necessitates having Docker create the network using a pause container and joining the task containers to the pause container's network namespace. In that setup, all containers will have the same hostname, so we chose to create a single /etc/hosts file and share it with all containers.
Examining containerd behavior, it seems we don't have the same limitation: each container has its own hostname despite being in the same namespace, and without requiring a pause container. If that works, then we probably should have a new, unique /etc/hosts for each containerd task container, one that includes the container hostname and container ID, and bind-mount that.
So the question I have is: does having containers with different hostnames, yet in the same namespace and with the same IP address, have adverse effects on workloads? Let me know if you have any opinions; I'll double check and follow up.
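If we do go the per-container route, a minimal sketch of the idea might look like the following. The perContainerHostsMount helper, the example container ID, and the IP are assumptions for illustration, not the driver's actual code; the sketch only shows writing a hosts file keyed to the container's own ID and expressing the bind mount as an OCI runtime-spec mount.

package main

import (
	"fmt"
	"os"
	"path/filepath"

	specs "github.com/opencontainers/runtime-spec/specs-go"
)

// perContainerHostsMount writes <taskDir>/hosts with this container's own ID
// as its hostname and returns a read-only bind mount targeting /etc/hosts.
func perContainerHostsMount(taskDir, containerID, allocIP string) (specs.Mount, error) {
	content := fmt.Sprintf("127.0.0.1 localhost\n::1 localhost\n# allocation IP, container ID as hostname\n%s %s\n",
		allocIP, containerID)

	hostsPath := filepath.Join(taskDir, "hosts")
	if err := os.WriteFile(hostsPath, []byte(content), 0o644); err != nil {
		return specs.Mount{}, err
	}

	return specs.Mount{
		Destination: "/etc/hosts",
		Type:        "bind",
		Source:      hostsPath,
		Options:     []string{"bind", "ro"},
	}, nil
}

func main() {
	m, err := perContainerHostsMount(os.TempDir(), "redis-task-2b33c69f", "172.26.64.11")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s -> %s %v\n", m.Source, m.Destination, m.Options)
}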
My understanding from when I did the Docker implementation and looked at exec was that the network_manager_linux code will call into the Docker driver to create the network namespace whenever we're in network.mode = "bridge" on Linux. Because that network namespace can be shared across multiple tasks, each of which can use a different task driver, the task drivers can't assume that they "own" the network namespace. A task driver can ignore that network namespace entirely, but then it won't be in the same netns as other tasks.
As for the impact of multiple hostnames within a single network namespace, I suspect where things will get weird is in application metrics, because tasks that ship with metrics libraries will report different container hostnames, none of which will match even though they're all in the same alloc.
(Also, just a heads up @shishir-a412ed that I'm not at HashiCorp anymore, although I'm always happy to take a look at what y'all are up to!)
@notnoop Apologies for the delayed response. After thinking a bit more about this, I am still unsure what the best way to address this would be.
In the docker driver, a pause container is set up (per allocation/task group) whose /etc/hosts can then be shared by all the tasks in the allocation. Each task in the allocation gets the same hostname, which is the container ID of the pause container.
As you mentioned, this is not a constraint in the containerd task driver, since there is no pause container and each task can have a unique hostname. We can assign the container ID (which will be unique for the container) as the hostname for each task. (Not sure if this will conflict with the IP, since all containers share a common IP.)
My question was specifically about calling hostnames.GenerateEtcHostsMount with the TaskDir(): it should return an /etc/hosts mount which I can then bind-mount into the containerd container. However, this was not working, so I was wondering if my approach is right.
That is, is the task driver responsible for creating the /etc/hosts file even in the case of bridge networking, or will Nomad create one and supply it to the task drivers, which can then bind-mount it into the container?
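For reference, here is a rough, self-contained sketch of one possible fallback, assuming the driver stays responsible for the file whenever Nomad doesn't supply one. generateNomadHostsMount is a stand-in for hostnames.GenerateEtcHostsMount (returning nothing when the allocation's hosts config is unavailable, i.e. the nil HostConfig case above); the function names, file paths, and host entries are illustrative only.

package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// generateNomadHostsMount stands in for Nomad's hostnames.GenerateEtcHostsMount:
// it returns "" when the allocation's hosts config is unavailable, which is the
// nil-HostConfig situation observed above.
func generateNomadHostsMount(taskDir string, hostsConfigPresent bool) (string, error) {
	if !hostsConfigPresent {
		return "", nil // nothing generated; the caller must fall back
	}
	path := filepath.Join(taskDir, "hosts")
	err := os.WriteFile(path, []byte("127.0.0.1 localhost\n172.26.64.10 6f3c7ffa7b44\n"), 0o644)
	return path, err
}

// etcHostsSource decides which file on the host gets bind-mounted as the
// container's /etc/hosts.
func etcHostsSource(taskDir string, bridgeMode, hostsConfigPresent bool) (string, error) {
	if bridgeMode {
		path, err := generateNomadHostsMount(taskDir, hostsConfigPresent)
		if err != nil {
			return "", err
		}
		if path != "" {
			return path, nil // Nomad supplied (or helped build) the file
		}
		// Hosts config was nil: fall through to the driver-built default.
	}
	path := filepath.Join(taskDir, "default_hosts")
	err := os.WriteFile(path, []byte("127.0.0.1 localhost\n"), 0o644)
	return path, err
}

func main() {
	src, err := etcHostsSource(os.TempDir(), true, false)
	if err != nil {
		panic(err)
	}
	fmt.Println("bind-mount", src, "as /etc/hosts")
}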
@tgross Sad to see you go. Thank you for all the contributions and quick responses/reviews. Being a maintainer is hard, and you did a phenomenal job. Looking forward to what you create next!
@shishir-a412ed any updates on this?