nomad-driver-containerd

Fix: hostname not populated for /etc/hosts in bridge networks.

shishir-a412ed opened this issue 3 years ago • 6 comments

Signed-off-by: Shishir Mahajan [email protected]

shishir-a412ed avatar Jul 22 '21 20:07 shishir-a412ed

@notnoop @tgross

I am trying to address issue #103 in containerd-driver. This is the approach I am taking, based on https://github.com/hashicorp/nomad/pull/10766 (let me know if I missed anything):

  • Currently, there is already a check in containerd-driver that prevents the user from running a job in both host network mode (host_network=true) and bridge network mode (network { mode = "bridge" }). [Already exists in the current codebase]
  1. If the user sets host_network=true in the containerd-driver config: I use the host's /etc/hosts, add any extra_hosts to it, and mount that into the container. [This exists in the codebase today]

  2. If the user sets network mode bridge (cfg.NetworkIsolation != nil), I use hostnames.GenerateEtcHostsMount to generate the /etc/hosts mount. However, this is not working as expected. After some debugging, it looks like cfg.NetworkIsolation.HostConfig is coming back nil. I am not sure why, but this may be why /etc/hosts is not populated correctly. [New addition]

  3. Default case: the user doesn't set anything. containerd-driver builds a default /etc/hosts file, adds any extra_hosts the user requests, and mounts that into the container. [This exists in the codebase today]

Any pointers on how to integrate (2) [https://github.com/hashicorp/nomad/pull/10766] into containerd-driver? A rough sketch of the branching I have in mind is below.
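For concreteness, here is a minimal Go sketch of that branching. This is not the actual driver code: buildEtcHostsMount and writeHostsFile are hypothetical names, and the hostnames.GenerateEtcHostsMount signature is assumed from hashicorp/nomad PR #10766.

package main

import (
	"fmt"
	"os"
	"path/filepath"

	"github.com/hashicorp/nomad/drivers/shared/hostnames"
	"github.com/hashicorp/nomad/plugins/drivers"
)

// buildEtcHostsMount (hypothetical) covers the three cases described above.
func buildEtcHostsMount(cfg *drivers.TaskConfig, hostNetwork bool, extraHosts []string) (*drivers.MountConfig, error) {
	switch {
	case hostNetwork:
		// Case 1: host networking. Start from the host's /etc/hosts and
		// append any extra_hosts.
		base, err := os.ReadFile("/etc/hosts")
		if err != nil {
			return nil, err
		}
		return writeHostsFile(cfg.TaskDir().Dir, string(base), extraHosts)

	case cfg.NetworkIsolation != nil:
		// Case 2: bridge networking. Delegate to Nomad's shared helper
		// (signature assumed from PR #10766).
		return hostnames.GenerateEtcHostsMount(cfg.TaskDir().Dir, cfg.NetworkIsolation, extraHosts)

	default:
		// Case 3: nothing set. Build a minimal default hosts file.
		return writeHostsFile(cfg.TaskDir().Dir, "127.0.0.1 localhost\n::1 localhost\n", extraHosts)
	}
}

// writeHostsFile (hypothetical) writes base plus extra_hosts entries into the
// task dir and returns a read-only bind mount targeting /etc/hosts.
func writeHostsFile(taskDir, base string, extraHosts []string) (*drivers.MountConfig, error) {
	content := base
	for _, eh := range extraHosts {
		content += fmt.Sprintf("%s\n", eh)
	}
	hostPath := filepath.Join(taskDir, "etc_hosts")
	if err := os.WriteFile(hostPath, []byte(content), 0o644); err != nil {
		return nil, err
	}
	return &drivers.MountConfig{
		HostPath: hostPath,
		TaskPath: "/etc/hosts",
		Readonly: true,
	}, nil
}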

shishir-a412ed avatar Jul 28 '21 19:07 shishir-a412ed

@tgross @notnoop So I did a little more debugging. With the docker driver (where everything works correctly) on Nomad 1.1.3, which includes this patch (where you moved the hosts file from the task dir to the alloc dir so that it can be shared by all the tasks), the hosts file is created correctly under the alloc directory:

root@vagrant:/tmp/NomadClient203041423/aa5e1bc3-4b5b-0c53-9f7e-6cb8b3237408# ls -lt
total 12
-rw-r--r-- 1 root   root     346 Jul 29 23:35 hosts
drwxrwxrwx 5 nobody nogroup 4096 Jul 29 23:35 redis-task
drwxrwxrwx 5 nobody nogroup 4096 Jul 29 23:35 alloc
root@vagrant:/tmp/NomadClient203041423/aa5e1bc3-4b5b-0c53-9f7e-6cb8b3237408# cat hosts
# this file was generated by Nomad
127.0.0.1 localhost
::1 localhost
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

# this entry is the IP address and hostname of the allocation
# shared with tasks in the task group's network
<ip_address> 6f3c7ffa7b44

However, if I switch to containerd-driver, the hosts file is not even created.

root@vagrant:/tmp/NomadClient203041423/2b33c69f-c13a-2f0b-4f75-71941425065b# ls
alloc  redis-task

This is the job I am using:

job "redis" {
  datacenters = ["dc1"]

  group "redis-group" {
    network {
      mode = "bridge"
    }
    task "redis-task" {
      driver = "containerd-driver"

      config {
        image    = "redis:alpine"
      }

      resources {
        cpu        = 500
        memory     = 256
        memory_max = 512
      }
    }
  }
}

shishir-a412ed avatar Jul 29 '21 23:07 shishir-a412ed

Thanks @shishir-a412ed for the ping. I'm investigating the expected network invariants here and will follow up with concrete suggestions (or a proposed change).

So /etc/hosts should be populated with the hostname for each container. My understanding is that Docker required some custom handling that necessitates having Docker create the network using a pause container and joining the task containers to the pause container's network namespace. In that setup, all containers will have the same hostname, so we chose to create a single /etc/hosts file and share it with all containers.

Examining containerd behavior, it seems we don't have the same limitation: each container has its own hostname despite being in the same namespace, and without requiring a pause container. If that works, then we probably should create a new, unique /etc/hosts for each containerd task container that includes the container hostname (the container ID), and bind-mount that.
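To make that concrete, a hypothetical sketch (writePerContainerHosts is an illustrative name, not existing driver code) of generating a unique hosts file per task, using the container ID as the hostname:

import (
	"fmt"
	"os"
	"path/filepath"
)

// writePerContainerHosts (hypothetical) writes a hosts file unique to one
// task, whose hostname entry is that task's container ID.
func writePerContainerHosts(taskDir, containerID, allocIP string) (string, error) {
	content := fmt.Sprintf(`127.0.0.1 localhost
::1 localhost

# this entry is the IP address of the allocation and this task's hostname
%s %s
`, allocIP, containerID)
	path := filepath.Join(taskDir, "etc_hosts")
	if err := os.WriteFile(path, []byte(content), 0o644); err != nil {
		return "", err
	}
	// The caller would then bind-mount path onto /etc/hosts in the container spec.
	return path, nil
}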

So the question I have is: does having containers with different hostnames, yet in the same namespace and with the same IP address, have adverse effects on workloads? Let me know if you have any opinions; I'll double-check and follow up.

notnoop avatar Aug 02 '21 17:08 notnoop

My understanding from when I did the Docker implementation and looked at exec was that the network_manager_linux code will call into the Docker driver to create the network namespace whenever we're in network.mode = "bridge" on Linux. Because that network namespace can be shared across multiple tasks, each of which can use a different task driver, the task drivers can't assume that they "own" the network namespace. A task driver can ignore that network namespace entirely, but then it won't be in the same netns as other tasks.
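For illustration, this is roughly how a containerd-based driver can join the client-created namespace rather than making its own; a minimal sketch, assuming the netns path is delivered in cfg.NetworkIsolation.Path (withAllocNetns is a hypothetical name):

import (
	"github.com/containerd/containerd/oci"
	"github.com/hashicorp/nomad/plugins/drivers"
	specs "github.com/opencontainers/runtime-spec/specs-go"
)

// withAllocNetns (hypothetical) returns a SpecOpts that places the container
// into the network namespace the Nomad client created for the allocation,
// rather than letting containerd create a fresh one.
func withAllocNetns(cfg *drivers.TaskConfig) oci.SpecOpts {
	return oci.WithLinuxNamespace(specs.LinuxNamespace{
		Type: specs.NetworkNamespace,
		Path: cfg.NetworkIsolation.Path,
	})
}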

As for the impact of multiple hostnames within a single network namespace, I suspect where things will get weird is in application metrics, because tasks that ship with metrics libraries will report different container hostnames, none of which will match even though they're all in the same alloc.

(Also, just a heads up @shishir-a412ed that I'm not at HashiCorp anymore, although I'm always happy to take a look at what y'all are up to!)

tgross avatar Aug 02 '21 23:08 tgross

@notnoop Apologies for the delayed response. After thinking a bit more about this, I am still unsure what the best way to address this would be.

In the docker driver, a pause container is set up (per allocation/task group) whose /etc/hosts can then be shared by all the tasks in the allocation. Each task in the allocation gets the same hostname, which is the container ID of the pause container.

As you mentioned, this is not a constraint in the containerd task driver, since there is no pause container and each task can have a unique hostname. We can assign the container ID (which will be unique per container) as the hostname for each task. (Not sure if this will conflict with the IP, since all containers share a common IP.)

I think my question was more specific: when I call hostnames.GenerateEtcHostsMount with the TaskDir(), it should return an /etc/hosts mount which I can then bind-mount into the containerd container. However, this was not working, so I was wondering whether my approach is right.

That is: is the task driver responsible for creating /etc/hosts even in the bridge network case, or will Nomad create one and supply it to the task drivers, which can then bind-mount it into the container? (A sketch of the call site I have in mind is below.)
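For reference, a sketch of that call site, with the helper's signature assumed from hashicorp/nomad PR #10766. If the isolation spec (or its hosts config) arrives nil, the helper presumably returns no mount, which would match the missing hosts file observed earlier:

// Sketch only: extraHosts and mounts stand in for the driver's own variables,
// and the helper's exact signature/semantics are assumed, not verified here.
etcHostsMount, err := hostnames.GenerateEtcHostsMount(cfg.TaskDir().Dir, cfg.NetworkIsolation, extraHosts)
if err != nil {
	return fmt.Errorf("failed to generate /etc/hosts mount: %w", err)
}
if etcHostsMount != nil {
	// If the hosts config on cfg.NetworkIsolation is nil (as the debugging
	// above suggested), no mount may come back, and /etc/hosts is never created.
	mounts = append(mounts, etcHostsMount)
}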

@tgross Sad to see you go. Thank you for all the contributions and quick responses/reviews. Being a maintainer is hard, and you did a phenomenal job. Looking forward to what you create next!

shishir-a412ed avatar Aug 25 '21 23:08 shishir-a412ed

@shishir-a412ed any updates on this?

Oloremo avatar Apr 13 '22 12:04 Oloremo