[BUG] eth0 default route priority define outbound connection
Description
hi i've been reading through all the documentation and i'm struggling to work out how to define which network in docker compose becomes the default gateway
from googling i can find various resources which say that it's:
- in the order in which it is defined in the compose file (but this seems not to be true)
- that it is based upon the alphabetic naming of the networks, so A will begin before B etc (this seems to be true for custom docker networks, but it doesn't work similarly when combined with ipvlan networks)
I stumbled across a few different issues here which talked about priority and i thought that i could use that to define which network becomes the default outbound gateway, but that doesn't appear to be the case either. i'm not sure what priority does - the bug thread mentioned it was something related to MAC addresses.
Steps To Reproduce
networks:
  intraNW:
    priority: 10
  br0:
    priority: 30
    ipv4_address: 192.168.1.145
  a-traefik-public:
    priority: 20
take this for example: br0 is an ipvlan network and the other 2 are custom docker networks
my use case is that i want the br0 interface to be eth0. the reason for this is that i wish to do split tunnelling on the router, and in order to do that i need ipvlan.
it works fine if i remove the other custom networks, but i can't for the life of me get the ipvlan br0 network to become the default eth0 network whilst using docker custom networks alongside it
one possible solution seems to be creating a container init script and running that on load, which will change the default route to the desired one, but i think this is not a very clean solution compared to a native implementation in docker compose
Compose Version
Docker Compose version v2.29.2
Docker Environment
Client:
Version: 24.0.9
Context: default
Debug Mode: false
Plugins:
compose: Docker Compose (Docker Inc.)
Version: v2.29.2
Path: /usr/local/lib/docker/cli-plugins/docker-compose
Server:
Containers: 79
Running: 50
Paused: 0
Stopped: 29
Images: 209
Server Version: 24.0.9
Storage Driver: btrfs
Btrfs:
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 7c3aca7a610df76212171d200ca3811ff6096eb8
runc version: v1.1.12-0-g51d5e94
init version: de40ad0
Security Options:
seccomp
Profile: builtin
cgroupns
Kernel Version: 6.1.99-Unraid
Operating System: Slackware 15.0 x86_64 (post 15.0 -current)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 78.56GiB
Name: alexandria
ID: xxx
Docker Root Dir: /var/lib/docker
Debug Mode: false
Username: hvrpride
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine
WARNING: No swap limit support
Anything else?
No response
these are some of the other threads on this
https://github.com/docker/compose/issues/4645 https://github.com/moby/moby/pull/43518 https://github.com/docker/compose/issues/11229 https://github.com/docker/compose/issues/8561
this seems like a longstanding issue
priority indeed defines which network is connected first when creating a container with moby engine, but AFAIK the actual engine implementation does not use it to define the default gateway, and instead does some alphanumeric sorting
@akerouanton might better know the status of this in latest engine codebase
@mrpops2ko Could you share your network definitions please?
root@alexandria:~# docker network ls
NETWORK ID NAME DRIVER SCOPE
e05c170d5559 a-traefik-public bridge local
32974c0c0378 br0 ipvlan local
f16259b5cce2 bridge bridge local
e09d59b70229 host host local
e75d14eecf63 none null local
root@alexandria:~# docker network inspect e05c170d5559
[
{
"Name": "a-traefik-public",
"Id": "e05c170d55594e375f582aba5d35c06b34d4d86774cd86cf750a39de269faca1",
"Created": "2024-05-15T00:39:20.032951453+01:00",
"Scope": "local",
"Driver": "bridge",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": {},
"Config": [
{
"Subnet": "172.18.0.0/16",
"Gateway": "172.18.0.1"
}
]
},
"Internal": false,
"Attachable": false,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {
"0350ae8ef3d8a2b700ccec71d028725fe3e2684a784dbbd3163bc099e3d879ac": {
xxx
}
},
"Options": {},
"Labels": {}
}
]
root@alexandria:~# docker network inspect 32974c0c0378
[
{
"Name": "br0",
"Id": "32974c0c0378747271ab726429616985d953d9a2bac744eb94cf231a50ec8199",
"Created": "2024-09-14T13:42:25.513278678+01:00",
"Scope": "local",
"Driver": "ipvlan",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": {},
"Config": [
{
"Subnet": "192.168.0.0/20",
"Gateway": "192.168.1.1",
"AuxiliaryAddresses": {
"server": "192.168.1.3"
}
}
]
},
"Internal": false,
"Attachable": false,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {
"096fe31bbbf0d98864d108c1f6adab9cd9800451ef0ff8b76af75df77bed345e": {
xxx
}
},
"Options": {
"parent": "br0"
},
"Labels": {}
}
]
the other networks are all created upon docker compose startup
i tried playing around with the routing from within the container but nothing seems to work
$ ip route del default via 172.18.0.1 dev eth0 && ip route add default via 192.168.0.1 dev eth1
RTNETLINK answers: Operation not permitted
$ sudo ip route del default via 172.18.0.1 dev eth0 && ip route add default via 192.168.0.1 dev eth1
RTNETLINK answers: Operation not permitted
i even included
cap_add:
- NET_ADMIN
and that still didn't allow me to do it, so yeah i'm kind of at a loss on how
I'm having a very similar problem: a bridge network and an ipvlan network are attached to a container, the ipvlan gets set as the default route when I need it to be the bridge network.
Networks are defined as follows:
networks:
  kea-10-ipvlan: # Network that binds container to host network interface.
    name: kea-10-ipvlan
    driver: ipvlan
    driver_opts:
      parent: ${ETH1} # Host interface that Kea containers will use (eth1)
    enable_ipv6: false
    ipam:
      config:
        - subnet: ${SUBNET4} # Subnet for Kea containers
          gateway: ${GATEWAY4} # Gateway for Kea subnet
  kea-20-backend: # Internal network for communication with database
    name: kea-20-backend
  access-wg:
    driver: bridge
And for the container:
networks:
  access-wg:
  kea-10-ipvlan:
    ipv4_address: ${DNS_IP4}
The defined order has no impact on the default route, and neither does the alphanumerical order. The priority attribute has no impact on the route ordering either.
AFAICT this is a docker engine issue. Let me know if you find a way to get the expected result by a combination of docker run.. and docker network connect calls, which compose could replicate
Every time a container is connected to a network, Docker Engine looks at the container's whole set of endpoints and picks one to use as the gateway. (And, the same after a network disconnect.)
The gateway endpoint it picks is the first in this ordering.
I don't think that function's comment is quite right... but, endpoints are sorted by priority, then the dedicated docker_gwbridge network is preferred (for swarm), then non-internal networks, then dual-stack over IPv4-only. Finally, if the networks are equal according to those criteria, they're sorted lexicographically by network name.
That means the order in which networks are connected shouldn't make any difference to the end result:
- Compose's priority only affects the order in which networks are added, the value isn't passed to the engine.
- The epPriority used in the engine's ordering of endpoints is unrelated and, weirdly, there's no way to configure it via the engine's API. (So, it's completely useless.)
But, in the two examples above - the networks are all equal (user-defined, non-internal, and IPv4-only). So, they should be sorted by network name ... I've tried to repro the problem, but haven't been able to. For me, the selected gateway is always based on the network name - including with dockerd 24.0, I don't think this logic has changed since before that release anyway.
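To make the ordering described above concrete, here is a rough Python sketch of it. It is not the engine's code (that lives in libnetwork and is written in Go); the names Endpoint, gateway_sort_key and pick_gateway are made up for illustration, and "higher priority wins" is an assumption about the direction of the priority comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical stand-in for a container's endpoint on one network."""
    network: str               # network name
    priority: int = 0          # engine-side priority (not settable via the API today)
    is_gwbridge: bool = False  # True for swarm's dedicated docker_gwbridge network
    internal: bool = False     # True for an internal network
    dual_stack: bool = False   # True if the network has both IPv4 and IPv6


def gateway_sort_key(ep: Endpoint):
    # Tuples compare element by element, so each criterion only matters
    # when all earlier ones are equal. False sorts before True.
    return (
        -ep.priority,        # assumption: higher priority sorts first
        not ep.is_gwbridge,  # docker_gwbridge preferred
        ep.internal,         # non-internal networks preferred
        not ep.dual_stack,   # dual-stack preferred over IPv4-only
        ep.network,          # finally, lexicographic by network name
    )


def pick_gateway(endpoints):
    """Return the endpoint that would come first in the ordering."""
    return min(endpoints, key=gateway_sort_key)


# The three networks from the original report: all user-defined,
# non-internal and IPv4-only, with no engine-side priority set.
endpoints = [Endpoint("intraNW"), Endpoint("br0"), Endpoint("a-traefik-public")]
print(pick_gateway(endpoints).network)  # a-traefik-public
```

With everything else equal only the name decides, which is consistent with the default route via 172.18.0.1 (the a-traefik-public gateway) seen in the container above. The order in which the endpoints are passed in makes no difference to the result.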
I don't think there's anything compose can do to affect this. If we make the engine's priority field useful, we may need to add a way for compose to specify it - but we can come back to that.
@mrpops2ko, please could you re-raise this in https://github.com/moby/moby/issues? (Unfortunately we can't move issues between projects.) Then we can dig in to it some more ... I think I'll need a minimal repro, if possible.
hi @robmry thank you for this detailed explanation, and it is working as you've described it, the problem is that people have many use case scenarios where they don't want it to work as you described.
A > B works and is great but there are scenarios where A < B are required or wanted.
can you explain more about this priority field in docker compose, with a use case scenario? i'm at a loss as to what it actually does if it isn't used to do the thing that a user at first glance would assume it is doing
some parts of your comment seem contradictory: "endpoints are sorted by priority", yet i've provided priority and the default gateway doesn't change
maybe that's the delineation? i'm assuming that the endpoint with the highest priority would become the eth0 default gateway for internet connectivity, would that be your assumption and understanding too?
that isn't what is happening and, reading through the older bug threads, priority was seemingly never designed to do that https://github.com/docker/compose/issues/11229#issuecomment-1833768309
The priority used to sort endpoints (as seen in https://github.com/moby/moby/blob/c14efb8deeb2cf5a716ab874c707f9a69228926c/libnetwork/sandbox.go#L677-L683) is not set by Docker Compose, as the API (https://docs.docker.com/reference/api/engine/version/v1.45/#tag/Network/operation/NetworkConnect) does not offer such an option, so the result is that the sort is applied by name. I wonder whether this is a swarm-only feature?
priority is only used in Docker Compose to connect to networks in priority order, but this doesn't have any impact on this algorithm.
A > B works and is great but there are scenarios where A < B are required or wanted.
Hi @mrpops2ko - yes, understood ... only having the network name to determine gateway selection isn't good.
But, network name is influencing gateway selection for you now, including for ipvlan?
In your original description, you said "2. that it is based upon the alphabetic naming of the networks, so A will begin before B etc (this seems to be true for customer docker networks, but it doesn't work similarly when combined with ipvlan networks)".
As far as I can see, the rule should apply equally to ipvlan and bridge networks - and I wasn't able to repro the problem, for me the name of the network always determined the gateway selection, all other things being equal. I'm reluctant to make any changes without understanding this, it'd be easy to break something that's relied on by others.
can you explain more about this priority field in docker compose, with a use case scenario? i'm at a loss as to what it actually does if it isn't used to do the thing that a user at first glance would assume it is doing
As @ndeloof says, it's just the order in which compose connects networks.
For the engine - I think it'll affect interface naming in the containers, they're numbered in order (eth0, eth1 etc). I can't think of anything else at the moment, but could be forgetting something.
some parts from reading your comment seem contradictory "endpoints are sorted by priority" yet i've provided priority and the default gateway doesn't change
Right. As I said, "The epPriority used in the engine's ordering of endpoints is unrelated and, weirdly, there's no way to configure it via the engine's API. (So, it's completely useless.)" ... it doesn't work, don't try to make more sense of it than that!
maybe thats the delineation? i'm assuming that the first endpoint with the highest priority would result in becoming the eth0 default gateway for internet connectivity, would that be your assumption and understanding too?
I tried to describe my understanding in my last comment, and it's not that. But, I agree more control is needed. I don't think we'd want to reuse compose's priority field to set the gateway priority in future ... there might be a reason to connect the gateway endpoint last, or to want interface numbering to be separate from gateway selection.
So, there's some missing plumbing, we need a way to be able to specify gateway priority for the engine. But, if the network name isn't currently having the expected effect, I'd like to understand that.
However, this isn't a compose issue. So, I've raised https://github.com/moby/moby/issues/48868 ... let's continue the discussion over there.
(@ndeloof, please could you close this one?)