Random networking failures in a bridged network
Description
[
{
"Name": "antiraid_internal",
"Id": "3237097f63b8ff00d093454b9a01c17745baf6ddafb60e0423bb68c3c9710b02",
"Created": "2025-02-24T13:17:37.76494543-05:00",
"Scope": "local",
"Driver": "bridge",
"EnableIPv4": true,
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "172.18.0.0/16",
"Gateway": "172.18.0.1"
}
]
},
"Internal": true,
"Attachable": false,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {
"11c78f924df8efc19549e41c0a7eddcf353b31d26c305a6549841c8553db1d86": {
"Name": "jobserver",
"EndpointID": "3084a13d5fdc3b9d97a975d4c562f7fb7b43621792db9506aeef8df6bad2c025",
"MacAddress": "12:f2:2d:97:c1:f7",
"IPv4Address": "172.18.0.7/16",
"IPv6Address": ""
},
"2b1a17c5693828c21c83e0f3e4473e95c3de7df8b49b506e6eb1e055a244eebb": {
"Name": "sandwich",
"EndpointID": "932810504b316a9f6276bec7fd3e44148a3441b7e6592e3efccf1f2dd966813f",
"MacAddress": "86:d8:b9:0a:70:5f",
"IPv4Address": "172.18.0.3/16",
"IPv6Address": ""
},
"39bbc756d6f8fd5bc21458713e6f40037487e92e67f8922225728347fda1f4b9": {
"Name": "nirn_proxy",
"EndpointID": "cb0a5043ce6c40ad3d4f07115b14911e668c4c1d5c95b9125a5893dd24bdf1cb",
"MacAddress": "be:72:84:69:aa:b9",
"IPv4Address": "172.18.0.5/16",
"IPv6Address": ""
},
"3af5bcfc0dfcfd11acd4d3cf1f5c29b3b6124afda602dbc3675af9743fa48e39": {
"Name": "template-worker",
"EndpointID": "fe2f6ec05ba5aebff77ef6789abab154b841e11ec3402b281cc378d91402c40a",
"MacAddress": "0e:69:8b:fb:c0:fe",
"IPv4Address": "172.18.0.6/16",
"IPv6Address": ""
},
"a7eab9b2041f510745a5526a74d1fe76ffef5478d23026116515826c77d11591": {
"Name": "api_exposer",
"EndpointID": "5ad2a95665fe291a1bd33c15c9642b7e66cc731c86aab5bd1d07fa647986b484",
"MacAddress": "4a:79:1a:f5:8f:89",
"IPv4Address": "172.18.0.10/16",
"IPv6Address": ""
},
"ac412590d63e42c17907fea26a8bcce2853b4693878814ffb2419e00cc46f1a8": {
"Name": "postgres",
"EndpointID": "b28e66d36bf681a260d5816ac00a7265c9d7b9db9d6c41ebfb19c9cff5d01920",
"MacAddress": "1a:c4:eb:be:0c:2d",
"IPv4Address": "172.18.0.2/16",
"IPv6Address": ""
},
"ba1eebf648c0700e5ff34877ec8c0ba004f777db4c24b537b9c8ccb7d2d004f1": {
"Name": "bot",
"EndpointID": "e269c3d2e748c481fda0a27b175e130504a02cf2ad52599f3a5761a5f812b732",
"MacAddress": "02:56:dd:05:96:29",
"IPv4Address": "172.18.0.8/16",
"IPv6Address": ""
},
"c8df3812331015a199f46209b40ddf2d57b7150d2cf600e2c90031a00a9e4ecb": {
"Name": "seaweed",
"EndpointID": "6228db70902b0d0bb59204ee4a58f868ad695b25412e68108d302925e9b0f9bb",
"MacAddress": "36:c6:b9:22:ac:97",
"IPv4Address": "172.18.0.4/16",
"IPv6Address": ""
},
"ddea36d3a1a96a9e35441db1b475f11913aec3f82b014c173873370d46c1a638": {
"Name": "api",
"EndpointID": "61647cad073cb5bb903e8882cc1d97862624b33acd79ebb419f4031ca3c7189e",
"MacAddress": "9a:4c:ae:8c:7f:b3",
"IPv4Address": "172.18.0.9/16",
"IPv6Address": ""
}
},
"Options": {},
"Labels": {
"com.docker.compose.config-hash": "fae475f830a3a5a91e5d11513ea16825a3f95dfce5c3992c00c24dd71cd4be50",
"com.docker.compose.network": "antiraid_internal",
"com.docker.compose.project": "staging",
"com.docker.compose.version": "2.33.0"
}
}
]
frostpaw@REDACTED:~/staging/services$ docker exec api curl http://172.18.0.4:8333
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (7) Failed to connect to 172.18.0.4 port 80 after 0 ms: Couldn't connect to server
exit status 7
frostpaw@REDACTED:~/staging/services$ docker exec api curl http://172.18.0.4:8333
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (7) Failed to connect to 172.18.0.4 port 80 after 0 ms: Couldn't connect to server
exit status 7
frostpaw@REDACTED:~/staging/services$
frostpaw@REDACTED:~/staging/services$ docker exec api curl http://seaweed:8333
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (7) Failed to connect to seaweed port 8333 after 0 ms: Couldn't connect to server
exit status 7
The last two curl commands randomly either fail or succeed.
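For reference, a small loop like the following makes the intermittent failures easy to count (a sketch; it assumes the stack is up and uses the api and seaweed service names from the compose file below):
# run from the host; counts connection failures out of 20 attempts
fails=0
for i in $(seq 1 20); do
  docker exec api curl -s -o /dev/null --max-time 5 http://seaweed:8333 || fails=$((fails + 1))
done
echo "$fails of 20 attempts failed"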
The following Docker Compose file was used:
services:
# Nirn Proxy
nirn_proxy:
container_name: nirn_proxy
networks:
- antiraid_infra
- antiraid_internal # Exposed to internal services
depends_on:
sandwich:
condition: service_healthy
build:
context: ./infra/nirn-proxy
dockerfile: ./Dockerfile
volumes:
- ./infra/nirn-proxy/secrets.docker.json:/secrets.json:ro
command: cache-endpoints=false port=3221 ratelimit-over-408 endpoint-rewrite=/api/gateway/bot@http://sandwich:29334/antiraid,/api/v*/gateway/bot@http://sandwich:29334/antiraid token-map-file=secrets.json
healthcheck:
test: [ "CMD", "curl", "-f", "http://localhost:3221/nirn/healthz" ]
interval: 3s
timeout: 10s
retries: 3
ports:
- "3222:3221" # And exposed to host for testing/auditing
labels:
type: "discord-rest"
# Sandwich
sandwich:
container_name: sandwich
networks:
- antiraid_infra
- antiraid_internal # Exposed to internal services
build:
context: ./infra/Sandwich-Daemon
dockerfile: ./Dockerfile
volumes:
- ./infra/Sandwich-Daemon/sandwich.docker.yaml:/sandwich.yaml:ro
command: ./app/sandwich -configurationPath=/sandwich.yaml -prometheusAddress :3931 -httpEnabled --httpHost 0.0.0.0:29334 -level debug
environment:
EXTERNAL_GATEWAY_ADDRESS: ws://sandwich:3600
healthcheck:
test: [ "CMD", "curl", "-f", "http://localhost:29334/antiraid/api/v9/gateway/bot" ]
interval: 10s
timeout: 10s
retries: 1000000000
ports:
- "29334:29334" # Exposed to host for testing/auditing
- "3931:3931" # Prometheus metrics
- "3600:3600" # Websocket connection for the bot
labels:
type: "discord-gateway"
# Postgres database
postgres:
container_name: postgres
networks:
- antiraid_internal # Exposed to internal services
build:
context: ./data/docker/postgres
dockerfile: ./Dockerfile
# Expose /seed.iblcli-seed
volumes:
- ./data/seed.iblcli-seed:/seed.iblcli-seed:ro
- ./data/state/postgres-other:/var/lib/postgresql
- ./data/state/postgres:/var/lib/postgresql/data
healthcheck:
test: [ "CMD", "pg_isready", "-U", "antiraid" ]
interval: 3s
timeout: 10s
retries: 3
labels:
type: "database"
# Seaweed needs a postgres database to function
seaweed_postgres:
container_name: seaweed_postgres
networks:
- antiraid_seaweed # Exposed to SeaweedFS services
build:
context: ./data/docker/seaweed-postgres
dockerfile: ./Dockerfile
volumes:
- ./data/state/seaweed_postgres-other:/var/lib/postgresql
- ./data/state/seaweed_postgres:/var/lib/postgresql/data
healthcheck:
test: [ "CMD", "pg_isready", "-U", "seaweed" ]
interval: 3s
timeout: 10s
retries: 3
# Seaweed FS itself
seaweed:
container_name: seaweed
build:
context: ./data/docker/seaweed
dockerfile: ./Dockerfile
networks:
- antiraid_internal # Exposed to internal services
- antiraid_seaweed # Exposed to SeaweedFS services
depends_on:
seaweed_postgres:
condition: service_healthy
command: server -filer -s3 -volume.max=100 -master.port=9333 -volume.port=9334 -master.volumeSizeLimitMB=4096 -filer.encryptVolumeData
volumes:
- ./data/state/seaweed:/data
- ./data/state/seaweed-config:/etc/seaweedfs
healthcheck:
test: [ "CMD", "curl", "-f", "http://localhost:9333/cluster/status" ]
interval: 10s
timeout: 10s
retries: 30
# Bot process itself
bot:
container_name: bot
networks:
- antiraid_internal # Exposed to internal services
depends_on:
nirn_proxy:
condition: service_healthy
sandwich:
condition: service_healthy
postgres:
condition: service_healthy
template-worker:
condition: service_healthy
build:
context: ./services/bot
dockerfile: ./Dockerfile
volumes:
- ./config.docker.yaml:/app/config.yaml:ro
healthcheck:
test: [ "CMD", "curl", "-f", "http://127.0.0.1:20000/state" ]
interval: 3s
timeout: 10s
retries: 10
labels:
type: "service"
# Template worker process
# The template worker is required for the bot to function as it handles Luau
# scripting and permissions etc.
template-worker:
container_name: template-worker
networks:
- antiraid_internal # Exposed to internal services
depends_on:
nirn_proxy:
condition: service_healthy
sandwich:
condition: service_healthy
postgres:
condition: service_healthy
build:
context: ./services/template-worker
dockerfile: ./Dockerfile
volumes:
- ./config.docker.yaml:/app/config.yaml:ro
environment:
RUST_LOG: "template-worker=info"
healthcheck:
test: [ "CMD", "curl", "-X", "POST", "-f", "http://localhost:60000/healthcheck" ]
interval: 3s
timeout: 10s
retries: 100
labels:
type: "service"
# Redis
api_redis:
container_name: api_redis
networks:
- antiraid_api # Only communicates with API
image: redis:7.2.7 # Use 7.2.7 to avoid licensing change with 7.4+
expose:
- 6379
command: redis-server --save 900 1 --save 300 10 --save 60 10000 --loglevel notice --bind 0.0.0.0
volumes:
- ./data/state/redis:/data
healthcheck:
test: [ "CMD", "redis-cli", "ping" ]
interval: 3s
timeout: 10s
retries: 3
labels:
type: "cache"
# Jobserver process
jobserver:
container_name: jobserver
networks:
- antiraid_internal # Exposed to internal services
- antiraid_jobserver # Needs external net access too
depends_on:
nirn_proxy:
condition: service_healthy
sandwich:
condition: service_healthy
postgres:
condition: service_healthy
build:
context: ./services/jobserver
dockerfile: ./Dockerfile
volumes:
- ./config.docker.yaml:/app/config.yaml:ro
healthcheck:
test: [ "CMD", "curl", "-f", "http://localhost:30000" ]
interval: 3s
timeout: 10s
retries: 10
labels:
type: "service"
# API process
# Like all other processes, the API process does not have external net access
# outside of Nirn proxy, however, the api_exposer Nginx proxy is used to bridge
# the gap between the API and the outside world.
api:
container_name: api
networks:
- antiraid_internal # Doesn't need external net access
- antiraid_api
depends_on:
api_redis:
condition: service_started
nirn_proxy:
condition: service_started
postgres:
condition: service_healthy
bot:
condition: service_healthy
build:
context: ./services/api
dockerfile: ./Dockerfile
volumes:
- ./config.docker.yaml:/app/config.yaml:ro
labels:
type: "service"
special-networking-used: "api" # API has special networking needs, so we document that as a nice label
# Nginx process to expose the API outwards while ensuring API itself doesn't get any
# external net access. Bridges external and internal layers
api_exposer:
container_name: api_exposer
networks:
- antiraid_api
- antiraid_internal # Exposed to internal services
- antiraid_infra # Needs access to the outside world
depends_on:
api:
condition: service_started
seaweed:
condition: service_healthy
image: nginx:1.27.4-alpine
volumes:
- ./data/docker/nginx.conf:/etc/nginx/conf.d/default.conf:ro
ports:
- "5600:5600" # Expose API to host
- "5601:5601" # Expose SeaweedFS to host
healthcheck:
test: [ "CMD", "curl", "-f", "http://localhost:5600/docs/splashtail" ]
interval: 3s
timeout: 10s
retries: 3
networks:
antiraid_infra:
name: antiraid_infra
driver: bridge
internal: false # Infra needs external access
antiraid_internal:
name: antiraid_internal
driver: bridge
internal: true # No external net access
antiraid_api:
# Used for internal API services (Redis/Nginx etc.)
name: antiraid_api
driver: bridge
internal: true # No external net access
antiraid_seaweed:
# Used for internal SeaweedFS services
name: antiraid_seaweed
driver: bridge
internal: true # No external net access
antiraid_jobserver:
# Used for internal Jobserver services
name: antiraid_jobserver
driver: bridge
internal: false # Needs external net access
Reproduce
- docker compose up
- Try the above curl commands
- The commands randomly succeed or fail from one docker compose up run to another
Expected behavior
The curl commands above should either always succeed or, if the network config above is wrong (which I doubt), always fail.
docker version
Client: Docker Engine - Community
Version: 28.0.0
API version: 1.48
Go version: go1.23.6
Git commit: f9ced58
Built: Wed Feb 19 22:11:04 2025
OS/Arch: linux/amd64
Context: default
Server: Docker Engine - Community
Engine:
Version: 28.0.0
API version: 1.48 (minimum version 1.24)
Go version: go1.23.6
Git commit: af898ab
Built: Wed Feb 19 22:11:04 2025
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.7.25
GitCommit: bcc810d6b9066471b0b6fa75f557a15a1cbf31bb
runc:
Version: 1.2.4
GitCommit: v1.2.4-0-g6c52b3f
docker-init:
Version: 0.19.0
GitCommit: de40ad0
docker info
Client: Docker Engine - Community
Version: 28.0.0
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.21.0
Path: /usr/libexec/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.33.0
Path: /usr/libexec/docker/cli-plugins/docker-compose
Server:
Containers: 11
Running: 11
Paused: 0
Stopped: 0
Images: 84
Server Version: 28.0.0
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: bcc810d6b9066471b0b6fa75f557a15a1cbf31bb
runc version: v1.2.4-0-g6c52b3f
init version: de40ad0
Security Options:
seccomp
Profile: builtin
Kernel Version: 5.15.167.4-microsoft-standard-WSL2
Operating System: Ubuntu 24.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 20
Total Memory: 7.592GiB
Name: REDACTED
ID: 5893e8ad-7ea5-4940-a268-0cd7dff1f4f0
Docker Root Dir: /var/lib/docker
Debug Mode: false
Experimental: false
Insecure Registries:
::1/128
127.0.0.0/8
Live Restore Enabled: false
WARNING: No blkio throttle.read_bps_device support
WARNING: No blkio throttle.write_bps_device support
WARNING: No blkio throttle.read_iops_device support
WARNING: No blkio throttle.write_iops_device support
Additional Info
This is running in a WSL environment:
frostpaw@REDACTED:~/staging/services$ uname -a
Linux REDACTED 5.15.167.4-microsoft-standard-WSL2 #1 SMP Tue Nov 5 00:21:55 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
All instances of REDACTED are my computer's hostname and are omitted for privacy reasons.
Hi @cheesycod - this is very likely to be fixed by https://github.com/moby/moby/pull/49518, there's an issue in 28.0.0 with rules getting out of order when there are extra rules in the iptables filter-FORWARD chain.
If you want to send the output of iptables -nvL from when it's broken, I can double-check.
The fixes should be available in a 28.0.1 release in the next couple of days.
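In the meantime, a quick way to eyeball the ordering (a sketch - run as root on the Docker host while the failure is happening) is to list the FORWARD chain with rule numbers alongside the full filter table:
# show the filter FORWARD chain with rule numbers, then the whole filter table
iptables -nvL FORWARD --line-numbers
iptables -nvL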
I have already downgraded to 27.5.1 but if it happens again in 28.0.1 or 27.5.1, I’ll update you!
[this is my school/uni account]
This persists in 27.5.1 as well
iptables -nvL output:
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain FORWARD (policy DROP 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
5801 2899K DOCKER-USER 0 -- * * 0.0.0.0/0 0.0.0.0/0
5801 2899K DOCKER-ISOLATION-STAGE-1 0 -- * * 0.0.0.0/0 0.0.0.0/0
0 0 ACCEPT 0 -- * docker0 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
0 0 DOCKER 0 -- * docker0 0.0.0.0/0 0.0.0.0/0
0 0 ACCEPT 0 -- docker0 !docker0 0.0.0.0/0 0.0.0.0/0
0 0 ACCEPT 0 -- docker0 docker0 0.0.0.0/0 0.0.0.0/0
41 48479 ACCEPT 0 -- * br-e97df219409b 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
0 0 DOCKER 0 -- * br-e97df219409b 0.0.0.0/0 0.0.0.0/0
27 3732 ACCEPT 0 -- br-e97df219409b !br-e97df219409b 0.0.0.0/0 0.0.0.0/0
0 0 ACCEPT 0 -- br-e97df219409b br-e97df219409b 0.0.0.0/0 0.0.0.0/0
70 7919 ACCEPT 0 -- br-852e6920e44f br-852e6920e44f 0.0.0.0/0 0.0.0.0/0
710 485K ACCEPT 0 -- * br-7828a5046782 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
1 60 DOCKER 0 -- * br-7828a5046782 0.0.0.0/0 0.0.0.0/0
632 171K ACCEPT 0 -- br-7828a5046782 !br-7828a5046782 0.0.0.0/0 0.0.0.0/0
1 60 ACCEPT 0 -- br-7828a5046782 br-7828a5046782 0.0.0.0/0 0.0.0.0/0
558 226K ACCEPT 0 -- br-3d5e05d68993 br-3d5e05d68993 0.0.0.0/0 0.0.0.0/0
3762 1955K ACCEPT 0 -- br-3237097f63b8 br-3237097f63b8 0.0.0.0/0 0.0.0.0/0
0 0 ACCEPT 0 -- br-f90df49ea7de br-f90df49ea7de 0.0.0.0/0 0.0.0.0/0
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain DOCKER (3 references)
pkts bytes target prot opt in out source destination
0 0 ACCEPT 6 -- !br-7828a5046782 br-7828a5046782 0.0.0.0/0 172.21.0.2 tcp dpt:3600
0 0 ACCEPT 6 -- !br-7828a5046782 br-7828a5046782 0.0.0.0/0 172.21.0.2 tcp dpt:3931
0 0 ACCEPT 6 -- !br-7828a5046782 br-7828a5046782 0.0.0.0/0 172.21.0.2 tcp dpt:29334
0 0 ACCEPT 6 -- !br-7828a5046782 br-7828a5046782 0.0.0.0/0 172.21.0.3 tcp dpt:3221
0 0 ACCEPT 6 -- !br-7828a5046782 br-7828a5046782 0.0.0.0/0 172.21.0.4 tcp dpt:5600
0 0 ACCEPT 6 -- !br-7828a5046782 br-7828a5046782 0.0.0.0/0 172.21.0.4 tcp dpt:5601
Chain DOCKER-ISOLATION-STAGE-1 (1 references)
pkts bytes target prot opt in out source destination
0 0 DOCKER-ISOLATION-STAGE-2 0 -- docker0 !docker0 0.0.0.0/0 0.0.0.0/0
27 3732 DOCKER-ISOLATION-STAGE-2 0 -- br-e97df219409b !br-e97df219409b 0.0.0.0/0 0.0.0.0/0
0 0 DROP 0 -- * br-852e6920e44f !172.22.0.0/16 0.0.0.0/0
0 0 DROP 0 -- br-852e6920e44f * 0.0.0.0/0 !172.22.0.0/16
632 171K DOCKER-ISOLATION-STAGE-2 0 -- br-7828a5046782 !br-7828a5046782 0.0.0.0/0 0.0.0.0/0
0 0 DROP 0 -- * br-3d5e05d68993 !172.23.0.0/16 0.0.0.0/0
0 0 DROP 0 -- br-3d5e05d68993 * 0.0.0.0/0 !172.23.0.0/16
0 0 DROP 0 -- * br-3237097f63b8 !172.18.0.0/16 0.0.0.0/0
0 0 DROP 0 -- br-3237097f63b8 * 0.0.0.0/0 !172.18.0.0/16
0 0 DROP 0 -- * br-f90df49ea7de !172.19.0.0/16 0.0.0.0/0
0 0 DROP 0 -- br-f90df49ea7de * 0.0.0.0/0 !172.19.0.0/16
5801 2899K RETURN 0 -- * * 0.0.0.0/0 0.0.0.0/0
Chain DOCKER-ISOLATION-STAGE-2 (3 references)
pkts bytes target prot opt in out source destination
0 0 DROP 0 -- * docker0 0.0.0.0/0 0.0.0.0/0
0 0 DROP 0 -- * br-e97df219409b 0.0.0.0/0 0.0.0.0/0
0 0 DROP 0 -- * br-7828a5046782 0.0.0.0/0 0.0.0.0/0
659 175K RETURN 0 -- * * 0.0.0.0/0 0.0.0.0/0
Chain DOCKER-USER (1 references)
pkts bytes target prot opt in out source destination
5801 2899K RETURN 0 -- * * 0.0.0.0/0 0.0.0.0/0
Thank you @cheesycod - I don't see a problem with those rules. They look like 27.x rules, after a flush or reboot (if it's after a downgrade)?
No packets seem to be getting dropped - is the dump from a run where it was failing?
Had it been working on 27.x, and it's now broken following a downgrade - or has it never worked properly in 27.x either?
Could you send the nat table too? (iptables -nvL -t nat).
Moby 28.0.1 is available now ... although I can't spot an issue in the iptables dump above, it might be worth a try.
If it still doesn't work - it'd be useful to see iptables -nvL and iptables -nvL -t nat, after making some failed requests (to make sure the packet counters in the iptables dumps have something to show, if iptables is the issue).
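For example (a sketch, assuming curl is available in the api container as above):
# trigger a few failing requests so the packet counters move
docker exec api curl -s --max-time 5 http://seaweed:8333 || true
# then capture both tables as root on the host
iptables -nvL > iptables-filter.txt
iptables -nvL -t nat > iptables-nat.txt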
Same issue with 28.0.1 (Docker version 28.0.1, build 068a01e).
It seems to be randomly triggered by a DinD container service (also version 28.0.1), which gets started (and removed) from a GitLab CI/CD pipeline by a gitlab-runner on the same machine.
:~# iptables -nvL
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain FORWARD (policy DROP 7216 packets, 463K bytes)
pkts bytes target prot opt in out source destination
114 18733 DOCKER-USER 0 -- * * 0.0.0.0/0 0.0.0.0/0
121 20073 DOCKER-FORWARD 0 -- * * 0.0.0.0/0 0.0.0.0/0
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain DOCKER (16 references)
pkts bytes target prot opt in out source destination
0 0 DROP 0 -- !docker0 docker0 0.0.0.0/0 0.0.0.0/0
Chain DOCKER-BRIDGE (1 references)
pkts bytes target prot opt in out source destination
0 0 DOCKER 0 -- * br-65686538b6c3 0.0.0.0/0 0.0.0.0/0
0 0 DOCKER 0 -- * br-755b77a90cb0 0.0.0.0/0 0.0.0.0/0
0 0 DOCKER 0 -- * br-bb3a8cd93a85 0.0.0.0/0 0.0.0.0/0
0 0 DOCKER 0 -- * br-1b7b5c0abfb7 0.0.0.0/0 0.0.0.0/0
0 0 DOCKER 0 -- * br-4412a5d2db9b 0.0.0.0/0 0.0.0.0/0
0 0 DOCKER 0 -- * br-711e87b1bfcb 0.0.0.0/0 0.0.0.0/0
0 0 DOCKER 0 -- * br-97ec4c2cbbd3 0.0.0.0/0 0.0.0.0/0
0 0 DOCKER 0 -- * br-99916fd13930 0.0.0.0/0 0.0.0.0/0
16405 981K DOCKER 0 -- * br-b44f4a949bbc 0.0.0.0/0 0.0.0.0/0
0 0 DOCKER 0 -- * br-d8a6fd7457a7 0.0.0.0/0 0.0.0.0/0
0 0 DOCKER 0 -- * br-ea9158ec914f 0.0.0.0/0 0.0.0.0/0
0 0 DOCKER 0 -- * br-0095e160b80c 0.0.0.0/0 0.0.0.0/0
0 0 DOCKER 0 -- * br-cab8f4420925 0.0.0.0/0 0.0.0.0/0
0 0 DOCKER 0 -- * br-fdf1de3f3140 0.0.0.0/0 0.0.0.0/0
0 0 DOCKER 0 -- * br-30419ed6c2ac 0.0.0.0/0 0.0.0.0/0
0 0 DOCKER 0 -- * docker0 0.0.0.0/0 0.0.0.0/0
Chain DOCKER-CT (1 references)
pkts bytes target prot opt in out source destination
9589 1601K ACCEPT 0 -- * br-65686538b6c3 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
0 0 ACCEPT 0 -- * br-755b77a90cb0 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
15257 36M ACCEPT 0 -- * br-bb3a8cd93a85 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
126 383K ACCEPT 0 -- * br-1b7b5c0abfb7 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
0 0 ACCEPT 0 -- * br-4412a5d2db9b 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
0 0 ACCEPT 0 -- * br-711e87b1bfcb 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
0 0 ACCEPT 0 -- * br-97ec4c2cbbd3 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
0 0 ACCEPT 0 -- * br-99916fd13930 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
660K 61M ACCEPT 0 -- * br-b44f4a949bbc 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
0 0 ACCEPT 0 -- * br-d8a6fd7457a7 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
13143 10M ACCEPT 0 -- * br-ea9158ec914f 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
0 0 ACCEPT 0 -- * br-0095e160b80c 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
0 0 ACCEPT 0 -- * br-cab8f4420925 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
0 0 ACCEPT 0 -- * br-fdf1de3f3140 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
0 0 ACCEPT 0 -- * br-30419ed6c2ac 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
0 0 ACCEPT 0 -- * docker0 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
Chain DOCKER-FORWARD (1 references)
pkts bytes target prot opt in out source destination
121 20073 DOCKER-CT 0 -- * * 0.0.0.0/0 0.0.0.0/0
90 13751 DOCKER-ISOLATION-STAGE-1 0 -- * * 0.0.0.0/0 0.0.0.0/0
90 13751 DOCKER-BRIDGE 0 -- * * 0.0.0.0/0 0.0.0.0/0
0 0 ACCEPT 0 -- docker0 * 0.0.0.0/0 0.0.0.0/0
Chain DOCKER-ISOLATION-STAGE-1 (1 references)
pkts bytes target prot opt in out source destination
0 0 DOCKER-ISOLATION-STAGE-2 0 -- docker0 !docker0 0.0.0.0/0 0.0.0.0/0
Chain DOCKER-ISOLATION-STAGE-2 (1 references)
pkts bytes target prot opt in out source destination
0 0 DROP 0 -- * docker0 0.0.0.0/0 0.0.0.0/0
Chain DOCKER-USER (1 references)
pkts bytes target prot opt in out source destination
1045K 3755M RETURN 0 -- * * 0.0.0.0/0 0.0.0.0/0
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
205 12316 DOCKER 0 -- * * 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 DOCKER 0 -- * * 0.0.0.0/0 !127.0.0.0/8 ADDRTYPE match dst-type LOCAL
Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 MASQUERADE 0 -- * !docker0 172.17.0.0/16 0.0.0.0/0
0 0 MASQUERADE 0 -- * !br-30419ed6c2ac 192.168.112.0/20 0.0.0.0/0
0 0 MASQUERADE 0 -- * !br-fdf1de3f3140 172.22.0.0/16 0.0.0.0/0
0 0 MASQUERADE 0 -- * !br-cab8f4420925 172.25.0.0/16 0.0.0.0/0
0 0 MASQUERADE 0 -- * !br-0095e160b80c 192.168.160.0/20 0.0.0.0/0
1062 63720 MASQUERADE 0 -- * !br-ea9158ec914f 172.24.0.0/16 0.0.0.0/0
0 0 MASQUERADE 0 -- * !br-d8a6fd7457a7 192.168.128.0/20 0.0.0.0/0
229 13740 MASQUERADE 0 -- * !br-b44f4a949bbc 172.18.0.0/16 0.0.0.0/0
0 0 MASQUERADE 0 -- * !br-99916fd13930 192.168.176.0/20 0.0.0.0/0
0 0 MASQUERADE 0 -- * !br-97ec4c2cbbd3 192.168.144.0/20 0.0.0.0/0
0 0 MASQUERADE 0 -- * !br-711e87b1bfcb 172.29.0.0/16 0.0.0.0/0
0 0 MASQUERADE 0 -- * !br-4412a5d2db9b 172.23.0.0/16 0.0.0.0/0
2 120 MASQUERADE 0 -- * !br-1b7b5c0abfb7 172.19.0.0/16 0.0.0.0/0
379 22764 MASQUERADE 0 -- * !br-bb3a8cd93a85 172.26.0.0/16 0.0.0.0/0
0 0 MASQUERADE 0 -- * !br-755b77a90cb0 192.168.96.0/20 0.0.0.0/0
342 20520 MASQUERADE 0 -- * !br-65686538b6c3 172.20.0.0/16 0.0.0.0/0
0 0 MASQUERADE 0 -- * * 172.21.0.0/16 0.0.0.0/0
Chain DOCKER (2 references)
pkts bytes target prot opt in out source destination
0 0 RETURN 0 -- docker0 * 0.0.0.0/0 0.0.0.0/0
Seems to be related to a DinD container service (also version 28.0.1), which gets started (and removed) from a GitLab CI/CD pipeline by a gitlab-runner on the same machine.
Hi @TheEvilCoder42 ... so, you have two instances of dockerd running on the same host, or have I misunderstood?
(They'll certainly interfere with each other, I don't think it would have worked reliably with any release?)
No, there is only one instance of dockerd running.
The machine also runs the gitlab-runner service which pulls CI/CD Pipeline jobs from GitLab and runs them in containers. A DinD service in the Pipeline (another container with the docker:dind image) seems to sporadically trigger the loss of network connection for all containers.
Got it - thank you. Is the DinD container running with --network host?
In the iptables dump, there are rules for a lot of bridge networks in DOCKER-BRIDGE and DOCKER-CT, but not in the DOCKER-ISOLATION chains. On startup the daemon flushes most of the chains - but not those two new ones (which is a bug, https://github.com/moby/moby/pull/49582).
So, it looks like that might have happened to the outer docker's rules.
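If it helps to confirm that, a rough way to spot leftover per-bridge rules (a sketch, run as root on the host) is to compare the entries in those two chains against the bridges Docker currently knows about:
# chains that are not flushed on daemon startup
iptables -nvL DOCKER-BRIDGE
iptables -nvL DOCKER-CT
# bridges that should actually exist right now
docker network ls
ip -br link show type bridge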
Yes, exactly - the gitlab-runner is configured to use --network host.
Forgot to mention: there are no additional iptables rules and no persistence, and a docker service restart fixes the issue (at least until a build sporadically triggers it again)
Also I'm not entirely sure if this could be related to this issue: https://github.com/docker-library/docker/issues/463
The machine runs Debian 12 Bookworm (Linux REDACTED 6.1.0-31-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.128-1 (2025-02-07) x86_64 GNU/Linux)
Have been travelling so couldn’t test with 28.0.1 (though I did manage to reproduce it on even 27), in my case the issue occurs purely with bridge networks and no network host
Yes exactly the gitlab-runner is configured to use --network host.
Ok, I don't think that can work. From the networking perspective, it's the same as running two docker daemons on the host, they will interfere with each other.
Restarting the daemon on the host fixes it by re-creating all the rules needed by Docker on the host machine.
I'm wondering if it ended up a bit less broken before 28.0 because some rules we've now moved out of the FORWARD chain wouldn't have been flushed when the DinD daemon started. But, even then, I think network isolation, port mappings, and various things would have stopped working on the host.
Does the runner need to use the host's network?
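For reference, a DinD service can usually be attached to an ordinary bridge network rather than the host's network. A rough sketch (the network name here is illustrative, and in practice the change would go in the gitlab-runner configuration rather than a manual docker run):
# create a dedicated bridge network and run DinD on it instead of --network host
docker network create ci_dind
docker run -d --privileged --name dind --network ci_dind docker:dind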
Also I'm not entirely sure if this could be related to this issue: docker-library/docker#463. The machine runs Debian 12 Bookworm (Linux REDACTED 6.1.0-31-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.128-1 (2025-02-07) x86_64 GNU/Linux)
I think that's unrelated, to do with mixing nftables and legacy xtables.
Have been travelling so couldn’t test with 28.0.1 (though I did manage to reproduce it on even 27), in my case the issue occurs purely with bridge networks and no network host
Thanks @cheesycod - it sounds like these are separate issues. When you have time, it'd be good to collect iptables dumps for the working and not-working states to see what's changed - assuming it's iptables related.
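One way to do that (a sketch, run as root on the host) is to snapshot the rules in each state and diff them:
# while the curl commands succeed
iptables-save > /tmp/iptables-working.rules
# later, once the failures reappear
iptables-save > /tmp/iptables-broken.rules
diff -u /tmp/iptables-working.rules /tmp/iptables-broken.rules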
Ok, I don't think that can work. From the networking perspective, it's the same as running two docker daemons on the host, they will interfere with each other.
Restarting the daemon on the host fixes it by re-creating all the rules needed by Docker on the host machine.
I'm wondering if it ended up a bit less broken before 28.0 because some rules we've now moved out of the FORWARD chain wouldn't have been flushed when the DinD daemon started. But, even then, I think network isolation, port mappings, and various things would have stopped working on the host. Does the runner need to use the host's network?
Well, that sounds reasonable - it's now configured to use --network bridge, since this runner doesn't strictly need any VPN or DNS foolery and also runs other containers besides being a runner.
Seems it was a bit less broken before 28.0, since it had worked without any issue so far (despite some warnings coming from the DinD service).
I think that's unrelated, to do with mixing nftables and legacy xtables.
Alright, thanks for your input!
Thank you very much, let's hope this fixes my problem :).
@cheesycod Sorry to have hijacked your issue :).
Thanks @TheEvilCoder42 ... if it's not fixed, please do raise a new issue.
Alright, update on this issue: things magically started becoming more reliable all of a sudden, with no change on my part. Will update again once I manage to reproduce this bug (it'll probably resurface tomorrow after a few more service restarts for updates - I love random race conditions and conflicting stuff)
Thanks @cheesycod.