Backup script causing VLAN issue
I've been trying to get to the bottom of a problem that has plagued me for some time. I have my IoT devices on a separate VLAN, with a firewall rule to let me access them from the main LAN. Every so often (once a month or so?) I can't access the Raspberry Pi from the LAN, but if I put myself on the IoT VLAN then I can. I can still access other devices on the VLAN from the LAN, so it doesn't look like a firewall issue. It tends to "fix itself" the next day.

As my IOTstack install was quite old, and I'd learned quite a bit since the first install, I recently did a clean install on the latest Pi OS, thinking that would be the end of my network issues. Today, however, access to the VLAN was again lost. After the usual log checks, and reboots of the Pi, firewall and network switch, I finally remembered that the backup script runs at 2am every day. I therefore ran it manually and my network access came back!
Any idea what in the script could be causing this, especially when a reboot of the Pi doesn't fix it but rerunning the script does?
I've checked the backup log from 2am this morning and the one that I ran this afternoon and there isn't anything that jumps out in the way of errors.
I won't be able to confirm this is definitely the issue until it happens again at some point, but it's the first time I've done something that appears to have immediately fixed the problem, so the backup script appears to be the key.
Hi,
You haven't really given me enough to go on. Specifically, it isn't clear how your Raspberry Pi is connected. Is it:
- The Pi plus the "other" IOT devices only attach to the VLAN;
- The IOT devices only attach to the VLAN but the Pi attaches to both the main LAN and the VLAN; or
- Something else?
It also isn't clear which device is implementing your firewall rules. Is that the Pi or something else?
A diagram of how you have things set up would really help.
In the meantime, I'm going to make some (hopefully) educated guesses about what might be going on; at least to the extent of pointing to what might explain the observed behaviour of the problem resolving itself when you run the backup.
I'm assuming you mean the backup script which is supplied with IOTstack, and not my IOTstackBackup scripts.
One of the significant differences between the supplied backup script and IOTstackBackup is the supplied script takes your IOTstack down while the backup is running, whereas my IOTstackBackup scripts don't need to do that (one of the reasons why I wrote IOTstackBackup in the first place).
With that in mind, assume my stack is running:
```
$ DPS
NAMES          CREATED        STATUS
zigbee2mqtt    44 hours ago   Up 44 hours
wireguard      2 days ago     Up 2 days
pihole         2 days ago     Up 2 days (healthy)
portainer-ce   2 days ago     Up 2 days
mosquitto      3 days ago     Up 26 hours (healthy)
nodered        3 days ago     Up 3 days (healthy)
grafana        3 days ago     Up 3 days (healthy)
influxdb       3 days ago     Up 3 days (healthy)
```
How many lines of net filter rules are in place while the stack is running?
```
$ sudo nft list ruleset | wc -l
92
```
Let's simulate the effect of a backup with the "supplied script" by bouncing the stack and checking the filter tables as we go along:
```
$ DOWN
[+] Running 9/9
 ⠿ Container influxdb           Removed    1.0s
 ⠿ Container zigbee2mqtt        Removed    1.2s
 ⠿ Container portainer-ce       Removed    0.9s
 ⠿ Container grafana            Removed    1.1s
 ⠿ Container nodered            Removed    1.0s
 ⠿ Container wireguard          Removed   10.6s
 ⠿ Container mosquitto          Removed    0.4s
 ⠿ Container pihole             Removed    4.6s
 ⠿ Network iotstack_default     Removed    0.2s

$ sudo nft list ruleset | wc -l
48
```
```
$ UP
[+] Running 9/9
 ⠿ Network iotstack_default     Created    0.1s
 ⠿ Container pihole             Started    2.9s
 ⠿ Container portainer-ce       Started    3.3s
 ⠿ Container mosquitto          Started    2.7s
 ⠿ Container nodered            Started    2.3s
 ⠿ Container grafana            Started    2.6s
 ⠿ Container influxdb           Started    2.5s
 ⠿ Container wireguard          Started    5.5s
 ⠿ Container zigbee2mqtt        Started    5.7s

$ sudo nft list ruleset | wc -l
92
```
So, without getting into which filter rules are added/removed around stack up/down, let's just suppose a net filter rule in the Pi is going wonky for some reason that we don't yet understand, and assume that explains occasional non-reachability. I can imagine the stack down removing the wonky rule, and the subsequent up restoring the rule to a working state.
My knowledge of what actually happens on a reboot while the stack is up is limited but some other behaviours have made me think Docker snap-freezes the state as the machine goes down and thaws the frozen state when the machine comes back. In other words, closer to a "pause" and "unpause" than a "down" and "up". If that's true, I can kinda imagine net filter tables being saved and restored "as is" rather than being withdrawn and recreated.
The stack going down/up also changes the Pi's routing table so I'd also be running netstat -rn to see if anything jumped out. I assume you understand that running Docker on a Pi turns on IPv4 forwarding, so the routing table getting hosed somehow could mean that packets leaving your workstation do actually reach the Pi but are misdirected on return. I've never seen Docker do more than add/remove "br-xxx" interfaces pointing to the internal bridged networks so, on balance, I'd say the chance of that explaining the observed behaviour is on the low side. Still, it could be worth a look.
The other thing I'd be doing is running tcpdump to capture traffic at various points while the problem was present.
In thinking about your recent rebuild, did you also rebuild your IOTstack with up-to-date service definitions or did you just do a restore and pick up with your existing docker-compose.yml? To put this question another way, what do you see when you run:
```
$ docker network ls
NETWORK ID     NAME               DRIVER    SCOPE
acbeb27a72c1   bridge             bridge    local
607dff8f0b07   host               host      local
efd759dca552   iotstack_default   bridge    local
ccfbe42748fe   none               null      local
```
The only other possibility in that list should be iotstack_nextcloud. If you're seeing things like iotstack_nw then you might want to work through network migration.
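If it's useful, that check can be mechanised. A small illustrative sketch (the pasted listing and the set of expected names are the ones discussed above; a real script would read the output of `docker network ls` directly):

```python
# Flag docker networks whose names suggest a pre-migration IOTstack install.
listing = """\
NETWORK ID     NAME               DRIVER    SCOPE
acbeb27a72c1   bridge             bridge    local
607dff8f0b07   host               host      local
efd759dca552   iotstack_default   bridge    local
ccfbe42748fe   none               null      local
"""

# Docker's built-ins plus the only IOTstack networks that should exist.
expected = {"bridge", "host", "none", "iotstack_default", "iotstack_nextcloud"}

# Second column of every row after the header is the network name.
names = [line.split()[1] for line in listing.splitlines()[1:]]
legacy = [name for name in names if name not in expected]

print(legacy)  # [] means nothing needs network migration
```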
If you're interested, I actually have my docker-compose.yml set up like this:
```yaml
version: '3.6'

networks:

  default:
    driver: bridge
    ipam:
      driver: default
      config:
        - subnet: 172.30.0.0/22

  nextcloud:
    driver: bridge
    internal: true
    ipam:
      driver: default
      config:
        - subnet: 172.30.4.0/22

services:
  …
```
That is, I put the networks section near the top (which makes it easier to concatenate new service definitions onto the end) and I have predictable subnets.
Predictable subnets mean I never run into Docker doing a random allocation that collides with something else I'm doing. My rule is 10/8 for ZeroTier, 172.16/12 for Docker, 192.168/16 for me. Having a known subnet for iotstack_default is also somewhere between useful and essential if you want to run BIND9 in a container (which I do).
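That carve-up is easy to sanity-check with Python's standard-library ipaddress module. A quick illustrative sketch of the rule of thumb:

```python
import ipaddress

# One private range per purpose (the "10/8 ZeroTier, 172.16/12 Docker,
# 192.168/16 me" rule of thumb).
zerotier = ipaddress.ip_network("10.0.0.0/8")
docker   = ipaddress.ip_network("172.16.0.0/12")
mine     = ipaddress.ip_network("192.168.0.0/16")

# The three ranges are mutually disjoint...
assert not zerotier.overlaps(docker)
assert not docker.overlaps(mine)
assert not zerotier.overlaps(mine)

# ...and both docker0's usual 172.17/16 and the pinned iotstack_default
# subnet sit inside the Docker slice.
assert ipaddress.ip_network("172.17.0.0/16").subnet_of(docker)
assert ipaddress.ip_network("172.30.0.0/22").subnet_of(docker)

print("no collisions")
```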
Truth to tell, I don't run NextCloud so I just comment-out the lines for the second subnet.
If your compose file is really ancient, I'd also advise a bit of compare/contrast between your active service definitions and those in the .templates folder. We keep improving things (eg Mosquitto and Node-RED now have better mechanisms for passing build-time arguments) but the menu lacks the smarts to detect that and at least say "hey - have you considered..."
Anyway, hope this helps.
OK. It happened again today and I did some tests before and after taking the stack down and bringing it back up again. Doing this did result in the network working again. I didn't do the tcpdump but the netstat -rn showed a difference between the working and non-working system:
Not working:
```
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         192.168.30.1    0.0.0.0         UG        0 0          0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 veth9551033
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 veth31436bd
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 veth949ff3b
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 vethd91d085
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 veth9ee4d2d
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 veth8977e93
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 veth209ad03
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 veth8e14561
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 veth6f6c5f0
172.17.0.0      0.0.0.0         255.255.0.0     U         0 0          0 docker0
192.168.0.0     0.0.0.0         255.255.240.0   U         0 0          0 br-cfc431c5fcd8
192.168.30.0    0.0.0.0         255.255.255.0   U         0 0          0 eth0
```
Working:

```
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         192.168.30.1    0.0.0.0         UG        0 0          0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 vetha16d81c
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 veth3328f10
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 veth3ea28dc
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 vethcfc164a
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 veth29668e6
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 veth83922ce
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 veth66ec22e
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 veth7f6e098
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 veth7dc8fe5
172.17.0.0      0.0.0.0         255.255.0.0     U         0 0          0 docker0
192.168.30.0    0.0.0.0         255.255.255.0   U         0 0          0 eth0
192.168.32.0    0.0.0.0         255.255.240.0   U         0 0          0 br-2f02f89b13e8
```
I'm out of my depth on what is going on here but the bridge at the end appears to be different. My LAN is on 192.168.1.x, the VLAN is 192.168.30.x. I have no idea what 192.168.32.0 is but it is working when it is present.
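One way to see why that difference matters: the kernel picks routes by longest-prefix match. A quick sketch with Python's standard-library ipaddress module (routes distilled from the two tables above; 192.168.1.50 is just an illustrative workstation address on the main LAN) shows where a reply packet would go in each case:

```python
import ipaddress

def egress_interface(routes, dst):
    """Pick the route for dst by longest-prefix match, as the kernel does."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, iface) for net, iface in routes if addr in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1]

# Distilled from the "not working" routing table.
not_working = [
    (ipaddress.ip_network("0.0.0.0/0"), "eth0"),                  # default gateway
    (ipaddress.ip_network("172.17.0.0/16"), "docker0"),
    (ipaddress.ip_network("192.168.0.0/20"), "br-cfc431c5fcd8"),  # docker's random pick
    (ipaddress.ip_network("192.168.30.0/24"), "eth0"),            # the IoT VLAN
]

# Same, but with the bridge subnet from the "working" table.
working = [
    (ipaddress.ip_network("0.0.0.0/0"), "eth0"),
    (ipaddress.ip_network("172.17.0.0/16"), "docker0"),
    (ipaddress.ip_network("192.168.32.0/20"), "br-2f02f89b13e8"),
    (ipaddress.ip_network("192.168.30.0/24"), "eth0"),
]

# A reply to a workstation on the main LAN (192.168.1.x):
print(egress_interface(not_working, "192.168.1.50"))  # br-cfc431c5fcd8 (misdirected!)
print(egress_interface(working, "192.168.1.50"))      # eth0 (default route wins)
```

192.168.0.0/20 covers 192.168.0.0 through 192.168.15.255, which swallows the whole 192.168.1.x LAN; 192.168.32.0/20 does not.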
Any thoughts would be much appreciated.
I think I know what might be going on but I won't know for certain until I can get some more information.
Please provide:
- Your docker-compose.yml, suitably edited to obfuscate anything sensitive like passwords. If you use any override files, include those too, please.
- The output from running the following commands:

```
$ uname -a
$ grep "VERSION=" /etc/os-release
$ grep -e "allowinterfaces" -e "denyinterfaces" /etc/dhcpcd.conf
$ docker network ls
$ docker ps --format "table {{.Names}}\t{{.RunningFor}}\t{{.Status}}"
```
Please wrap your docker-compose.yml and the console output from running the commands inside triple-backtick lines (otherwise known as "code fences"):
```
output here
```
Code-fences use monospaced font and respect end-of-lines so the original layout is preserved and everything is much easier to read.
Thanks for your support with this @Paraphraser . Requested information is below:
docker-compose.yml:
```yaml
version: '3.6'
services:
  home_assistant:
    container_name: home_assistant
    image: ghcr.io/home-assistant/home-assistant:stable
    #image: ghcr.io/home-assistant/raspberrypi3-homeassistant:stable
    #image: ghcr.io/home-assistant/raspberrypi4-homeassistant:stable
    restart: unless-stopped
    network_mode: host
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - ./volumes/home_assistant:/config
      - /var/run/dbus/system_bus_socket:/var/run/dbus/system_bus_socket
    devices:
      - "/dev/ttyAMA0:/dev/ttyAMA0"
      - "/dev/vcio:/dev/vcio"
      - "/dev/gpiomem:/dev/gpiomem"
    privileged: true
  mosquitto:
    container_name: mosquitto
    build:
      context: ./.templates/mosquitto/.
      args:
        - MOSQUITTO_BASE=eclipse-mosquitto:latest
    restart: unless-stopped
    environment:
      - TZ=Etc/UTC
    ports:
      - "1883:1883"
    volumes:
      - ./volumes/mosquitto/config:/mosquitto/config
      - ./volumes/mosquitto/data:/mosquitto/data
      - ./volumes/mosquitto/log:/mosquitto/log
      - ./volumes/mosquitto/pwfile:/mosquitto/pwfile
  nodered:
    container_name: nodered
    build: ./services/nodered/.
    restart: unless-stopped
    user: "0"
    environment:
      - TZ=Etc/UTC
    ports:
      - "1880:1880"
    volumes:
      - ./volumes/nodered/data:/data
      - ./volumes/nodered/ssh:/root/.ssh
      - /var/run/docker.sock:/var/run/docker.sock
      - /var/run/dbus/system_bus_socket:/var/run/dbus/system_bus_socket
    devices:
      - "/dev/ttyAMA0:/dev/ttyAMA0"
      - "/dev/vcio:/dev/vcio"
      - "/dev/gpiomem:/dev/gpiomem"
  portainer-ce:
    container_name: portainer-ce
    image: portainer/portainer-ce
    restart: unless-stopped
    ports:
      - "8000:8000"
      - "9000:9000"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./volumes/portainer-ce/data:/data
  zigbee2mqtt:
    container_name: zigbee2mqtt
    image: koenkk/zigbee2mqtt:latest
    environment:
      - TZ=Etc/UTC
      - ZIGBEE2MQTT_CONFIG_MQTT_SERVER=mqtt://mosquitto:1883
      - ZIGBEE2MQTT_CONFIG_FRONTEND=true
      - ZIGBEE2MQTT_CONFIG_ADVANCED_LOG_SYMLINK_CURRENT=true
    ports:
      - "8081:8080"
    volumes:
      - ./volumes/zigbee2mqtt/data:/app/data
    devices:
      - /dev/ttyAMA0:/dev/ttyACM0
    restart: unless-stopped
    depends_on:
      - mosquitto
  zigbee2mqtt_assistant:
    container_name: zigbee2mqtt_assistant
    image: carldebilly/zigbee2mqttassistant
    restart: unless-stopped
    ports:
      - "8881:80"
    environment:
      - VIRTUAL_HOST=~^zigbee2mqtt_assistant\..*\.xip\.io
      - Z2MA_SETTINGS__MQTTSERVER=mosquitto
      - VIRTUAL_PORT=8880
networks:
  default:
    driver: bridge
    ipam:
      driver: default
  nextcloud:
    driver: bridge
    internal: true
    ipam:
      driver: default
```
docker-compose.override.yml:
```yaml
version: '3.6'
services:
  zigbee2mqtt:
    ports:
      - "8081:8080"
    devices:
      # - /dev/ttyAMA0:/dev/ttyACM0 # should work even if no adapter
      # - /dev/ttyACM0:/dev/ttyACM0 # should work if CC2531 connected
      #- /dev/ttyUSB0:/dev/ttyACM0 # Electrolama zig-a-zig-ah! (zzh!) maybe other as well
      # - /dev/ttyUSB0:/dev/ttyUSB0
      - /dev/serial/by-id/usb-Silicon_Labs_slae.sh_cc2652rb_stick_-_slaesh_s_iot_stuff_00_12_4B_00_23_90_DA_A8-if00-port0:/dev/ttyUSB1
  zigbee2mqtt_assistant:
    ports:
      - "8881:80"
    environment:
      - VIRTUAL_HOST=~^zigbee2mqtt_assistant\..*\.xip\.io
      - Z2MA_SETTINGS__MQTTSERVER=mosquitto
      - VIRTUAL_PORT=8881
  unifi-controller:
    # image: ghcr.io/linuxserver/unifi-controller:7.1.65
    image: ghcr.io/linuxserver/unifi-controller:latest
    container_name: unifi-controller
    environment:
      - PUID=1000
      - PGID=1000
      - MEM_LIMIT=1024M #optional
    volumes:
      - ./volumes/unifi/config:/config
    ports:
      - 3478:3478/udp
      - 10001:10001/udp
      - 8080:8080
      - 8443:8443
      # - 1900:1900/udp #optional
      - 8843:8843 #optional
      - 8880:8880 #optional
      - 6789:6789 #optional
      - 5514:5514 #optional
    restart: unless-stopped
  eufy-security-ws:
    image: 'bropat/eufy-security-ws:latest'
    container_name: eufy-security
    environment:
      - USERNAME=xxxxxxxxxxxxxx
      - PASSWORD=xxxxxxxxxxxxxx
      - COUNTRY=GB
    volumes:
      - ./volumes/eufy-security/data:/data
    ports:
      - '3000:3000'
    restart: unless-stopped
    privileged: true
  rtsp-simple-server:
    image: 'aler9/rtsp-simple-server:latest'
    container_name: rtsp-simple-server
    environment:
      - RTSP_PROTOCOLS=tcp
    ports:
      - '8554:8554'
      - '1935:1935'
    restart: unless-stopped
  duplicati:
    image: duplicati/duplicati:latest
    container_name: duplicati
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Europe/London
      #- CLI_ARGS= #optional
    volumes:
      - ./volumes/duplicati/config:/config
      #- </path/to/backups>:/backups
      - ./:/source
    ports:
      - "8200:8200"
    restart: unless-stopped
  # portainer-ce:
  #   image: portainer/portainer-ce:2.11.1
```
```
$ uname -a
Linux homeassistant 5.15.61-v8+ #1579 SMP PREEMPT Fri Aug 26 11:16:44 BST 2022 aarch64 GNU/Linux

$ grep "VERSION=" /etc/os-release
VERSION="11 (bullseye)"
```

Nothing from:

```
$ grep -e "allowinterfaces" -e "denyinterfaces" /etc/dhcpcd.conf
```

```
$ docker network ls
NETWORK ID     NAME               DRIVER    SCOPE
e597beb4866d   bridge             bridge    local
e5ca99c1dfaa   host               host      local
2f02f89b13e8   iotstack_default   bridge    local
ad433d1b4d2a   none               null      local

$ docker ps --format "table {{.Names}}\t{{.RunningFor}}\t{{.Status}}"
NAMES                   CREATED       STATUS
zigbee2mqtt             5 hours ago   Up 5 hours
eufy-security           5 hours ago   Up 5 hours
mosquitto               5 hours ago   Up 5 hours (healthy)
rtsp-simple-server      5 hours ago   Up 5 hours
portainer-ce            5 hours ago   Up 5 hours
home_assistant          5 hours ago   Up 5 hours
zigbee2mqtt_assistant   5 hours ago   Up 5 hours
unifi-controller        5 hours ago   Up 5 hours
nodered                 5 hours ago   Up 5 hours (healthy)
duplicati               5 hours ago   Up 5 hours
```
OK. I think I can explain the problem and tell you how to fix it.
tl;dr
Add the following line to /etc/dhcpcd.conf:

```
allowinterfaces eth*,wlan*
```

Then take your stack down and reboot your machine. Taking the stack down first ensures that docker cleans up its networks properly, so please don't skip that bit.
The details
What follows is me responding to your "I'm out of my depth on what is going on here". I'll try to flesh out the picture with my understanding of how it all hangs together.
I might be wrong about some of the details, of course. But I always hope that, if I say something wrong or dumb, people with more knowledge might read it and elect to share.
Here's the equivalent (relevant) output from my system:
```
$ head -22 IOTstack/docker-compose.yml
version: '3.6'

networks:

  default:
    driver: bridge
    ipam:
      driver: default
      config:
        - subnet: 172.30.0.0/22

# nextcloud:
#   driver: bridge
#   internal: true
#   ipam:
#     driver: default
#     config:
#       - subnet: 172.30.4.0/22

services:
  portainer-ce:
  …
```
```
$ grep -e "allowinterfaces" -e "denyinterfaces" /etc/dhcpcd.conf
allowinterfaces eth*,wlan*

$ docker network ls
NETWORK ID     NAME               DRIVER    SCOPE
530d50e0ece0   bridge             bridge    local
a99c0c6d43bd   host               host      local
1b18f152b579   iotstack_default   bridge    local
848b257e3b00   none               null      local

$ docker ps --format "table {{.Names}}\t{{.RunningFor}}\t{{.Status}}"
NAMES          CREATED        STATUS
wireguard      41 hours ago   Up 41 hours
influxdb       3 days ago     Up 3 days (healthy)
pihole         2 weeks ago    Up 2 weeks (healthy)
portainer-ce   2 weeks ago    Up 2 weeks
nodered        3 weeks ago    Up 9 days (healthy)
mosquitto      5 weeks ago    Up 4 weeks (healthy)
grafana        5 weeks ago    Up 4 weeks (healthy)
```
Here's my routing table:
```
$ netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         192.168.132.1   0.0.0.0         UG        0 0          0 eth0
0.0.0.0         192.168.132.1   0.0.0.0         UG        0 0          0 wlan0
172.17.0.0      0.0.0.0         255.255.0.0     U         0 0          0 docker0
172.30.0.0      0.0.0.0         255.255.252.0   U         0 0          0 br-1b18f152b579
192.168.132.0   0.0.0.0         255.255.255.0   U         0 0          0 eth0
192.168.132.0   0.0.0.0         255.255.255.0   U         0 0          0 wlan0
```
Stitching this together into what I hope will be a (somewhat) coherent story:
- The last two lines of my routing table:

  ```
  192.168.132.0   0.0.0.0   255.255.255.0   U   0 0   0 eth0
  192.168.132.0   0.0.0.0   255.255.255.0   U   0 0   0 wlan0
  ```

  My Raspberry Pi has both eth0 and wlan0 interfaces active. They are both in the same subnet (192.168.132.0/24), so that implies the two media types (Ethernet and WiFi) are bridged. In that sense, both interfaces offer alternate paths to the same subnet.
- The first two lines of my routing table:

  ```
  0.0.0.0   192.168.132.1   0.0.0.0   UG   0 0   0 eth0
  0.0.0.0   192.168.132.1   0.0.0.0   UG   0 0   0 wlan0
  ```

  The default (0.0.0.0/0) gateway (the "G" flag) for each interface is 192.168.132.1. In the case of this Pi, DHCP provided that information when the Pi booted and asked for its IP addresses.
- When it comes to local physical interfaces, the only real difference between your routing table and mine is that your wlan0 is not enabled.
- The third line of my routing table:

  ```
  172.17.0.0   0.0.0.0   255.255.0.0   U   0 0   0 docker0
  ```

  This line is also present in yours. The docker0 interface is a by-product of having installed Docker. Many Google hits suggest the subnet is assigned randomly but I have never seen it be anything other than 172.17/16. I believe it can be controlled via /etc/docker/daemon.json.

  Googling "what is docker0" says it is the bridge between "containers" and the host (Pi) internal network. I think this network is used when you instantiate a container with docker run rather than with docker-compose.
- That leaves us with the fourth line:

  ```
  172.30.0.0   0.0.0.0   255.255.252.0   U   0 0   0 br-1b18f152b579
  ```

  This is the equivalent of your before-and-after:

  ```
  192.168.0.0    0.0.0.0   255.255.240.0   U   0 0   0 br-cfc431c5fcd8
  192.168.32.0   0.0.0.0   255.255.240.0   U   0 0   0 br-2f02f89b13e8
  ```

  Points:
  - If you go back to my docker network ls you'll see:

    ```
    NETWORK ID     NAME               DRIVER    SCOPE
    1b18f152b579   iotstack_default   bridge    local
    ```

    Note how the suffix in br-1b18f152b579 comes from the Network ID. Each time the iotstack_default network is created by docker-compose, a new random ID is chosen and that becomes the bridge name in your routing table.

    In iotstack_default, the "iotstack" part is the all-lower-case representation of the folder containing docker-compose.yml.
  - The subnet mask in mine is 255.255.252.0, which is a /22 prefix, while yours is 255.255.240.0, which is a /20 prefix. Expanding:

    - 172.30.0.0/22 = 172.30.0.0 … 172.30.3.255 (max 1021 containers)
    - 192.168.0.0/20 = 192.168.0.0 … 192.168.15.255 (max 4093 containers)
    - 192.168.32.0/20 = 192.168.32.0 … 192.168.47.255 (max 4093 containers)
    While neither of those 192.168 ranges overlaps your eth0 subnet of 192.168.30/24, it's the kind of Maxwell Smart "missed it by this much" proximity that makes me twitchy, especially given your original post where you talked about VLAN issues. Basically, Docker can only avoid collisions where it can "see" the situation. Docker only knows what the Pi knows. If the Pi doesn't have full knowledge of your home network topology then you can easily wind up with anything between suboptimal routing and malfunction. That's why I advise instructing docker-compose to use a specific subnet for iotstack_default, by augmenting the networks definition in your compose file:

    ```yaml
    networks:
      default:
        driver: bridge
        ipam:
          driver: default
          config:
            - subnet: 172.30.0.0/22
    ```

    It doesn't matter if you have a dozen Pis all using the same 172.30.0.0/22 subnet for iotstack_default (any more than it matters if a dozen Pis use 172.17.0.0/16 for docker0). There's no confusion because it's all behind Network Address Translation (NAT). The only thing you need to avoid is using any part of the 172.17.0.0/16 or 172.30.0.0/22 networks outside of docker-space. The easiest avoidance mechanism is doing what most people do already: use subnets carved from 192.168/16.

    There's nothing magical about my choice of /22. I could just barely imagine a "future Pi" having enough grunt to run more than 253 containers (a /24 prefix). Docker's default of a /20 and 4093 containers seemed ridiculous. I took the middle position of a /22. Still ridiculous, just less so.
- Rolling those together: if you specify the subnet in your compose file, the br- suffix will change but the subnet will always be predictable. That facilitates other things, such as telling a container to use 172.30.0.1 (the default gateway) for DNS, in which case queries will be forwarded to any non-host-mode container offering resolution (eg PiHole).
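The subnet arithmetic above (the address ranges and the "max containers" figures) can be reproduced with Python's standard-library ipaddress module. A quick sketch, subtracting the network, broadcast and default-gateway addresses from each range:

```python
import ipaddress

# The three subnets discussed above.
for cidr in ("172.30.0.0/22", "192.168.0.0/20", "192.168.32.0/20"):
    net = ipaddress.ip_network(cidr)
    # usable container addresses = total, minus the network, broadcast
    # and default-gateway addresses
    usable = net.num_addresses - 3
    print(f"{cidr} = {net[0]} … {net[-1]} (max {usable} containers)")

# Neither /20 range overlaps the 192.168.30.0/24 VLAN, but it's close:
vlan = ipaddress.ip_network("192.168.30.0/24")
print(vlan.overlaps(ipaddress.ip_network("192.168.0.0/20")))   # False
print(vlan.overlaps(ipaddress.ip_network("192.168.32.0/20")))  # False
```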
Which brings me back to the original problem.
You'll note that, even though my system is running 7 containers, my routing table doesn't have any veth-prefixed interfaces. You're running 10 containers and have 9 veth interfaces.

The reason you don't have 10 veth interfaces is that home_assistant is running in host mode: it attaches directly to your Pi's physical interfaces. The other 9 containers are running in non-host mode, so each has an internal eth0 (that only the container can see) which is bound to an external veth interface (that the Pi can see).
So, why does your routing table have veth interfaces when mine has none?
It is not because all my containers run in host mode. All 7 of mine are running in non-host mode.
It is because I have the following line in my /etc/dhcpcd.conf:
```
allowinterfaces eth*,wlan*
```
You don't have that line.
What that line does is tell the DHCP client daemon to do the equivalent of:

- Don't provide DHCP services to any interface (like an implicit firewall "drop everything else" rule); then
- Do provide DHCP services to any interfaces matching the eth* or wlan* patterns.

This neatly stops both docker0 and all the veth interfaces from participating in DHCP.
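dhcpcd has its own pattern matcher, but the effect of those shell-style globs can be sketched with Python's standard-library fnmatch (interface names taken from the routing tables above):

```python
from fnmatch import fnmatch

# The patterns from "allowinterfaces eth*,wlan*" in /etc/dhcpcd.conf.
patterns = ["eth*", "wlan*"]

def dhcp_allowed(iface):
    """Mimic allowinterfaces: anything not matching a pattern is denied."""
    return any(fnmatch(iface, pattern) for pattern in patterns)

for iface in ["eth0", "wlan0", "docker0", "veth9551033", "br-2f02f89b13e8"]:
    print(iface, "-> DHCP" if dhcp_allowed(iface) else "-> ignored")
```

Only eth0 and wlan0 match, so docker0, the veth interfaces and the br- bridge are all left alone.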
What seems to be going on is that docker (for docker0) and docker-compose (for the veth interfaces) are managing the assignment of IP addresses on those subnets, but the Pi's dhcpcd daemon is trying to horn in on the act. Our best guess is that this leads to a deadlock somewhere, at which point the Pi gives every impression of being frozen and unreachable.
I didn't come up with this solution but if you read through Issue 219 you'll see that I faced a similar problem. Basically, if I did a reboot without first taking the stack down, the Pi would appear to freeze on the way up. The solution was provided by GB Smith in Issue 253. It subsequently became one of IOTstack's recommended patches.
postscript
The content of /etc/dhcpcd.conf is an ongoing problem. Examples:
- I was trying to construct a transparent bridge between two USB-to-Ethernet "dongles". Those presented as eth1 and eth2. Assigning IP addresses was unwanted so I needed to edit:

  ```
  allowinterfaces eth0,wlan*
  ```

  In other words, do provide DHCP for eth0 but, by inference, not for any other eth*.
- I'm fiddling about building a Raspberry Pi router to replace my off-the-shelf "commercial" router. It's based on a Seeed "mini router".

  ```
  Kernel IP routing table
  Destination     Gateway         Genmask           Flags   MSS Window  irtt Iface
  169.254.0.0     0.0.0.0         255.255.0.0       U         0 0          0 eth1.2
  172.17.0.0      0.0.0.0         255.255.0.0       U         0 0          0 docker0
  172.30.0.0      0.0.0.0         255.255.252.0     U         0 0          0 br-a3b64c14a10d
  192.168.134.0   0.0.0.0         255.255.255.128   U         0 0          0 eth0.1
  192.168.134.128 0.0.0.0         255.255.255.128   U         0 0          0 eth0.2
  ```

  The primary physical port eth0 trunks two VLANs, eth0.1 (for "house") and eth0.2 (for "DMZ"), to a managed switch. Routers really need static IP addresses so DHCP needs to stay out of the way.

  The secondary physical port eth1 will connect to the ISP. It needs to be tagged as VLAN ID 2 because that's an ISP requirement. When PPP comes up as ppp0 it will obtain its IP address from the ISP. Again, DHCP needs to stay away from all of that.

  At the moment, the Ethernet cable to the eth1 port is connected to a hub so the interface will come up. A DHCP server is reachable via that hub but neither eth1 nor eth1.2 is asking for DHCP and neither has a static assignment. Getting a link-local IP in 169.254/16 proves DHCP has been told to keep mum.

  Overall:

  ```
  allowinterfaces wlan*
  ```

  So, if I enable WiFi then it will get a dynamic assignment, whereas everything else is left alone.
Will any of this change when Network Manager becomes the go-to solution for Raspberry Pis (instead of dhcpcd)? No idea.
Hope this helps.
WOW! What a reply. Thank you so much @Paraphraser. I will complete your suggestions at the first opportunity. It will however likely take several re-reads to fully understand all of the background information you have provided but I will definitely learn from it and hopefully others will too.