docker-volume-backup
Containers that provide network services to other containers should be backed up first
Describe the bug Currently the sequence in which containers are stopped and started during backup seems to be somewhat arbitrary. This causes an issue if one or more containers use another container as their network provider and a dependent container is started before the network provider. In that case the dependent containers fail to start.
To Reproduce Steps to reproduce the behavior:
- Create 3 containers - A, B & C in a docker-compose file
- Set `network_mode` to `service:A` for containers B & C (see the sketch below)
- Run the backup
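A minimal docker-compose file for the setup above might look roughly like this (service names and images are placeholders, not taken from the original report):

```yaml
services:
  a:
    image: example/network-provider   # placeholder; e.g. a VPN container
  b:
    image: example/app                # placeholder
    network_mode: "service:a"         # B joins A's network namespace
  c:
    image: example/app
    network_mode: "service:a"         # C joins A's network namespace
```

If A is not running when B or C is (re)started, Docker refuses to start them, which is exactly the failure described here.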
Expected behavior Container A is stopped, backed up and restarted. Then containers B & C should be stopped and started again.
Desktop (please complete the following information):
- Image Version: v2.21.0
- Docker Version: 20.10.17
- Docker Compose Version (if applicable): 2.4
Additional Context The steps to reproduce above may or may not trigger the failure, depending on whether container A is still down when either B or C is being restarted. I'm not quite sure how to make it fail reliably.
I didn't even know you can use a container as a "network provider" and I also cannot find any info about this in here: https://docs.docker.com/compose/networking/
How does this compare to using an explicitly declared network that you put your containers in? With that setup, stopping and restarting should work as you expect. Would this be a viable workaround for your situation?
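Such a setup might look roughly like this (names and images are placeholders):

```yaml
services:
  a:
    image: example/app     # placeholder
    networks: [shared]
  b:
    image: example/app
    networks: [shared]
  c:
    image: example/app
    networks: [shared]

networks:
  shared: {}
```

With a named network, each container keeps its own network namespace, so the order in which they are restarted should not matter.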
As for the root cause: it seems Docker gives no guarantees about the ordering of the containers returned here: https://github.com/offen/docker-volume-backup/blob/00c83dfac79af6f03c677e187b5bce6817b2c2a7/cmd/backup/script.go#L268-L274
which leads to the behavior you describe. I have a hard time imagining a mechanism that sorts these correctly based on their dependencies (this is pretty complex and might involve more than just networks) without blowing up complexity a whole lot. If this were to be supported, I guess the way to go is having users label their services with some sort of priority value that is then used to sort the containers before starting and stopping.
Unfortunately for my use case a separate network won't work.
You can see the reference for this network mode here and here. One example of using this is when you have a container that connects to a VPN; the other containers attached to it then have all their traffic routed through that VPN connection. Example Wireguard image that can be used for something like this.
You're right though. I had a quick look at the docs and there doesn't seem to be an easy way to get the dependencies with the docker cli, mostly because `depends_on` etc. is compose syntax. `docker-compose` does do it internally (reference).
The best way I can think of is similar to what you suggested: either have a label indicating the priority, or a label/environment variable that's something like `offen.depends_on=container_name`, which could then be used to prioritize the backup sequence internally. This would also avoid having to put a priority label on every container.
PS. There are other reasons to sequence the backups besides `container` mode networking. For example, if you have a database container and application containers that use it, you would want to make sure the database container is started up before the application containers come online. The `offen.depends_on` label would be useful in this scenario as well.
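For illustration, here is a rough sketch of how such a label could drive the ordering. `offen.depends_on` is only the hypothetical label name from this thread (it does not exist in the tool), the code assumes a single dependency per container and no cycles, and it works on the `types.Container` values that the Docker SDK's `ContainerList` returns:

```go
package main

import (
	"fmt"
	"strings"

	"github.com/docker/docker/api/types"
)

// startOrder orders containers so that anything named in the hypothetical
// "offen.depends_on" label comes before its dependents. Containers without
// the label can start at any point.
func startOrder(containers []types.Container) []types.Container {
	started := map[string]bool{}
	ordered := make([]types.Container, 0, len(containers))
	for len(ordered) < len(containers) {
		progressed := false
		for _, c := range containers {
			name := strings.TrimPrefix(c.Names[0], "/")
			if started[name] {
				continue
			}
			dep := c.Labels["offen.depends_on"]
			if dep == "" || started[dep] {
				started[name] = true
				ordered = append(ordered, c)
				progressed = true
			}
		}
		if !progressed {
			// Unresolvable dependency or a cycle: keep the remaining
			// containers in their original order instead of failing.
			for _, c := range containers {
				if !started[strings.TrimPrefix(c.Names[0], "/")] {
					ordered = append(ordered, c)
				}
			}
			break
		}
	}
	return ordered
}

func main() {
	// In the real tool these would come from client.ContainerList.
	containers := []types.Container{
		{Names: []string{"/app"}, Labels: map[string]string{"offen.depends_on": "vpn"}},
		{Names: []string{"/vpn"}, Labels: map[string]string{}},
	}
	for _, c := range startOrder(containers) {
		fmt.Println(c.Names[0]) // prints /vpn, then /app
	}
}
```

Stopping would simply walk the result in reverse. The simplifications here (a single dependency per container, no real cycle handling) hint at the semantic ambiguities mentioned in the next comment.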
Implementing a `depends_on` sounds nice, but I think it also brings a lot of semantic ambiguities, so if this were to be implemented I would prefer a plain numeric value (alphanumeric maybe?) in `docker-volume-backup.start_priority` and `docker-volume-backup.stop_priority`, which lets people declare the same behavior in a pretty predictable way.
That way, all that would need to be added is a sorting of the containers returned by `ContainerList` based on the values in these labels before starting and stopping. If none are present, the ordering is left unchanged and the current behavior is kept.
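A minimal sketch of what that sorting could look like, assuming the proposed (not yet existing) `docker-volume-backup.stop_priority` label and the `types.Container` values returned by `ContainerList`:

```go
package main

import (
	"fmt"
	"sort"
	"strconv"

	"github.com/docker/docker/api/types"
)

// stopPriority reads a numeric value from the proposed
// "docker-volume-backup.stop_priority" label, defaulting to 0 so containers
// without the label keep their current relative order.
func stopPriority(c types.Container) int {
	if v, ok := c.Labels["docker-volume-backup.stop_priority"]; ok {
		if n, err := strconv.Atoi(v); err == nil {
			return n
		}
	}
	return 0
}

func main() {
	// In the real tool these would come from client.ContainerList.
	containers := []types.Container{
		{Names: []string{"/app"}, Labels: map[string]string{"docker-volume-backup.stop_priority": "1"}},
		{Names: []string{"/vpn"}, Labels: map[string]string{"docker-volume-backup.stop_priority": "2"}},
		{Names: []string{"/unlabeled"}},
	}

	// Stop in ascending priority; starting again could walk the slice in
	// reverse or use a separate start_priority label. The exact semantics
	// would still need to be decided.
	sort.SliceStable(containers, func(i, j int) bool {
		return stopPriority(containers[i]) < stopPriority(containers[j])
	})
	for _, c := range containers {
		fmt.Println(c.Names[0])
	}
}
```

Because `sort.SliceStable` is stable, containers without any label keep their current (unsorted) order, so existing setups would behave exactly as before.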
If anyone wants to work on this I am happy to provide feedback and merge PRs.
I recently ran into this issue as I have a container that depends on the network of another container for a vpn connection. The entire backup process was failing every time for two weeks because of this... glad I caught it in the logs. :O
Same for me with Tailscale (VPN). All containers with `network_mode: YX` have this problem; every one of them fails with: "Error response from daemon: cannot join network of a non running container"
Does anyone know of a temporary workaround? Something like autoheal for the stopped containers, maybe?
If someone could dig up the part of the source code for compose that defines the start/stop order for services/containers, that would be interesting now that compose is also written in Go. Maybe it's possible to do what is already done with some parts of the docker CLI and directly reuse code from there in this tool.