
"swarm exec" support

Open zibellon opened this issue 1 year ago • 6 comments

Is your feature request related to a problem? Please describe.

Yes. The problem was described in a previous issue, issue 423.

Main points of problem

  1. Swarm cluster. 2 Nodes. MASTER_1, WORKER_1
  2. Run postgres with docker on WORKER_1
  3. Need to back up postgres. Before the backup, a script has to run: pg_dump ...
  4. Bring up offen-backup on WORKER_1 and add labels to the deploy section of the postgres stack (service)
  5. Get the error: no master node. Exec labels can only be run on the master

Describe the solution you'd like

The solution is:

  1. Run offen-backup service ONLY on master node
  2. For exec labels: run a one-time container (docker:25.0.5-cli-alpine) on EACH node when we find tasks (containers) with labels
  3. For volumes: run a one-time container (offen-backup, for example) on EACH node when we find tasks (containers) with the label volume-list (you can see it in my .sh script)
  4. If we want to use different timers for different exec labels or different volume-list labels, we can also use your idea of exec-label=database, but add support for both exec and volume-list

Additional context

I'm a big fan of docker-swarm and use a swarm cluster in production.

I spent some time and wrote a .sh script that backs up the WHOLE swarm cluster, based on labels and using your open-source project: swarm-backuper.sh

  1. GET all volumes from all nodes. Start a docker dind container on each node and get its volume list
  2. GET all services with the label volume-list. In this label we list the volumes that we want to back up for this service. JOIN these volumes with the volumes from step 1. Result: MAP<node-name, [volume1, volume2, etc]>
  3. RUN all labels: exec-pre. Run a separate container on each node where exec-pre is needed and run docker exec ... there
  4. STOP all services with label: stop. Important: stop === scale to 0. Stop only services in replicated mode
  5. RUN offen-backup on each node with all volumes from step 2
  6. RESTORE all services with label: stop. Important: we need to use the JSON from step 4. Scale back to the original number of replicas, as it was before step 4
  7. RUN all labels: exec-post
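
The join in step 2 can be sketched in Go roughly as follows. This is only an illustration and not code from swarm-backuper.sh or offen-backup: the `Service` type, the `VolumesPerNode` function and the comma-separated format of the volume-list label are assumptions made for the example, as are the volume names in `main`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Service is a hypothetical, simplified view of a swarm service: only its
// name and the raw value of its volume-list label are kept.
type Service struct {
	Name       string
	VolumeList string // assumed format: comma-separated volume names
}

// VolumesPerNode joins the volumes found on each node (step 1) with the
// volumes requested through service labels (step 2). The result maps a node
// name to the sorted list of requested volumes that exist on that node.
// Nodes without any requested volume are left out of the result.
func VolumesPerNode(nodeVolumes map[string][]string, services []Service) map[string][]string {
	// Collect every volume name mentioned in any volume-list label.
	wanted := make(map[string]bool)
	for _, svc := range services {
		for _, name := range strings.Split(svc.VolumeList, ",") {
			if name = strings.TrimSpace(name); name != "" {
				wanted[name] = true
			}
		}
	}

	result := make(map[string][]string)
	for node, volumes := range nodeVolumes {
		seen := make(map[string]bool) // guards against duplicate entries
		for _, v := range volumes {
			if wanted[v] && !seen[v] {
				seen[v] = true
				result[node] = append(result[node], v)
			}
		}
		sort.Strings(result[node])
	}
	return result
}

func main() {
	// Made-up inventory for the two-node cluster described in this issue.
	nodeVolumes := map[string][]string{
		"MASTER_1": {"traefik_certs"},
		"WORKER_1": {"pg_data", "pg_dumps", "scratch"},
	}
	services := []Service{
		{Name: "postgres", VolumeList: "pg_data, pg_dumps"},
	}
	for node, vols := range VolumesPerNode(nodeVolumes, services) {
		fmt.Println(node, vols)
	}
}
```

The resulting map is what step 5 would iterate over to start one backup container per node.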

How can we run it? swarm-backuper-stack.yaml

All of these functions could be written in Go and added to your project (offen-backup)
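
As a rough idea of how steps 4 and 6 might look in Go, here is a minimal sketch. It is not taken from offen-backup: the `Scaler` interface, `StopServices`, `RestoreServices` and the in-memory `fakeScaler` are hypothetical names introduced for illustration, and the saved state is kept in a map instead of the JSON file the script uses.

```go
package main

import "fmt"

// Scaler is a hypothetical abstraction over whatever changes a service's
// replica count (for example a wrapper around `docker service scale`).
type Scaler interface {
	Replicas(service string) (uint64, error)
	Scale(service string, replicas uint64) error
}

// StopServices scales every given service to 0 and returns the replica
// counts seen beforehand, so they can be restored after the backup.
func StopServices(s Scaler, services []string) (map[string]uint64, error) {
	saved := make(map[string]uint64)
	for _, name := range services {
		n, err := s.Replicas(name)
		if err != nil {
			return saved, fmt.Errorf("reading replicas of %s: %w", name, err)
		}
		if err := s.Scale(name, 0); err != nil {
			return saved, fmt.Errorf("stopping %s: %w", name, err)
		}
		saved[name] = n
	}
	return saved, nil
}

// RestoreServices scales each service back to its saved replica count.
func RestoreServices(s Scaler, saved map[string]uint64) error {
	for name, n := range saved {
		if err := s.Scale(name, n); err != nil {
			return fmt.Errorf("restoring %s: %w", name, err)
		}
	}
	return nil
}

// fakeScaler is an in-memory stand-in used only to exercise the sketch.
type fakeScaler map[string]uint64

func (f fakeScaler) Replicas(service string) (uint64, error) {
	n, ok := f[service]
	if !ok {
		return 0, fmt.Errorf("unknown service %s", service)
	}
	return n, nil
}

func (f fakeScaler) Scale(service string, replicas uint64) error {
	f[service] = replicas
	return nil
}

func main() {
	cluster := fakeScaler{"postgres": 1, "api": 3}

	saved, err := StopServices(cluster, []string{"postgres", "api"})
	if err != nil {
		panic(err)
	}
	fmt.Println("during backup:", cluster["postgres"], cluster["api"])

	if err := RestoreServices(cluster, saved); err != nil {
		panic(err)
	}
	fmt.Println("after backup:", cluster["postgres"], cluster["api"])
}
```

Returning the partially filled map on error is deliberate: services that were already stopped can still be restored if a later one fails.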

zibellon avatar Jun 01 '24 14:06 zibellon

I'm still not entirely sure why you're unable to execute commands on the same node (this should work no matter if it's a master or a worker node, albeit setup can be a bit tricky, see https://github.com/moby/moby/issues/27552).

That being said, the described approach of using one-off containers that mount volumes themselves has already been discussed in https://github.com/offen/docker-volume-backup/issues/329. Maybe this use case can be considered if that is going to be implemented.

m90 avatar Jun 02 '24 08:06 m90

Thank you @zibellon for your script ! I wasted almost a day trying to figure out why my database dump commands were not working!

BTW, did you continue updating your script since your last Github commit? I'm gonna try to tweak it a bit in the future. I'm especially looking to remove from the script everything that is configurations and try to bring that instead in the yaml file. For example, I use source volumes that are not of type "local" and I want to tweak the "backupCommand" variable.

EDIT: Not the most versatile way to implement the changes I had in mind, but since it was an easy task it will be good enough as a first iteration: https://gist.github.com/zibellon/99562543e730f5c4aedeb6c261ac01ba?permalink_comment_id=5538744#gistcomment-5538744

jpbaril avatar Apr 13 '25 17:04 jpbaril

Running into the same issue when I try to create a backup within a worker node:

$ sudo docker exec -it f36c38a453f0 /bin/sh
~ # backup
time=2025-04-24T10:21:55.808Z level=ERROR msg="
Fatal error running command: This node is not a swarm manager.
Worker nodes can't be used to view or modify cluster state.
Please run this command on a manager node or promote the current node to a manager.
" error="main.(*command).runAsCommand: error running script: main.runScript.func4: error running script: main.(*script).stopContainersAndServices: error querying for services: Error response from daemon: This node is not a swarm manager. Worker nodes can't be used to view or modify cluster state. Please run this command on a manager node or promote the current node to a manager."

Affected code:

https://github.com/offen/docker-volume-backup/blob/f9eabbc32627c2143ffc7ff562d1ca476f395d58/cmd/backup/stop_restart.go#L144-L148

Reference from Docker Swarm docs: https://docs.docker.com/reference/cli/docker/service/ls/

This is a cluster management command, and must be executed on a swarm manager node. To learn about managers and workers, refer to the Swarm mode section in the documentation.

matzeeable avatar Apr 24 '25 10:04 matzeeable

I just ran across this project by pure chance https://github.com/mavenugo/swarm-exec

Leaving this here mostly for the record and future reference.

m90 avatar Jun 05 '25 12:06 m90

I've opened a discussion to solicit further feedback on improvements in this area, feel free to chime in if you have thoughts: https://github.com/offen/docker-volume-backup/discussions/595

m90 avatar Jun 07 '25 13:06 m90

Running into the same issue when I try to create a backup within a worker node:

$ sudo docker exec -it f36c38a453f0 /bin/sh
~ # backup
time=2025-04-24T10:21:55.808Z level=ERROR msg="
Fatal error running command: This node is not a swarm manager.
Worker nodes can't be used to view or modify cluster state.
Please run this command on a manager node or promote the current node to a manager.
" error="main.(*command).runAsCommand: error running script: main.runScript.func4: error running script: main.(*script).stopContainersAndServices: error querying for services: Error response from daemon: This node is not a swarm manager. Worker nodes can't be used to view or modify cluster state. Please run this command on a manager node or promote the current node to a manager."

Affected code:

docker-volume-backup/cmd/backup/stop_restart.go

Lines 144 to 148 in f9eabbc

    if isDockerSwarm {
        allServices, err = s.cli.ServiceList(context.Background(), types.ServiceListOptions{})
        if err != nil {
            return noop, errwrap.Wrap(err, "error querying for services")
        }

Reference from Docker Swarm docs: https://docs.docker.com/reference/cli/docker/service/ls/

This is a cluster management command, and must be executed on a swarm manager node. To learn about managers and workers, refer to the Swarm mode section in the documentation.

@matzeeable This behavior is now fixed in v2.43.4, where containers that are deployed to worker nodes will not try to use swarm features anymore.

m90 avatar Jun 09 '25 12:06 m90
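
For readers wondering how a tool could tell a manager from a worker before calling cluster management APIs, the following Go sketch shows one possible check. It is an assumption-laden illustration and not the change that went into v2.43.4: it decodes the `LocalNodeState` and `ControlAvailable` fields of the Swarm section reported by `docker info`, and the `swarmInfo` type, the `CanUseSwarmFeatures` function and the JSON samples are made up for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// swarmInfo mirrors two fields of the "Swarm" section reported by
// `docker info`; all other fields are ignored.
type swarmInfo struct {
	LocalNodeState   string
	ControlAvailable bool
}

// CanUseSwarmFeatures reports whether cluster management calls such as
// listing services are expected to work on the node that produced raw.
// Only an active swarm member with control available (a manager) qualifies.
func CanUseSwarmFeatures(raw []byte) (bool, error) {
	var info swarmInfo
	if err := json.Unmarshal(raw, &info); err != nil {
		return false, fmt.Errorf("decoding swarm info: %w", err)
	}
	return info.LocalNodeState == "active" && info.ControlAvailable, nil
}

func main() {
	// Hand-written samples, not captured from a real daemon.
	samples := []struct {
		name string
		json string
	}{
		{"manager", `{"LocalNodeState":"active","ControlAvailable":true}`},
		{"worker", `{"LocalNodeState":"active","ControlAvailable":false}`},
		{"no swarm", `{"LocalNodeState":"inactive","ControlAvailable":false}`},
	}
	for _, s := range samples {
		ok, err := CanUseSwarmFeatures([]byte(s.json))
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: swarm features usable = %v\n", s.name, ok)
	}
}
```

With a check like this, a container running on a worker would fall back to handling only local containers instead of failing on the service query.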