[community.docker.docker_swarm_service] Providing a command results in deployment failure

Open Lebowski89 opened this issue 10 months ago • 2 comments

Hello,

I am currently in the process of moving from local Docker containers (deployed using the community.docker.docker_container module) to Docker Swarm - using the community.docker.docker_swarm_service module. This has been working well until any service needs a command.

For example - this will work:

- name: Create portainer service
  community.docker.docker_swarm_service:
    name: '{{ portainer_defaults_name }}'
    image: '{{ portainer_defaults_image_repo }}:{{ portainer_defaults_image_tag }}'
    networks:
      - name: '{{ network_overlay }}'
    env:
      PUID: '{{ puid }}'
      PGID: '{{ pgid }}'
      TZ: '{{ timezone }}'
    labels: '{{ portainer_defaults_labels }}'
    publish:
      - target_port: '{{ portainer_defaults_ports_http_cont }}'
        published_port: '{{ portainer_defaults_ports_http_host }}'
        protocol: tcp
        mode: ingress
      - target_port: '{{ portainer_defaults_ports_tcp_tunnel_cont }}'
        published_port: '{{ portainer_defaults_ports_tcp_tunnel_host }}'
        protocol: tcp
        mode: ingress
      - target_port: '{{ portainer_defaults_ports_webui_cont }}'
        published_port: '{{ portainer_defaults_ports_webui_host }}'
        protocol: tcp
        mode: ingress
    mounts:
      - source: '{{ portainer_defaults_location }}'
        target: /data
        type: bind
    restart_config:
      condition: '{{ portainer_defaults_restart_policy }}'
      delay: 5s
      max_attempts: 3
      window: 120s
    mode: replicated
    replicas: 1
    placement:
      constraints: [node.role == manager]

Now, if I include the required command to connect to the Portainer agent:

- name: Create portainer service
  community.docker.docker_swarm_service:
    name: '{{ portainer_defaults_name }}'
    image: '{{ portainer_defaults_image_repo }}:{{ portainer_defaults_image_tag }}'
    networks:
      - name: '{{ network_overlay }}'
    command: '-H tcp://tasks.{{ portainer_agent_defaults_name }}:9001 --tlsskipverify'
    env:
      PUID: '{{ puid }}'
      PGID: '{{ pgid }}'
      TZ: '{{ timezone }}'
    labels: '{{ portainer_defaults_labels }}'
    publish:
      - target_port: '{{ portainer_defaults_ports_http_cont }}'
        published_port: '{{ portainer_defaults_ports_http_host }}'
        protocol: tcp
        mode: ingress
      - target_port: '{{ portainer_defaults_ports_tcp_tunnel_cont }}'
        published_port: '{{ portainer_defaults_ports_tcp_tunnel_host }}'
        protocol: tcp
        mode: ingress
      - target_port: '{{ portainer_defaults_ports_webui_cont }}'
        published_port: '{{ portainer_defaults_ports_webui_host }}'
        protocol: tcp
        mode: ingress
    mounts:
      - source: '{{ portainer_defaults_location }}'
        target: /data
        type: bind
    restart_config:
      condition: '{{ portainer_defaults_restart_policy }}'
      delay: 5s
      max_attempts: 3
      window: 120s
    mode: replicated
    replicas: 1
    placement:
      constraints: [node.role == manager]

Portainer will not successfully deploy, and upon checking with ‘docker ps -a’, it will show portainer listed 4 times with the status of ‘created’. I have also tried it without quotes and also as a list for each part of that command - all without success.

This is not a Portainer issue. It happens to any service as soon as I put anything into the command option. Strangely, this was never an issue with the container module; I'm only running into it with the swarm service module.

For example, my container deployment has always successfully deployed with a command:

- name: Create portainer container
  community.docker.docker_container:
    name: '{{ portainer_defaults_name }}'
    image: '{{ portainer_defaults_image_repo }}:{{ portainer_defaults_image_tag }}'
    networks:
      - name: '{{ network_backend }}'
      - name: '{{ traefik_network }}'
    command: '-H {{ socket_proxy_endpoint }}'
    env:
      PUID: '{{ puid }}'
      PGID: '{{ pgid }}'
      TZ: '{{ timezone }}'
    labels: '{{ portainer_defaults_labels }}'
    ports:
      - '{{ portainer_defaults_ports_http_host }}:{{ portainer_defaults_ports_http_cont }}'
      - '{{ portainer_defaults_ports_tcp_tunnel_host }}:{{ portainer_defaults_ports_tcp_tunnel_cont }}'
      - '{{ portainer_defaults_ports_webui_host }}:{{ portainer_defaults_ports_webui_cont }}'
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - '{{ portainer_defaults_location }}:/data'
    restart_policy: '{{ portainer_defaults_restart_policy }}'

Additionally, if I pass the same arguments to docker service create in a terminal:

docker service create \
  --name portainer \
  --replicas 1 \
  --constraint node.role==manager \
  --network overlay \
  --env PUID=1000 \
  --env PGID=1000 \
  --env TZ=Australia/Melbourne \
  --publish published=9000,target=9000,protocol=tcp,mode=ingress \
  --publish published=8000,target=8000,protocol=tcp,mode=ingress \
  --publish published=9443,target=9443,protocol=tcp,mode=ingress \
  --mount type=bind,source=/opt/portainer,destination=/data \
  --restart-condition on-failure \
  portainer/portainer-ce \
  -H tcp://tasks.agent:9001 --tlsskipverify

It will work just fine.

OS Version/build

Client: Docker Engine - Community
Version: 27.5.1
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.20.0
Path: /usr/libexec/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.32.4
Path: /usr/libexec/docker/cli-plugins/docker-compose
Swarm: active
Is Manager: true
Kernel Version: 6.1.0-28-amd64
Operating System: Debian GNU/Linux 12 (bookworm)
OSType: linux
Architecture: x86_64

ansible [core 2.17.8]
config file = /etc/ansible/ansible.cfg
configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python3/dist-packages/ansible
ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
executable location = /usr/bin/ansible
python version = 3.11.2 (main, Nov 30 2024, 21:22:50) [GCC 12.2.0] (/usr/bin/python3)
jinja version = 3.1.2
libyaml = True

Any ideas of what is going on or what I'm doing wrong? Thanks

Lebowski89, Feb 13 '25 13:02

You might want to inspect the created containers; that may give you a hint about what went wrong.
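The service spec itself can also be dumped from Ansible. The tasks below are a minimal sketch, assuming the community.docker.docker_swarm_service_info module is available; the service name `my_service` and the register variable `svc` are placeholders, not names from this thread. It prints the `Command` and `Args` fields of the container spec, which is where an unintended entrypoint override would show up.

```yaml
- name: Fetch the service definition (service name is a placeholder)
  community.docker.docker_swarm_service_info:
    name: my_service
  register: svc

- name: Show which ContainerSpec fields were set
  ansible.builtin.debug:
    msg:
      # Command corresponds to an entrypoint override, Args to the image's CMD.
      container_command: "{{ svc.service.Spec.TaskTemplate.ContainerSpec.Command | default('not set') }}"
      container_args: "{{ svc.service.Spec.TaskTemplate.ContainerSpec.Args | default('not set') }}"
  when: svc.exists
```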

felixfontein, Feb 13 '25 17:02

Thanks for the reply.

Here is an example of trying to deploy 'Loki' with a command:

- name: Create loki service
  community.docker.docker_swarm_service:
    name: '{{ loki_defaults_name }}'
    image: '{{ loki_defaults_image_repo }}:{{ loki_defaults_image_tag }}'
    networks:
      - name: '{{ network_overlay }}'
    command:
      - '-config.file=/etc/loki/loki-config.yaml'
    env:
      PUID: '{{ puid }}'
      PGID: '{{ pgid }}'
      TZ: '{{ timezone }}'
    publish:
      - target_port: '{{ loki_defaults_ports_cont }}'
        published_port: '{{ loki_defaults_ports_host }}'
        protocol: tcp
        mode: ingress
    mounts:
      - source: '{{ loki_defaults_volume }}'
        target: /etc/loki
        type: volume
    restart_config:
      condition: '{{ loki_defaults_restart_policy }}'
      delay: 5s
      max_attempts: 3
      window: 120s
    mode: replicated
    replicas: 1
    placement:
      constraints: [node.role == manager]

What it looks like in Portainer afterwards:

[Two screenshots of the service in Portainer; images not preserved]

Results of Inspect command:

[
    {
        "ID": "ga9awgt9dempy8rmwhkfeulx4",
        "Version": {
            "Index": 2109
        },
        "CreatedAt": "2025-02-14T14:05:51.881671346Z",
        "UpdatedAt": "2025-02-14T14:05:51.883427601Z",
        "Spec": {
            "Name": "loki",
            "Labels": {},
            "TaskTemplate": {
                "ContainerSpec": {
                    "Image": "grafana/loki:main",
                    "Command": [
                        "-config.file=/etc/loki/loki-config.yaml"
                    ],
                    "Env": [
                        "PGID=1000",
                        "PUID=1000",
                        "TZ=Australia/Melbourne"
                    ],
                    "Mounts": [
                        {
                            "Type": "volume",
                            "Source": "loki_nfs",
                            "Target": "/etc/loki"
                        }
                    ],
                    "StopGracePeriod": 10000000000,
                    "DNSConfig": {},
                    "Isolation": "default"
                },
                "Resources": {},
                "RestartPolicy": {
                    "Condition": "on-failure",
                    "Delay": 5000000000,
                    "MaxAttempts": 3,
                    "Window": 120000000000
                },
                "Placement": {
                    "Constraints": [
                        "node.role == manager"
                    ]
                },
                "Networks": [
                    {
                        "Target": "t14dfx3re3x4uqdsu6ot7zd5k"
                    }
                ],
                "ForceUpdate": 0,
                "Runtime": "container"
            },
            "Mode": {
                "Replicated": {
                    "Replicas": 1
                }
            },
            "UpdateConfig": {
                "Parallelism": 1,
                "FailureAction": "pause",
                "Monitor": 5000000000,
                "MaxFailureRatio": 0,
                "Order": "stop-first"
            },
            "RollbackConfig": {
                "Parallelism": 1,
                "FailureAction": "pause",
                "Monitor": 5000000000,
                "MaxFailureRatio": 0,
                "Order": "stop-first"
            },
            "EndpointSpec": {
                "Mode": "vip",
                "Ports": [
                    {
                        "Protocol": "tcp",
                        "TargetPort": 3100,
                        "PublishedPort": 3100,
                        "PublishMode": "ingress"
                    }
                ]
            }
        },
        "Endpoint": {
            "Spec": {
                "Mode": "vip",
                "Ports": [
                    {
                        "Protocol": "tcp",
                        "TargetPort": 3100,
                        "PublishedPort": 3100,
                        "PublishMode": "ingress"
                    }
                ]
            },
            "Ports": [
                {
                    "Protocol": "tcp",
                    "TargetPort": 3100,
                    "PublishedPort": 3100,
                    "PublishMode": "ingress"
                }
            ],
            "VirtualIPs": [
                {
                    "NetworkID": "pld3g652yrkst88gn524orqfa",
                    "Addr": "10.0.0.57/24"
                },
                {
                    "NetworkID": "t14dfx3re3x4uqdsu6ot7zd5k",
                    "Addr": "172.98.0.94/24"
                }
            ]
        }
    }
]

Logs show nothing. 'No log line matching the '' filter'.

Command is the exact same one I used with the Container module.
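One detail in the inspect output may be relevant: the flag ended up in `ContainerSpec.Command` and there is no `Args` entry. In the Swarm API, `Command` replaces the image's ENTRYPOINT, while the arguments that follow the image name in `docker service create` are stored in `Args`. If the image defines its binary as ENTRYPOINT (an assumption about grafana/loki, not verified here), the task would then try to execute the flag itself as a program. A possible workaround, sketched below with the unrelated options omitted, is to pass the flag through the module's `args` option instead.

```yaml
- name: Create loki service (sketch using args instead of command)
  community.docker.docker_swarm_service:
    name: '{{ loki_defaults_name }}'
    image: '{{ loki_defaults_image_repo }}:{{ loki_defaults_image_tag }}'
    networks:
      - name: '{{ network_overlay }}'
    # args fills ContainerSpec.Args (the image's CMD) and leaves the ENTRYPOINT intact.
    # command would fill ContainerSpec.Command, which replaces the ENTRYPOINT.
    args:
      - '-config.file=/etc/loki/loki-config.yaml'
    mode: replicated
    replicas: 1
```

This would also explain the difference from docker_container, where `command` sets the container's CMD and the entrypoint is a separate `entrypoint` option.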

Lebowski89, Feb 14 '25 14:02

This doesn't work:

- community.docker.docker_swarm_service:
    name: "prometheus"
    image: "{{ registry_url }}/prom/prometheus:{{ prometheus_image_tag }}"
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--web.route-prefix=/prometheus/"
      - "--web.external-url=/prometheus/"
    replicas: 1
    placement:
      constraints:
        - node.labels.monitoring == true
    networks:
      - "traefik_net"
    labels: "{{ service_labels }}"
    force_update: true

This works:

- ansible.builtin.shell: |
    docker service create \
    --name prometheus \
    --replicas 1 \
    --constraint "node.labels.monitoring == true" \
    --network traefik_net \
    --label traefik.enable="true" \
    --label traefik.http.routers.prometheus.rule="PathPrefix(\"/prometheus\")" \
    --label traefik.http.routers.prometheus.entrypoints="http" \
    --label traefik.http.services.prometheus.loadbalancer.server.port="9090" \
    --label traefik.docker.network="traefik_net" \
    prom/prometheus:v3.4.2 \
    --config.file=/etc/prometheus/prometheus.yml \
    --web.route-prefix=/prometheus/ \
    --web.external-url=/prometheus/

The crash occurs when using "command".
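A likely reason the two variants behave differently: everything after the image name in `docker service create` is passed as arguments to the image's entrypoint, whereas the module's `command` option corresponds to the CLI's `--entrypoint`. The closer equivalent of the shell version would be the module's `args` option. This is a sketch under the assumption that prom/prometheus keeps its binary in ENTRYPOINT; the variables are the ones from the task above.

```yaml
- community.docker.docker_swarm_service:
    name: "prometheus"
    image: "{{ registry_url }}/prom/prometheus:{{ prometheus_image_tag }}"
    # Passed to the image's entrypoint, like the trailing flags on the CLI.
    args:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--web.route-prefix=/prometheus/"
      - "--web.external-url=/prometheus/"
    replicas: 1
    placement:
      constraints:
        - node.labels.monitoring == true
    networks:
      - "traefik_net"
    labels: "{{ service_labels }}"
    force_update: true
```

With `args`, each list item is one argument, so a string such as `-H tcp://host:9001` would need to be split into two items.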

backender76, Jun 29 '25 19:06