Every script execution fails with DockerTaskRunner on Windows

Open gitmonster opened this issue 10 months ago • 36 comments

Describe the issue

I am testing Kestra and every single script invocation fails. It seems the script itself is not accessible from the Docker runner container. Please see the logs:

image

The same problem occurs with scripts that access namespace files.

the flow in question

id: "script_in_venv"
namespace: "myteam"
tasks:
  - id: bash
    type: io.kestra.plugin.scripts.python.Commands
    inputFiles:
      main.py: |
        import requests
        from kestra import Kestra

        response = requests.get('https://google.com')
        print(response.status_code)
        Kestra.outputs({'status': response.status_code, 'text': response.text})
    beforeCommands:
      - python -m venv venv
      - . venv/bin/activate
      - pip install requests kestra > /dev/null
    commands:
      - python main.py

docker-compose.yml

volumes:
  postgres-data:
    driver: local  
  kestra-data:
    driver: local

services:
  postgres:
    image: postgres
    volumes:
      - postgres-data:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: kestra
      POSTGRES_USER: kestra
      POSTGRES_PASSWORD: k3str4
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -d $${POSTGRES_DB} -U $${POSTGRES_USER}"]
      interval: 30s
      timeout: 10s
      retries: 10

  kestra:
    image: kestra/kestra:v0.16.1-full
    pull_policy: always
    # Note that this is meant for development only. Refer to the documentation for production deployments of Kestra which runs without a root user.
    user: "root"
    command: server standalone --worker-thread=128
    volumes:
      - kestra-data:/app/storage
      - /var/run/docker.sock:/var/run/docker.sock
      - /tmp/kestra-wd:/tmp/kestra-wd
    environment:
      KESTRA_CONFIGURATION: |
        datasources:
          postgres:
            url: jdbc:postgresql://postgres:5432/kestra
            driverClassName: org.postgresql.Driver
            username: kestra
            password: k3str4
        kestra:
          plugins:
              configurations: 
              - type: io.kestra.plugin.scripts.runner.docker.DockerTaskRunner
                values: 
                  volume-enabled: true
          server:
            basic-auth:
              enabled: false
              username: "[email protected]" # it must be a valid email address
              password: kestra
          repository:
            type: postgres
          storage:
            type: local
            local:
              base-path: "/app/storage"
          queue:
            type: postgres
          tasks:          
            tmp-dir:
              path: /tmp/kestra-wd/tmp                    
          url: http://localhost:8080/
    ports:
      - "8080:8080"
      - "8081:8081"
    depends_on:
      postgres:
        condition: service_started

Environment

  • Kestra Version: v0.16.1-full
  • Operating System (OS/Docker/Kubernetes): Docker on WSL 2
  • Java Version (if you don't run kestra in Docker):

gitmonster avatar Apr 21 '24 13:04 gitmonster

interesting, I couldn't reproduce on the latest version:

image

Can you try using kestra/kestra:latest-full:

docker run --pull=always --rm -it -p 28080:8080 --user=root -v /var/run/docker.sock:/var/run/docker.sock -v /tmp:/tmp kestra/kestra:latest-full server local

It could be some Docker issue. On Windows, it might be worth trying using Docker Desktop instead of WSL

image

image

anna-geller avatar Apr 21 '24 22:04 anna-geller

Hi @anna-geller, thanks. I used the latest version with the same results. Then I tried 0.16.1, then the dev snapshot, every time with the same issue. By the way, I use Docker Desktop with WSL2 and docker compose, so I will try your direct run command.

gitmonster avatar Apr 22 '24 09:04 gitmonster

Sorry, but starting Kestra without the custom docker-compose config, using

docker run --pull=always --rm -it -p 28080:8080 --user=root -v /var/run/docker.sock:/var/run/docker.sock -v /tmp:/tmp kestra/kestra:latest-full server local

and running the test flow from above gives the same failing result: image

gitmonster avatar Apr 22 '24 09:04 gitmonster

I tried again and removed the /tmp mount completely, thinking it was a permissions issue. Without success. Now I'm going to deploy Kestra in a live environment and see if I can do it.

gitmonster avatar Apr 22 '24 12:04 gitmonster

thx so much for reporting, my colleague could reproduce using the same setup as you did image

so you didn't do anything wrong. We'll look at it in more detail and keep you updated on the issue

anna-geller avatar Apr 22 '24 15:04 anna-geller

@paulgrainger85 I think it's not WSL related, since I deployed with the above settings in our company's Docker infra and had the same issues. Everything works well except script flows.

gitmonster avatar Apr 22 '24 17:04 gitmonster

@gitmonster, is your infra Linux-based? Can you share how it's deployed? (Docker Compose or anything else)

tchiotludo avatar Apr 22 '24 20:04 tchiotludo

Hey @tchiotludo, yes it's

  • linux based on ubuntu-server:20.04
  • swarm environment
  • deployed with docker compose
  • kestra version is pinned to 0.16.1

here is the config

services:
####################################################
  postgres:
    image: postgres:16.2
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints: [ node.labels.name == ${HOST_NAME} ]
    volumes:
      - postgres-data:/var/lib/postgresql/data
    networks:
      - net
    environment:
      POSTGRES_DB: ${KESTRA_POSTGRES_DB}
      POSTGRES_USER: ${KESTRA_POSTGRES_USER}
      POSTGRES_PASSWORD: ${KESTRA_POSTGRES_PASSWORD}
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -d $${KESTRA_POSTGRES_DB} -U $${KESTRA_POSTGRES_USER}"]
      interval: 30s
      timeout: 10s
      retries: 10

  ####################################################
  kestra:
    image: ${IMAGE_NAME_KESTRA_WITH_CONFIG}
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints: [ node.labels.name == ${HOST_NAME} ]
      labels:
        - traefik.enable=true
        - traefik.docker.network=public
        - traefik.http.routers.kestra.rule=Host(`${KESTRA_DOMAIN}`)
        - traefik.http.routers.kestra.entrypoints=websecure
        - traefik.http.routers.kestra.tls.certresolver=le
        - traefik.http.routers.kestra.service=kestra
        - traefik.http.services.kestra.loadbalancer.server.port=${KESTRA_PORT}
        - traefik.http.routers.kestra.middlewares=authelia@docker
        - traefik.constraint-label=traefik-public
    # Note that this is meant for development only. Refer to the documentation for production deployments of Kestra which runs without a root user.
    user: "root"
    command: server standalone --worker-thread=128
    networks:
      - public
      - base_net
      - net
    volumes:
      - kestra-data:/app/storage
      - /var/run/docker.sock:/var/run/docker.sock
      - /tmp:/tmp/kestra-wd
    environment:
      KESTRA_CONFIGURATION: |
        datasources:
          postgres:
            url: jdbc:postgresql://postgres:5432/${KESTRA_POSTGRES_DB}
            driverClassName: org.postgresql.Driver
            username: ${KESTRA_POSTGRES_USER}
            password: ${KESTRA_POSTGRES_PASSWORD}
        kestra:
          repository:
            type: postgres
          storage:
            type: local
            local:
              base-path: "/app/storage"
          queue:
            type: postgres
          url: ${KESTRA_PUBLIC_URL}

####################################################
networks:
  base_net:
    external: true
  public:
    external: true
  net:
    driver: overlay

####################################################
volumes:
  postgres-data:
    driver: local  
  kestra-data:
    driver: local

gitmonster avatar Apr 23 '24 08:04 gitmonster

@gitmonster: is the swarm running on multiple nodes?

tchiotludo avatar Apr 23 '24 08:04 tchiotludo

Yes, but as you can see, the entire Kestra-related deployment is constrained to a single host.

gitmonster avatar Apr 23 '24 08:04 gitmonster

So it's not only on Windows as I was told.

Can you try something very simple, like a Shell task that tries to write to a file?

id: old-shell
namespace: myteam

tasks:
  - id: shell
    type: io.kestra.plugin.scripts.shell.Commands
    commands:
      - echo "Hello" > hello.txt
      - cat hello.txt

loicmathieu avatar Apr 23 '24 09:04 loicmathieu

No, it's not Windows-only. As you can see, I started out on ubuntu:20.04/Windows/WSL2, then deployed to a pure Linux environment. This is working: image

gitmonster avatar Apr 23 '24 09:04 gitmonster

And if you call pwd, is the working directory correctly created?

id: old-shell
namespace: myteam

tasks:
  - id: shell
    type: io.kestra.plugin.scripts.shell.Commands
    commands:
      - pwd

loicmathieu avatar Apr 23 '24 10:04 loicmathieu

yes. image

gitmonster avatar Apr 23 '24 10:04 gitmonster

I tried on Docker Swarm and it works, so it must be some permission issue.

I notice that you mount /tmp:/tmp/kestra-wd. Mounting /tmp can be a bit blunt, as a lot of things may be trying to write to it.

Can you mount /tmp/kestra-wd:/tmp/kestra-wd instead, and create the directory before launching the stack to be sure it exists?
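
A minimal sketch of that pre-flight step, assuming Python is available on the Docker host. The helper name `ensure_working_dir` is made up for illustration and is not part of Kestra; it only creates the directory and checks that the current user can write to it.

```python
import os
from pathlib import Path


def ensure_working_dir(path: str) -> Path:
    """Create the host directory (and parents) if missing and verify it is writable."""
    directory = Path(path)
    # exist_ok=True keeps the call idempotent across redeployments.
    directory.mkdir(parents=True, exist_ok=True)
    # Write + execute permission is needed to create files inside a directory.
    if not os.access(directory, os.W_OK | os.X_OK):
        raise PermissionError(f"{directory} exists but is not writable")
    return directory
```

Calling `ensure_working_dir("/tmp/kestra-wd")` on the host before deploying the stack would cover the "be sure it exists" part; it does not tell you anything about the permissions seen from inside the containers.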

loicmathieu avatar Apr 23 '24 14:04 loicmathieu

@loicmathieu I tried with the /tmp/kestra-wd:/tmp/kestra-wd setting yesterday but couldn't deploy the service at all, because the /tmp/kestra-wd folder could not be created on the host.

Now I redeployed with /tmp/kestra-wd:/tmp/kestra-wd successfully but the

id: "script_in_venv"
namespace: "myteam"
tasks:
  - id: bash
    type: io.kestra.plugin.scripts.python.Commands
    inputFiles:
      main.py: |
        import requests
        from kestra import Kestra

        response = requests.get('https://google.com')
        print(response.status_code)
        Kestra.outputs({'status': response.status_code, 'text': response.text})
    beforeCommands:
      - python -m venv venv
      - . venv/bin/activate
      - pip install requests kestra > /dev/null
    commands:
      - python main.py

flow gives the same failing result: image

gitmonster avatar Apr 23 '24 15:04 gitmonster

with /tmp/kestra-wd existing on the host: image

gitmonster avatar Apr 23 '24 15:04 gitmonster

The /tmp/5aAT9X6DiSJDb26ugbZQ1v folder from the last invocation exists on the host, but /tmp/5aAT9X6DiSJDb26ugbZQ1v/main.py does not:

image

gitmonster avatar Apr 23 '24 15:04 gitmonster

The same here. It runs nicely, but if you change the runner of the second task to DOCKER, the test.go file is missing from the workspace folder:

id: golangtest
namespace: myteam
description: Test golang scripts

inputs:
  - id: greeting
    type: STRING
    defaults: "kestra.io from inputs"

tasks:
  - id: retrieve_go_version
    type: io.kestra.plugin.scripts.shell.Commands
    docker:
      image: golang:alpine3.19
    runner: DOCKER
    commands:
      - go version

  - id: test_go_script_simple
    type: io.kestra.plugin.scripts.shell.Commands
    docker:
      image: golang:alpine3.19
    runner: PROCESS
    warningOnStdErr: false
    inputFiles:
      test.go: |
        package main
        
        import(
          "fmt"
          "github.com/fatih/color"
        ) 
        
        func main(){
          fmt.Printf("hello %s\n", "{{ inputs.greeting }}")
          color.Blue("Hey %s from golang", "kestra")
        }    
    beforeCommands:
      - go mod init github.com/kestra/test
      - go mod tidy 
    commands:
      - pwd
      - ls -la .
      - go run test.go  

runner: PROCESS image

runner: DOCKER image

gitmonster avatar Apr 23 '24 18:04 gitmonster

@gitmonster thanks for the additional feedback. The issue seems to be that the Docker runner is unable to create the input files in the temporary folder. Now that we have narrowed that down, I need to reproduce it on a Windows box to see if we can fix it.
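
One way to confirm this from the host is to compare a task's working directory with the names declared under inputFiles. The sketch below assumes Python 3.9+ on the Docker host; `missing_input_files` is a hypothetical helper, not a Kestra API.

```python
from pathlib import Path


def missing_input_files(working_dir: str, expected: list[str]) -> list[str]:
    """Return the expected file names that are absent from working_dir."""
    root = Path(working_dir)
    # is_file() is False for missing paths, so a missing directory reports every name.
    return sorted(name for name in expected if not (root / name).is_file())
```

For the run reported earlier, `missing_input_files("/tmp/5aAT9X6DiSJDb26ugbZQ1v", ["main.py"])` returns `["main.py"]` whenever the file was never written to that folder.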

loicmathieu avatar Apr 24 '24 07:04 loicmathieu

@loicmathieu all the tests I did lately were not on Windows but on an ubuntu:20.04 server, so this error is not Windows related and the title you gave to this issue is somewhat misleading.

gitmonster avatar Apr 24 '24 07:04 gitmonster

@gitmonster I understood that, but unless you can give me the steps to set up the same env you use (with Docker Swarm, if I understand correctly), the only easy way to reproduce is to use a Windows box. We cannot reproduce it in any of our Linux environments and I cannot reproduce it locally with Docker Swarm, so this must be linked to your exact setup.

loicmathieu avatar Apr 24 '24 07:04 loicmathieu

@loicmathieu understood. I used the config from https://github.com/kestra-io/kestra/issues/3590#issuecomment-2071712327 and will try a fresh local install on another machine with it.

gitmonster avatar Apr 24 '24 08:04 gitmonster

With no luck. Started the default docker-compose.yml from the repo with a simple docker compose up and tested the flow. I'll give up on this.

image

Here is my Docker version:

image

gitmonster avatar Apr 24 '24 12:04 gitmonster

@gitmonster on which OS exactly? You originally reported it on Windows with WSL, and we were able to confirm it doesn't work on Windows. If it also occurs on a Linux machine, please run uname -a so we know the exact distribution and version on which it fails.

loicmathieu avatar Apr 24 '24 12:04 loicmathieu

No, currently it's another Win10/WSL2 machine:

Linux 5.15.146.1-microsoft-standard-WSL2 #1 SMP Thu Jan 11 04:09:03 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

But the same happens to me on a pure ubuntu:20.04:

Linux 5.4.0-166-generic #183-Ubuntu SMP Mon Oct 2 11:28:33 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

gitmonster avatar Apr 24 '24 13:04 gitmonster

Linux manager01 5.15.0-102-generic #112-Ubuntu SMP Tue Mar 5 16:50:32 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Same issue with docker swarm.

version: '3.8'

x-default-opts:
  &default-opts
  logging:
    options:
      max-size: "10m"

networks:
  traefik-proxy:
    external: true

services:
  app:
    image: kestra/kestra:latest-full
    # Note that this is meant for development only. Refer to the documentation for production deployments of Kestra which runs without a root user.
    networks:
      - traefik-proxy
    user: "root"  
    command: server standalone --worker-thread=128
    volumes:
      - /mnt/storage-pool/appdata/kestra/kestra-data:/app/storage
      - /var/run/docker.sock:/var/run/docker.sock
      - /mnt/storage-pool/appdata/kestra/tmp/kestra-wd:/tmp/kestra-wd
    environment:
      KESTRA_CONFIGURATION: |
        datasources:
          postgres:
            url: jdbc:postgresql://data01.xxxx.net/kestra_db
            driverClassName: org.postgresql.Driver
            username: kestradb_user
            password: xxxxxxx
        kestra:
          repository:
            type: postgres
          storage:
            type: local
            local:
              base-path: "/app/storage"
          queue:
            type: postgres
          tasks:          
            tmp-dir:
              path: /tmp/kestra-wd/tmp  
          url: https://kestra.xxxxxx.net/
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
           - node.role == worker
      # Container resources (replace with yours)
      resources:
        limits:
          cpus: '1.55'
          memory: 2G
        reservations:
          cpus: '1.35'
          memory: 512M
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.kestra.rule=Host(`kestra.xxxx.net`)"
        - "traefik.http.routers.kestra.service=kestra"
        - "traefik.http.routers.kestra.entrypoints=https"
        - "traefik.http.services.kestra.loadbalancer.server.port=8080"
        - "traefik.http.routers.kestra.tls=true"
        #- "traefik.http.services.kestra.loadbalancer.passhostheader=true"
        - "traefik.http.routers.kestra.middlewares=chain-authentik@file"       

image image

mrkhachaturov avatar Apr 24 '24 13:04 mrkhachaturov

Hi @gitmonster

Can you try using a different image than the default one:

id: "script_in_venv"
namespace: "myteam"
tasks:
  - id: bash
    type: io.kestra.plugin.scripts.python.Commands
    docker:
      image: python
    inputFiles:
      main.py: |
        import requests
        from kestra import Kestra

        response = requests.get('https://google.com')
        print(response.status_code)
        Kestra.outputs({'status': response.status_code, 'text': response.text})
    beforeCommands:
      - python -m venv venv
      - . venv/bin/activate
      - pip install requests kestra > /dev/null
    commands:
      - python main.py

loicmathieu avatar Apr 25 '24 07:04 loicmathieu

@loicmathieu here is the result: Screenshot 2024-04-25 100316

gitmonster avatar Apr 25 '24 08:04 gitmonster

Can you try:

services:
  app:
    # ... 
    volumes:
      - /mnt/storage-pool/appdata/kestra/kestra-data:/app/storage
      - /var/run/docker.sock:/var/run/docker.sock
      - /mnt/storage-pool/appdata/kestra/tmp/kestra-wd:/mnt/storage-pool/appdata/kestra/tmp/kestra-wd
    environment:
      KESTRA_CONFIGURATION: |
        datasources:
        # ... 
        kestra:
          # ... 
          tasks:          
            tmp-dir:
              path: /mnt/storage-pool/appdata/kestra/tmp/kestra-wd/tmp  
          url: https://kestra.xxxxxx.net/
    deploy:
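
The likely reasoning behind this suggestion (an assumption, it is not spelled out in the thread): because /var/run/docker.sock is mounted, task containers are started by the host's Docker daemon, so a bind-mount source path handed to it is resolved on the host, not inside the Kestra container. If the Docker runner bind-mounts the task working directory, that directory has to live at the same path on both sides. The Python sketch below, with hypothetical helper names, maps a container path back to its host path from compose-style volume entries and checks whether the two are identical. It handles POSIX paths only.

```python
from pathlib import PurePosixPath
from typing import Optional


def host_path_for(container_path: str, volumes: list[str]) -> Optional[str]:
    """Map a path inside the Kestra container to the host path backing it, if any."""
    target = PurePosixPath(container_path)
    for spec in volumes:
        parts = spec.split(":")
        if len(parts) < 2:
            continue  # anonymous volume, no host side
        src, dst = parts[0], PurePosixPath(parts[1])
        if not src.startswith("/"):
            continue  # named volume (e.g. kestra-data), not a bind mount
        if target == dst or dst in target.parents:
            return str(PurePosixPath(src) / target.relative_to(dst))
    return None


def same_path_on_host(tmp_dir: str, volumes: list[str]) -> bool:
    """True when tmp_dir resolves to the identical path on the host."""
    return host_path_for(tmp_dir, volumes) == str(PurePosixPath(tmp_dir))
```

With the earlier mount `/mnt/storage-pool/appdata/kestra/tmp/kestra-wd:/tmp/kestra-wd` and tmp-dir `/tmp/kestra-wd/tmp`, `same_path_on_host` returns False; with the mount and tmp-dir proposed above it returns True.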

tchiotludo avatar Apr 25 '24 08:04 tchiotludo