Clean up all task/job executions does not clean up tasks on unknown state

Open juanpablo-santos opened this issue 9 months ago • 4 comments

Description: Not sure whether this is a bug or an improvement request. Currently, the "Clean up all task/job executions" menu option in the Tools section requires the pods associated with an execution to be present on the cluster where they were run in order to remove the task execution from the SCDF database.

In our case, our platform team runs a pipeline on every k8s cluster that wipes every pod that has been in a finished/error state for more than 6 hours, so when we run 'Clean up all task/job executions', any execution whose pod is no longer present on the cluster doesn't get deleted. The cleanup process tries to fetch the pod (to delete it, I presume), raises an exception that appears in the SCDF server logs, and then carries on with the next execution.

The current workaround is to manually delete the rows at the database level.

Release versions:

{
  "versions": {
    "implementation": {
      "name": "spring-cloud-dataflow-server",
      "version": "2.11.5"
    },
    "core": {
      "name": "Spring Cloud Data Flow Core",
      "version": "2.11.5"
    },
    "dashboard": {
      "name": "Spring Cloud Dataflow UI",
      "version": "3.4.6"
    },
    "shell": {
      "name": "Spring Cloud Data Flow Shell",
      "version": "2.11.5",
      "url": "https://repo.maven.apache.org/maven2/org/springframework/cloud/spring-cloud-dataflow-shell/2.11.5/spring-cloud-dataflow-shell-2.11.5.jar"
    }
  },
  "features": {
    "streams": true,
    "tasks": true,
    "schedules": true,
    "monitoringDashboardType": "GRAFANA"
  },
  "runtimeEnvironment": {
    "appDeployer": {
      "deployerImplementationVersion": "2.11.5",
      "deployerName": "Spring Cloud Skipper Server",
      "deployerSpiVersion": "2.11.5",
      "javaVersion": "21.0.5",
      "platformApiVersion": "",
      "platformClientVersion": "",
      "platformHostVersion": "",
      "platformSpecificInfo": {
        "default": "kubernetes"
      },
      "platformType": "Skipper Managed",
      "springBootVersion": "2.7.18",
      "springVersion": "5.3.39"
    },
    "taskLaunchers": [
      {
        "deployerImplementationVersion": "unknown",
        "deployerName": "KubernetesTaskLauncher",
        "deployerSpiVersion": "unknown",
        "javaVersion": "21.0.5",
        "platformApiVersion": "v1",
        "platformClientVersion": "unknown",
        "platformHostVersion": "unknown",
        "platformSpecificInfo": {
          "namespace": "scdf",
          "master-url": "https://rancher.sanitas.dom/k8s/clusters/c-m-x2sfc28s"
        },
        "platformType": "Kubernetes",
        "springBootVersion": "2.7.18",
        "springVersion": "5.3.39"
      },
      {
        "deployerImplementationVersion": "unknown",
        "deployerName": "KubernetesTaskLauncher",
        "deployerSpiVersion": "unknown",
        "javaVersion": "21.0.5",
        "platformApiVersion": "v1",
        "platformClientVersion": "unknown",
        "platformHostVersion": "unknown",
        "platformSpecificInfo": {
          "namespace": "scdf",
          "master-url": "https://rancher.sanitas.dom/k8s/clusters/c-m-x2sfc28s/"
        },
        "platformType": "Kubernetes",
        "springBootVersion": "2.7.18",
        "springVersion": "5.3.39"
      },
      {
        "deployerImplementationVersion": "unknown",
        "deployerName": "KubernetesTaskLauncher",
        "deployerSpiVersion": "unknown",
        "javaVersion": "21.0.5",
        "platformApiVersion": "v1",
        "platformClientVersion": "unknown",
        "platformHostVersion": "unknown",
        "platformSpecificInfo": {
          "namespace": "scdf",
          "master-url": "https://rancher.sanitas.dom/k8s/clusters/c-m-n666tnnf"
        },
        "platformType": "Kubernetes",
        "springBootVersion": "2.7.18",
        "springVersion": "5.3.39"
      },
      {
        "deployerImplementationVersion": "unknown",
        "deployerName": "KubernetesTaskLauncher",
        "deployerSpiVersion": "unknown",
        "javaVersion": "21.0.5",
        "platformApiVersion": "v1",
        "platformClientVersion": "unknown",
        "platformHostVersion": "unknown",
        "platformSpecificInfo": {
          "namespace": "scdf",
          "master-url": "https://rancher.sanitas.dom/k8s/clusters/c-m-ghbjhsss"
        },
        "platformType": "Kubernetes",
        "springBootVersion": "2.7.18",
        "springVersion": "5.3.39"
      }
    ]
  },
  "monitoringDashboardInfo": {
    "url": "https://grafana.sanitas.dom",
    "source": "default-scdf-source",
    "refreshInterval": 15
  },
  "security": {
    "isAuthentication": true,
    "isAuthenticated": true,
    "username": "jpsantos",
    "roles": [
      "ROLE_CREATE",
      "ROLE_DEPLOY",
      "ROLE_DESTROY",
      "ROLE_MANAGE",
      "ROLE_MODIFY",
      "ROLE_SCHEDULE",
      "ROLE_VIEW"
    ]
  },
  "git": {
    "commit": "edc71ff"
  }
}

Custom apps: N/A.

Steps to reproduce: N/A.

Screenshots: N/A.

Additional context: N/A.

juanpablo-santos avatar Feb 19 '25 12:02 juanpablo-santos

Hello @juanpablo-santos, can you share the stack trace? Thanks.

cppwfs avatar Feb 19 '25 15:02 cppwfs

Hi,

My bad, it's not a stack trace but a WARN message in the log like

2025-02-19 19:19:12.282  WARN 1 --- [nio-8080-exec-1] o.s.c.d.s.k.KubernetesTaskLauncher       : Cannot delete pod for task "TASK_NAME_HERE-xexqgrpd6e" (reason: pod does not exist)

(one such message per task execution without its corresponding pod)
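The behaviour suggested by that log line (warn about the missing pod, then carry on) can be sketched as a best-effort loop. This is only an illustration and not SCDF's actual implementation: `BestEffortCleanup`, `PodStore`, and `deleteIfPresent` are hypothetical names, and the set of execution rows stands in for the SCDF database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class BestEffortCleanup {

    // Hypothetical stand-in for the platform; not the KubernetesTaskLauncher API.
    interface PodStore {
        // Returns true if a pod with this name existed and was deleted.
        boolean deleteIfPresent(String podName);
    }

    // Deletes the pod for each task if it exists, records a warning if it
    // does not, and removes the execution row in both cases.
    static List<String> cleanup(List<String> taskNames, PodStore pods, Set<String> executionRows) {
        List<String> warnings = new ArrayList<>();
        for (String name : taskNames) {
            if (!pods.deleteIfPresent(name)) {
                // A missing pod is not treated as fatal: warn and continue.
                warnings.add("Cannot delete pod for task \"" + name + "\" (reason: pod does not exist)");
            }
            executionRows.remove(name);
        }
        return warnings;
    }
}
```

In this sketch a missing pod never blocks removal of the execution record, so one warning per missing pod is the only visible side effect.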

juanpablo-santos avatar Feb 19 '25 18:02 juanpablo-santos

Ouch, that's not exactly what I reported. Most executions do get deleted.

However, executions that were unable to spin up a pod for whatever reason (in our case, for example, a missing init container) are not deleted. The first couple of pages of our executions list look something like this:

Image

I'm so used to seeing them that I expected the cleanup to wipe them, and I incorrectly thought it wasn't deleting executions at all, but it is deleting executions in the FAILED or SUCCESS state. Apologies for the noise; I will update the issue title accordingly.

juanpablo-santos avatar Feb 19 '25 18:02 juanpablo-santos

We could support an option that deletes executions with a status of UNKNOWN. Keep in mind, though, that it could also delete pending task runs, so that will need to be documented.
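One way such an option might guard against removing pending runs is to only pick UNKNOWN executions older than a grace period. The following is a minimal sketch under that assumption, not an existing SCDF feature: `CleanupSelector`, `Execution`, `requestedAt`, `includeUnknown`, and `gracePeriod` are hypothetical names, and the status strings are the ones used in this thread.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

final class CleanupSelector {

    // Hypothetical, simplified view of one execution; not an SCDF class.
    record Execution(long id, String status, Instant requestedAt) {}

    // Finished executions are always selected. UNKNOWN executions are selected
    // only when the option is enabled and they are older than the grace period.
    static List<Execution> selectForCleanup(List<Execution> all, boolean includeUnknown,
                                            Duration gracePeriod, Instant now) {
        Instant cutoff = now.minus(gracePeriod);
        return all.stream()
                .filter(e -> isFinished(e)
                        || (includeUnknown && isStaleUnknown(e, cutoff)))
                .collect(Collectors.toList());
    }

    private static boolean isFinished(Execution e) {
        return "SUCCESS".equals(e.status()) || "FAILED".equals(e.status());
    }

    private static boolean isStaleUnknown(Execution e, Instant cutoff) {
        // Recent UNKNOWN executions may still be pending, so they are kept.
        return "UNKNOWN".equals(e.status()) && e.requestedAt().isBefore(cutoff);
    }
}
```

With a grace period of 6 hours, for example, an UNKNOWN execution requested 5 minutes ago would be left alone, while one requested 7 hours ago would be selected. The right threshold depends on how long a task can legitimately stay pending, which is the part that would need documenting.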

cppwfs avatar Feb 19 '25 18:02 cppwfs