argo-workflows icon indicating copy to clipboard operation
argo-workflows copied to clipboard

UI does not show all running workflows (it used to do so) / UI and argo list differ (missing running workflows)

Open scravy opened this issue 1 year ago • 14 comments

Pre-requisites

  • [X] I have double-checked my configuration
  • [X] I can confirm the issues exists when I tested with :latest
  • [ ] I'd like to contribute the fix myself (see contributing guide)

What happened/what you expected to happen?

When I am submitting a workflow in the UI it is shown in the list of running workflows. When refreshing this page all workflows should be listed.

We just upgraded to argo 3.4.0 and not all running workflows are shown in the UI. When you wait long enough eventually workflows pop up. In our day to day work we noticed that workflows seem to be added to the UI once a transition happens, i.e. when a new step is entered in the workflow.

In short: The output of argo list and the workflows shown in the UI are not the same. Some running flows are missing. They are being added once they transition from one node to another while the UI is shown.

Version

3.4.0

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflows that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: zargo-debug-child
  namespace: argo
spec:
  entrypoint: main
  templates:
    - name: main
      steps:
        - - name: node1
            template: work
        - - name: node2
            template: work
    - name: work
      script:
        image: bash:5.2.0-alpine3.15
        command:
          - bash
        source: |
          echo "Doing some work for 60 seconds"
          sleep 60

Logs from the workflow controller

time="2022-09-27T15:57:49.577Z" level=info msg="Processing workflow" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:57:49.591Z" level=info msg="Updated phase  -> Running" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:57:49.593Z" level=info msg="Steps node zargo-debug-child-vcmgq initialized Running" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:57:49.593Z" level=info msg="StepGroup node zargo-debug-child-vcmgq-3865638504 initialized Running" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:57:49.593Z" level=info msg="Pod node zargo-debug-child-vcmgq-3922648685 initialized Pending" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:57:49.628Z" level=info msg="Created pod: zargo-debug-child-vcmgq[0].node1 (zargo-debug-child-vcmgq-work-3922648685)" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:57:49.628Z" level=info msg="Workflow step group node zargo-debug-child-vcmgq-3865638504 not yet completed" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:57:49.628Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:57:49.628Z" level=info msg=reconcileAgentPod namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:57:49.639Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=10435392 workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:57:59.629Z" level=info msg="Processing workflow" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:57:59.629Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=0 workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:57:59.629Z" level=info msg="node changed" namespace=argo new.message= new.phase=Running new.progress=0/1 nodeID=zargo-debug-child-vcmgq-3922648685 old.message= old.phase=Pending old.progress=0/1 workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:57:59.629Z" level=info msg="Workflow step group node zargo-debug-child-vcmgq-3865638504 not yet completed" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:57:59.629Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:57:59.629Z" level=info msg=reconcileAgentPod namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:57:59.726Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=10435445 workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:59:01.782Z" level=info msg="Processing workflow" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:59:01.782Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=0 workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:59:01.782Z" level=warning msg="workflow uses legacy/insecure pod patch, see https://argoproj.github.io/argo-workflows/workflow-rbac/" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:59:01.782Z" level=info msg="node changed" namespace=argo new.message= new.phase=Succeeded new.progress=0/1 nodeID=zargo-debug-child-vcmgq-3922648685 old.message= old.phase=Running old.progress=0/1 workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:59:01.782Z" level=info msg="Step group node zargo-debug-child-vcmgq-3865638504 successful" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:59:01.782Z" level=info msg="node zargo-debug-child-vcmgq-3865638504 phase Running -> Succeeded" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:59:01.782Z" level=info msg="node zargo-debug-child-vcmgq-3865638504 finished: 2022-09-27 15:59:01.78294768 +0000 UTC" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:59:01.787Z" level=info msg="StepGroup node zargo-debug-child-vcmgq-2858834269 initialized Running" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:59:01.787Z" level=info msg="SG Outbound nodes of zargo-debug-child-vcmgq-3922648685 are [zargo-debug-child-vcmgq-3922648685]" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:59:01.788Z" level=info msg="Pod node zargo-debug-child-vcmgq-3116902745 initialized Pending" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:59:01.818Z" level=info msg="Created pod: zargo-debug-child-vcmgq[1].node2 (zargo-debug-child-vcmgq-work-3116902745)" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:59:01.818Z" level=info msg="Workflow step group node zargo-debug-child-vcmgq-2858834269 not yet completed" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:59:01.818Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:59:01.818Z" level=info msg=reconcileAgentPod namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:59:01.828Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=10435675 workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:59:01.834Z" level=info msg="cleaning up pod" action=labelPodCompleted key=argo/zargo-debug-child-vcmgq-work-3922648685/labelPodCompleted
time="2022-09-27T15:59:11.820Z" level=info msg="Processing workflow" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:59:11.820Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=0 workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:59:11.820Z" level=info msg="node changed" namespace=argo new.message= new.phase=Running new.progress=0/1 nodeID=zargo-debug-child-vcmgq-3116902745 old.message= old.phase=Pending old.progress=0/1 workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:59:11.820Z" level=info msg="SG Outbound nodes of zargo-debug-child-vcmgq-3922648685 are [zargo-debug-child-vcmgq-3922648685]" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:59:11.821Z" level=info msg="Workflow step group node zargo-debug-child-vcmgq-2858834269 not yet completed" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:59:11.821Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:59:11.821Z" level=info msg=reconcileAgentPod namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T15:59:11.834Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=10435730 workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:13.417Z" level=info msg="Processing workflow" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:13.417Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=0 workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:13.417Z" level=warning msg="workflow uses legacy/insecure pod patch, see https://argoproj.github.io/argo-workflows/workflow-rbac/" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:13.417Z" level=info msg="node changed" namespace=argo new.message= new.phase=Succeeded new.progress=0/1 nodeID=zargo-debug-child-vcmgq-3116902745 old.message= old.phase=Running old.progress=0/1 workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:13.417Z" level=info msg="SG Outbound nodes of zargo-debug-child-vcmgq-3922648685 are [zargo-debug-child-vcmgq-3922648685]" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:13.417Z" level=info msg="Step group node zargo-debug-child-vcmgq-2858834269 successful" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:13.417Z" level=info msg="node zargo-debug-child-vcmgq-2858834269 phase Running -> Succeeded" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:13.417Z" level=info msg="node zargo-debug-child-vcmgq-2858834269 finished: 2022-09-27 16:00:13.417764045 +0000 UTC" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:13.417Z" level=info msg="Outbound nodes of zargo-debug-child-vcmgq-3116902745 is [zargo-debug-child-vcmgq-3116902745]" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:13.417Z" level=info msg="Outbound nodes of zargo-debug-child-vcmgq is [zargo-debug-child-vcmgq-3116902745]" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:13.417Z" level=info msg="node zargo-debug-child-vcmgq phase Running -> Succeeded" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:13.417Z" level=info msg="node zargo-debug-child-vcmgq finished: 2022-09-27 16:00:13.41780197 +0000 UTC" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:13.417Z" level=info msg="Checking daemoned children of zargo-debug-child-vcmgq" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:13.417Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:13.417Z" level=info msg=reconcileAgentPod namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:13.417Z" level=info msg="Running OnExit handler: b72ca92baf93-exit-handler" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:13.419Z" level=info msg="Steps node zargo-debug-child-vcmgq-4002967377 initialized Running" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:13.419Z" level=info msg="StepGroup node zargo-debug-child-vcmgq-3620866933 initialized Running" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:13.419Z" level=info msg="Pod node zargo-debug-child-vcmgq-1020052388 initialized Pending" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:13.438Z" level=info msg="Created pod: zargo-debug-child-vcmgq.onExit[0].success (zargo-debug-child-vcmgq-b72ca92baf93-send-slack-1020052388)" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:13.438Z" level=info msg="Skipping zargo-debug-child-vcmgq.onExit[0].failure: when 'Succeeded != Succeeded' evaluated false" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:13.438Z" level=info msg="Skipped node zargo-debug-child-vcmgq-1275290133 initialized Skipped (message: when 'Succeeded != Succeeded' evaluated false)" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:13.438Z" level=info msg="Workflow step group node zargo-debug-child-vcmgq-3620866933 not yet completed" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:13.448Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=10436003 workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:13.455Z" level=info msg="cleaning up pod" action=labelPodCompleted key=argo/zargo-debug-child-vcmgq-work-3116902745/labelPodCompleted
time="2022-09-27T16:00:23.440Z" level=info msg="Processing workflow" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:23.440Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=0 workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:23.440Z" level=warning msg="workflow uses legacy/insecure pod patch, see https://argoproj.github.io/argo-workflows/workflow-rbac/" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:23.440Z" level=info msg="node changed" namespace=argo new.message= new.phase=Succeeded new.progress=0/1 nodeID=zargo-debug-child-vcmgq-1020052388 old.message= old.phase=Pending old.progress=0/1 workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:23.441Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:23.441Z" level=info msg=reconcileAgentPod namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:23.441Z" level=info msg="Running OnExit handler: b72ca92baf93-exit-handler" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:23.441Z" level=info msg="Step group node zargo-debug-child-vcmgq-3620866933 successful" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:23.441Z" level=info msg="node zargo-debug-child-vcmgq-3620866933 phase Running -> Succeeded" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:23.441Z" level=info msg="node zargo-debug-child-vcmgq-3620866933 finished: 2022-09-27 16:00:23.441636451 +0000 UTC" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:23.441Z" level=info msg="Outbound nodes of zargo-debug-child-vcmgq-1020052388 is [zargo-debug-child-vcmgq-1020052388]" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:23.441Z" level=info msg="Outbound nodes of zargo-debug-child-vcmgq-1275290133 is [zargo-debug-child-vcmgq-1275290133]" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:23.441Z" level=info msg="Outbound nodes of zargo-debug-child-vcmgq-4002967377 is [zargo-debug-child-vcmgq-1020052388 zargo-debug-child-vcmgq-1275290133]" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:23.441Z" level=info msg="node zargo-debug-child-vcmgq-4002967377 phase Running -> Succeeded" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:23.441Z" level=info msg="node zargo-debug-child-vcmgq-4002967377 finished: 2022-09-27 16:00:23.44170012 +0000 UTC" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:23.441Z" level=info msg="Checking daemoned children of zargo-debug-child-vcmgq-4002967377" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:23.441Z" level=info msg="Updated phase Running -> Succeeded" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:23.441Z" level=info msg="Marking workflow completed" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:23.441Z" level=info msg="Marking workflow as pending archiving" namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:23.441Z" level=info msg="Checking daemoned children of " namespace=argo workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:23.447Z" level=info msg="cleaning up pod" action=deletePod key=argo/zargo-debug-child-vcmgq-1340600742-agent/deletePod
time="2022-09-27T16:00:23.462Z" level=info msg="Workflow update successful" namespace=argo phase=Succeeded resourceVersion=10436076 workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:23.646Z" level=info msg="archiving workflow" namespace=argo uid=cdd6c373-96d9-4c98-b894-3ef1f02b18a2 workflow=zargo-debug-child-vcmgq
time="2022-09-27T16:00:23.650Z" level=info msg="cleaning up pod" action=labelPodCompleted key=argo/zargo-debug-child-vcmgq-b72ca92baf93-send-slack-1020052388/labelPodCompleted
time="2022-09-27T16:00:23.669Z" level=info msg="Queueing Succeeded workflow argo/zargo-debug-child-vcmgq for delete in 18h0m0s due to TTL"

Logs from in your workflow's wait container

time="2022-09-27T15:58:52.538Z" level=info msg="Creating minio client using static credentials" endpoint=s3.amazonaws.com
time="2022-09-27T15:58:52.544Z" level=info msg="Saving file to s3" bucket=incrmntal-argo-artifactory endpoint=s3.amazonaws.com key=incrmntal-prod-two/argo/zargo-debug-child-vcmgq/2022-09-27-zargo-debug-child-vcmgq-work-3922648685/main.log path=/tmp/argo/outputs/logs/main.log
time="2022-09-27T15:58:52.587Z" level=info msg="Save artifact" artifactName=main-logs duration=49.392744ms error="<nil>" key=incrmntal-prod-two/argo/zargo-debug-child-vcmgq/2022-09-27-zargo-debug-child-vcmgq-work-3922648685/main.log
time="2022-09-27T15:58:52.587Z" level=info msg="not deleting local artifact" localArtPath=/tmp/argo/outputs/logs/main.log
time="2022-09-27T15:58:52.587Z" level=info msg="Successfully saved file: /tmp/argo/outputs/logs/main.log"
time="2022-09-27T15:58:52.599Z" level=info msg="Create workflowtaskresults 403"
time="2022-09-27T15:58:52.599Z" level=warning msg="failed to patch task set, falling back to legacy/insecure pod patch, see https://argoproj.github.io/argo-workflows/workflow-rbac/" error="workflowtaskresults.argoproj.io is forbidden: User \"system:serviceaccount:argo:argo\" cannot create resource \"workflowtaskresults\" in API group \"argoproj.io\" in the namespace \"argo\""
time="2022-09-27T15:58:52.618Z" level=info msg="Patch pods 200"
time="2022-09-27T15:58:52.621Z" level=info msg="Deadline monitor stopped"
time="2022-09-27T15:58:52.621Z" level=info msg="Alloc=7002 TotalAlloc=16520 Sys=24018 NumGC=5 Goroutines=9"

scravy avatar Sep 27 '22 16:09 scravy

time="2022-09-27T15:58:52.599Z" level=warning msg="failed to patch task set, falling back to legacy/insecure pod patch, see https://argoproj.github.io/argo-workflows/workflow-rbac/" error="workflowtaskresults.argoproj.io is forbidden: User \"system:serviceaccount:argo:argo\" cannot create resource \"workflowtaskresults\" in API group \"argoproj.io\" in the namespace \"argo\""

You are missing some permissions on your service account.

terrytangyuan avatar Sep 27 '22 16:09 terrytangyuan

@terrytangyuan Are you saying this is the cause for the regression I am seeing?

The workflows do show up when a transition occurs. Also completed workflows are shown correctly. Just running ones are not.

Also I installed argo from the install.yaml as per the release, so the roles and cluster roles are the vanilla roles from v3.4.0 – this should surely not miss any permissions?

scravy avatar Sep 27 '22 16:09 scravy

time="2022-09-27T16:00:23.646Z" level=info msg="archiving workflow" namespace=argo uid=cdd6c373-96d9-4c98-b894-3ef1f02b18a2 workflow=zargo-debug-child-vcmgq

This indicates that you are archiving the workflows. Could you check if they are in your list of archived workflows?

terrytangyuan avatar Sep 27 '22 16:09 terrytangyuan

First things first. So I did patch the respective role with the missing permissions. I changed the argo-cluster-role as per the vanilla install yaml at https://github.com/argoproj/argo-workflows/releases/download/v3.4.0/install.yaml from

- apiGroups:
  - argoproj.io
  resources:
  - workflowtaskresults
  verbs:
  - list
  - watch
  - deletecollection

to

- apiGroups:
  - argoproj.io
  resources:
  - workflowtaskresults
  verbs:
  - list
  - watch
  - deletecollection
  - create
  - patch

The warnings are now gone. Here is the complete logs from a rerun. However the problem that I am describing exists in the same way.

Wait Container logs:

time="2022-09-27T16:50:02.568Z" level=info msg="Starting Workflow Executor" version=v3.4.0
time="2022-09-27T16:50:02.570Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
time="2022-09-27T16:50:02.570Z" level=info msg="Executor initialized" deadline="0001-01-01 00:00:00 +0000 UTC" includeScriptOutput=false namespace=argo podName=zargo-debug-child-qnxdn-work-1688663510 template="{\"name\":\"work\",\"inputs\":{},\"outputs\":{},\"metadata\":{\"annotations\":{\"cluster-autoscaler.kubernetes.io/safe-to-evict\":\"false\"}},\"script\":{\"name\":\"\",\"image\":\"bash:5.2.0-alpine3.15\",\"command\":[\"bash\"],\"resources\":{},\"source\":\"echo \\\"Doing some work for 60 seconds\\\"\\nsleep 60\\n\"},\"archiveLocation\":{\"archiveLogs\":true,\"s3\":{\"endpoint\":\"s3.amazonaws.com\",\"bucket\":\"incrmntal-argo-artifactory\",\"region\":\"us-east-1\",\"insecure\":false,\"accessKeySecret\":{\"name\":\"argo-s3-config\",\"key\":\"accessKey\"},\"secretKeySecret\":{\"name\":\"argo-s3-config\",\"key\":\"secretKey\"},\"encryptionOptions\":{},\"key\":\"incrmntal-prod-two/argo/zargo-debug-child-qnxdn/2022-09-27-zargo-debug-child-qnxdn-work-1688663510\"}},\"serviceAccountName\":\"argo\"}" version="&Version{Version:v3.4.0,BuildDate:2022-09-19T03:47:58Z,GitCommit:047952afd539d06cae2fd6ba0b608b19c1194bba,GitTag:v3.4.0,GitTreeState:clean,GoVersion:go1.18.6,Compiler:gc,Platform:linux/amd64,}"
time="2022-09-27T16:50:02.570Z" level=info msg="Starting deadline monitor"
time="2022-09-27T16:51:03.582Z" level=info msg="Main container completed" error="<nil>"
time="2022-09-27T16:51:03.582Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2022-09-27T16:51:03.582Z" level=info msg="No output parameters"
time="2022-09-27T16:51:03.582Z" level=info msg="No output artifacts"
time="2022-09-27T16:51:03.582Z" level=info msg="S3 Save path: /tmp/argo/outputs/logs/main.log, key: incrmntal-prod-two/argo/zargo-debug-child-qnxdn/2022-09-27-zargo-debug-child-qnxdn-work-1688663510/main.log"
time="2022-09-27T16:51:03.582Z" level=info msg="Creating minio client using static credentials" endpoint=s3.amazonaws.com
time="2022-09-27T16:51:03.590Z" level=info msg="Saving file to s3" bucket=incrmntal-argo-artifactory endpoint=s3.amazonaws.com key=incrmntal-prod-two/argo/zargo-debug-child-qnxdn/2022-09-27-zargo-debug-child-qnxdn-work-1688663510/main.log path=/tmp/argo/outputs/logs/main.log
time="2022-09-27T16:51:03.637Z" level=info msg="Save artifact" artifactName=main-logs duration=54.307792ms error="<nil>" key=incrmntal-prod-two/argo/zargo-debug-child-qnxdn/2022-09-27-zargo-debug-child-qnxdn-work-1688663510/main.log
time="2022-09-27T16:51:03.637Z" level=info msg="not deleting local artifact" localArtPath=/tmp/argo/outputs/logs/main.log
time="2022-09-27T16:51:03.637Z" level=info msg="Successfully saved file: /tmp/argo/outputs/logs/main.log"
time="2022-09-27T16:51:03.661Z" level=info msg="Create workflowtaskresults 201"
time="2022-09-27T16:51:03.661Z" level=info msg="Deadline monitor stopped"
time="2022-09-27T16:51:03.661Z" level=info msg="stopping progress monitor (context done)" error="context canceled"
time="2022-09-27T16:51:03.661Z" level=info msg="Alloc=7571 TotalAlloc=15996 Sys=23506 NumGC=5 Goroutines=9"

Workflow Controller Log

time="2022-09-27T16:50:00.763Z" level=info msg="Processing workflow" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:50:00.776Z" level=info msg="Updated phase  -> Running" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:50:00.776Z" level=info msg="Steps node zargo-debug-child-qnxdn initialized Running" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:50:00.776Z" level=info msg="StepGroup node zargo-debug-child-qnxdn-2181329391 initialized Running" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:50:00.777Z" level=info msg="Pod node zargo-debug-child-qnxdn-1688663510 initialized Pending" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:50:00.809Z" level=info msg="Created pod: zargo-debug-child-qnxdn[0].node1 (zargo-debug-child-qnxdn-work-1688663510)" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:50:00.809Z" level=info msg="Workflow step group node zargo-debug-child-qnxdn-2181329391 not yet completed" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:50:00.809Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:50:00.809Z" level=info msg=reconcileAgentPod namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:50:00.820Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=10468680 workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:50:10.811Z" level=info msg="Processing workflow" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:50:10.811Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=0 workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:50:10.811Z" level=info msg="node changed" namespace=argo new.message= new.phase=Running new.progress=0/1 nodeID=zargo-debug-child-qnxdn-1688663510 old.message= old.phase=Pending old.progress=0/1 workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:50:10.811Z" level=info msg="Workflow step group node zargo-debug-child-qnxdn-2181329391 not yet completed" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:50:10.811Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:50:10.811Z" level=info msg=reconcileAgentPod namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:50:10.824Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=10468732 workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:51:12.959Z" level=info msg="Processing workflow" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:51:12.959Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=1 workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:51:12.959Z" level=info msg="task-result changed" namespace=argo nodeID=zargo-debug-child-qnxdn-1688663510 workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:51:12.960Z" level=info msg="node changed" namespace=argo new.message= new.phase=Succeeded new.progress=0/1 nodeID=zargo-debug-child-qnxdn-1688663510 old.message= old.phase=Running old.progress=0/1 workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:51:12.960Z" level=info msg="Step group node zargo-debug-child-qnxdn-2181329391 successful" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:51:12.960Z" level=info msg="node zargo-debug-child-qnxdn-2181329391 phase Running -> Succeeded" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:51:12.960Z" level=info msg="node zargo-debug-child-qnxdn-2181329391 finished: 2022-09-27 16:51:12.960547268 +0000 UTC" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:51:12.960Z" level=info msg="StepGroup node zargo-debug-child-qnxdn-2248586962 initialized Running" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:51:12.960Z" level=info msg="SG Outbound nodes of zargo-debug-child-qnxdn-1688663510 are [zargo-debug-child-qnxdn-1688663510]" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:51:12.961Z" level=info msg="Pod node zargo-debug-child-qnxdn-3431233478 initialized Pending" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:51:12.996Z" level=info msg="Created pod: zargo-debug-child-qnxdn[1].node2 (zargo-debug-child-qnxdn-work-3431233478)" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:51:12.996Z" level=info msg="Workflow step group node zargo-debug-child-qnxdn-2248586962 not yet completed" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:51:12.996Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:51:12.996Z" level=info msg=reconcileAgentPod namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:51:13.008Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=10469033 workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:51:13.013Z" level=info msg="cleaning up pod" action=labelPodCompleted key=argo/zargo-debug-child-qnxdn-work-1688663510/labelPodCompleted
time="2022-09-27T16:51:22.998Z" level=info msg="Processing workflow" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:51:22.998Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=1 workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:51:22.998Z" level=info msg="node changed" namespace=argo new.message= new.phase=Running new.progress=0/1 nodeID=zargo-debug-child-qnxdn-3431233478 old.message= old.phase=Pending old.progress=0/1 workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:51:22.998Z" level=info msg="SG Outbound nodes of zargo-debug-child-qnxdn-1688663510 are [zargo-debug-child-qnxdn-1688663510]" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:51:22.998Z" level=info msg="Workflow step group node zargo-debug-child-qnxdn-2248586962 not yet completed" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:51:22.998Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:51:22.999Z" level=info msg=reconcileAgentPod namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:51:23.011Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=10469088 workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:25.256Z" level=info msg="Processing workflow" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:25.256Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=2 workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:25.256Z" level=info msg="task-result changed" namespace=argo nodeID=zargo-debug-child-qnxdn-3431233478 workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:25.256Z" level=info msg="node changed" namespace=argo new.message= new.phase=Succeeded new.progress=0/1 nodeID=zargo-debug-child-qnxdn-3431233478 old.message= old.phase=Running old.progress=0/1 workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:25.256Z" level=info msg="SG Outbound nodes of zargo-debug-child-qnxdn-1688663510 are [zargo-debug-child-qnxdn-1688663510]" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:25.256Z" level=info msg="Step group node zargo-debug-child-qnxdn-2248586962 successful" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:25.256Z" level=info msg="node zargo-debug-child-qnxdn-2248586962 phase Running -> Succeeded" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:25.256Z" level=info msg="node zargo-debug-child-qnxdn-2248586962 finished: 2022-09-27 16:52:25.256890468 +0000 UTC" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:25.256Z" level=info msg="Outbound nodes of zargo-debug-child-qnxdn-3431233478 is [zargo-debug-child-qnxdn-3431233478]" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:25.256Z" level=info msg="Outbound nodes of zargo-debug-child-qnxdn is [zargo-debug-child-qnxdn-3431233478]" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:25.256Z" level=info msg="node zargo-debug-child-qnxdn phase Running -> Succeeded" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:25.256Z" level=info msg="node zargo-debug-child-qnxdn finished: 2022-09-27 16:52:25.256948516 +0000 UTC" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:25.256Z" level=info msg="Checking daemoned children of zargo-debug-child-qnxdn" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:25.256Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:25.256Z" level=info msg=reconcileAgentPod namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:25.256Z" level=info msg="Running OnExit handler: b72ca92baf93-exit-handler" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:25.257Z" level=info msg="Steps node zargo-debug-child-qnxdn-4154040042 initialized Running" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:25.257Z" level=info msg="StepGroup node zargo-debug-child-qnxdn-3486861020 initialized Running" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:25.257Z" level=info msg="Pod node zargo-debug-child-qnxdn-1717903073 initialized Pending" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:25.285Z" level=info msg="Created pod: zargo-debug-child-qnxdn.onExit[0].success (zargo-debug-child-qnxdn-b72ca92baf93-send-slack-1717903073)" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:25.285Z" level=info msg="Skipping zargo-debug-child-qnxdn.onExit[0].failure: when 'Succeeded != Succeeded' evaluated false" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:25.285Z" level=info msg="Skipped node zargo-debug-child-qnxdn-3455007452 initialized Skipped (message: when 'Succeeded != Succeeded' evaluated false)" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:25.285Z" level=info msg="Workflow step group node zargo-debug-child-qnxdn-3486861020 not yet completed" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:25.298Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=10469324 workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:25.304Z" level=info msg="cleaning up pod" action=labelPodCompleted key=argo/zargo-debug-child-qnxdn-work-3431233478/labelPodCompleted
time="2022-09-27T16:52:35.287Z" level=info msg="Processing workflow" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:35.288Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=3 workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:35.288Z" level=info msg="task-result changed" namespace=argo nodeID=zargo-debug-child-qnxdn-1717903073 workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:35.288Z" level=info msg="node changed" namespace=argo new.message= new.phase=Succeeded new.progress=0/1 nodeID=zargo-debug-child-qnxdn-1717903073 old.message= old.phase=Pending old.progress=0/1 workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:35.288Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:35.288Z" level=info msg=reconcileAgentPod namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:35.288Z" level=info msg="Running OnExit handler: b72ca92baf93-exit-handler" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:35.288Z" level=info msg="Step group node zargo-debug-child-qnxdn-3486861020 successful" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:35.288Z" level=info msg="node zargo-debug-child-qnxdn-3486861020 phase Running -> Succeeded" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:35.288Z" level=info msg="node zargo-debug-child-qnxdn-3486861020 finished: 2022-09-27 16:52:35.288917456 +0000 UTC" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:35.288Z" level=info msg="Outbound nodes of zargo-debug-child-qnxdn-1717903073 is [zargo-debug-child-qnxdn-1717903073]" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:35.288Z" level=info msg="Outbound nodes of zargo-debug-child-qnxdn-3455007452 is [zargo-debug-child-qnxdn-3455007452]" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:35.288Z" level=info msg="Outbound nodes of zargo-debug-child-qnxdn-4154040042 is [zargo-debug-child-qnxdn-1717903073 zargo-debug-child-qnxdn-3455007452]" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:35.288Z" level=info msg="node zargo-debug-child-qnxdn-4154040042 phase Running -> Succeeded" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:35.288Z" level=info msg="node zargo-debug-child-qnxdn-4154040042 finished: 2022-09-27 16:52:35.288986789 +0000 UTC" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:35.288Z" level=info msg="Checking daemoned children of zargo-debug-child-qnxdn-4154040042" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:35.289Z" level=info msg="Updated phase Running -> Succeeded" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:35.289Z" level=info msg="Marking workflow completed" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:35.289Z" level=info msg="Marking workflow as pending archiving" namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:35.289Z" level=info msg="Checking daemoned children of " namespace=argo workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:35.294Z" level=info msg="cleaning up pod" action=deletePod key=argo/zargo-debug-child-qnxdn-1340600742-agent/deletePod
time="2022-09-27T16:52:35.301Z" level=info msg="Workflow update successful" namespace=argo phase=Succeeded resourceVersion=10469386 workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:35.515Z" level=info msg="archiving workflow" namespace=argo uid=6dceddd6-f757-4398-85e1-34333b8aaa13 workflow=zargo-debug-child-qnxdn
time="2022-09-27T16:52:35.520Z" level=info msg="cleaning up pod" action=labelPodCompleted key=argo/zargo-debug-child-qnxdn-b72ca92baf93-send-slack-1717903073/labelPodCompleted
time="2022-09-27T16:52:35.554Z" level=info msg="Queueing Succeeded workflow argo/zargo-debug-child-qnxdn for delete in 18h0m0s due to TTL"

scravy avatar Sep 27 '22 16:09 scravy

@terrytangyuan Yes we are archiving workflows. But the problem here is that the workflow is not listed while it is running. Once the workflow transitions to a new node in the flow and the list of workflows in the UI is open while it does it will show in the list.

...and yes, they are in the archived workflows, but only after they completed.

scravy avatar Sep 27 '22 16:09 scravy

@scravy Can you check if a namespace filter is populated with the cookie's default value?

sarabala1979 avatar Sep 28 '22 14:09 sarabala1979

@sarabala1979 I am not sure what you mean by "the cookie's default value".

In the UI I do have a filter set for namespace = argo – but then again we do everything in that namespace and all workflows show up here, and it used to work in previous versions or argo, so not sure how this would affect anything.

I looked at the cookies and there's four cookies set: authorization, _ga_E659EW9J25, _gcl_au, and _ga.

scravy avatar Sep 28 '22 20:09 scravy

It also happens to me. When you first load the workflow page in the UI it does not show all the workflows available but if you wait like 20 or 30 seconds they suddenly appear (without touching anything in the UI during the wait). Access directly to the URL of a specific workflow works fine always, this only happen in the list view. I'm not sure about the root cause but it seems more related to the UI than the server.

ese avatar Oct 03 '22 12:10 ese

When you refresh the list view, do they appear as expected?

terrytangyuan avatar Oct 03 '22 15:10 terrytangyuan

@scravy Is it working as expected on v3.3?

sarabala1979 avatar Oct 03 '22 17:10 sarabala1979

When I refresh the page it always show the list incomplete and after some time it loads all the workflows.

I noticed the issue is calling this endpoint:

https://argo/api/v1/workflow-events/argo-workflows?listOptions.fieldSelector=metadata.namespace=argo-workflows&fields=result.object.metadata.name,result.object.metadata.namespace,result.object.metadata.resourceVersion,result.object.metadata.creationTimestamp,result.object.metadata.uid,result.object.status.finishedAt,result.object.status.phase,result.object.status.message,result.object.status.startedAt,result.object.status.estimatedDuration,result.object.status.progress,result.type,result.object.metadata.labels,result.object.spec.suspend

As soon as this error appears in the browser console the UI loads all the workflows.

image

ese avatar Oct 03 '22 21:10 ese

@terrytangyuan No, they do not appear when the list view is refreshed. On the contrary: When they appear and you then refresh they are gone again. Until the workflow transitions to a new node and pops up again.

@sarabala1979 We actually made a version jump from 3.1 where it used to work to 3.4 where it doesn't. I can try setting up a 3.3 later today and see whether the problem exists or not.

@ese On my installation I do not actually see any errors in the UI / dev console. As I mentioned above once a workflow event comes in the workflow is shown, but not all workflows. Are you sure all missing workflows are being shown once you hit that error?

scravy avatar Oct 04 '22 05:10 scravy

Seeing the same issue. After I enqueue some workflows (and while they stay in the "Pending" state), they are not initially displayed when I open the list view. After some time they appear (as "Pending"). If I refresh the page, they are gone again. kubectl get workflow and argo list correctly show those disappearing workflows as Pending.

We upgraded from 3.3.8 to 3.4.1 recently.

tonyo avatar Oct 05 '22 14:10 tonyo

I setup a 3.3 on our dev cluster and things work as expected. Both 3.4.0 and 3.4.1 show the regression as described here.

scravy avatar Oct 12 '22 09:10 scravy

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.

stale[bot] avatar Oct 29 '22 05:10 stale[bot]

Bumping this issue. Also: https://drewdevault.com/2021/10/26/stalebot.html I do agree. This is an open source project. I am investing time in reporting and helping analyze this issue. And then I am told "you didn't fix this yourself, bugger off, we mark this as stale and will close/dismiss it eventually".

scravy avatar Nov 02 '22 10:11 scravy