argo-workflows
Large Workflow hydration failed: upper: no more rows in this result set
Version
Argo Version: v2.12.0
Problem
I am running a large workflow with more than 130 nodes in Argo Workflows. At some point, the workflow controller reports "hydration failed: upper: no more rows in this result set" and moves the workflow from Running to Error.
Log
time="2022-06-28T09:46:54.965Z" level=info msg="Alloc=68215 TotalAlloc=68006896 Sys=290846 NumGC=2238 Goroutines=344"
time="2022-06-28T09:46:58.511Z" level=info msg="Performing periodic workflow GC"
time="2022-06-28T09:46:58.514Z" level=info msg="Zero old offloads, nothing to do"
time="2022-06-28T09:47:47.823Z" level=error msg="hydration failed: upper: no more rows in this result set" namespace=ns-test transientErr=false workflow=largeworkflow
time="2022-06-28T09:47:47.823Z" level=info msg="Updated phase Running -> Error" namespace=ns-test workflow=largeworkflow
time="2022-06-28T09:47:47.823Z" level=info msg="Updated message -> upper: no more rows in this result set" namespace=ns-test workflow=largeworkflow
time="2022-06-28T09:47:47.823Z" level=info msg="Marking workflow completed" namespace=ns-test workflow=largeworkflow
time="2022-06-28T09:47:48.038Z" level=info msg="Workflow update successful" namespace=ns-test phase=Error resourceVersion=382738852 workflow=largeworkflow
Workflow Status
The workflow status is below:
"status": { "conditions": [ { "status": "True", "type": "Completed" } ], "finishedAt": "2022-06-28T09:47:47Z", "message": "upper: no more rows in this result set", "offloadNodeStatusVersion": "fnv:2642547617", "phase": "Error", "progress": "0/0", "startedAt": "2022-06-28T09:12:31Z" }
Could it be caused by GC deleting the offloaded workflow record in the database?
We don't support v2.12. Can you upgrade to v3.2 and try it?
You're right, the offloaded workflow record is deleted.
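The failure mode suspected in this thread can be reproduced in a toy model: if the row holding the offloaded node status is deleted while the workflow still points at that version, the lookup returns nothing and hydration fails. The sketch below uses an in-memory SQLite database; the table name `offloaded_nodes`, its columns, the `hydrate` function, and the `wf-1` UID are all illustrative assumptions, not Argo's actual schema or code.

```python
import sqlite3


class NoMoreRows(Exception):
    """Stand-in for the 'no more rows in this result set' error."""


def hydrate(conn: sqlite3.Connection, uid: str, version: str) -> str:
    """Fetch the offloaded node status for (uid, version), or raise if it is missing."""
    row = conn.execute(
        "SELECT nodes FROM offloaded_nodes WHERE uid = ? AND version = ?",
        (uid, version),
    ).fetchone()
    if row is None:
        raise NoMoreRows("no more rows in this result set")
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE offloaded_nodes (uid TEXT, version TEXT, nodes TEXT, "
    "PRIMARY KEY (uid, version))"
)
conn.execute(
    "INSERT INTO offloaded_nodes VALUES (?, ?, ?)",
    ("wf-1", "fnv:2642547617", "{}"),
)

# While the record exists, hydration succeeds.
print(hydrate(conn, "wf-1", "fnv:2642547617"))

# Simulate a GC pass removing a record that the live workflow still references.
conn.execute("DELETE FROM offloaded_nodes WHERE uid = ?", ("wf-1",))

try:
    hydrate(conn, "wf-1", "fnv:2642547617")
except NoMoreRows as err:
    print(f"hydration failed: {err}")
```

This only illustrates why a missing record surfaces as a non-transient error; it does not show which component deleted the record in the reported cases.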
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.
This issue has been closed due to inactivity. Feel free to re-open if you still encounter this issue.
@sarabala1979 we're seeing the same error as reported here, running v3.3.10.
We have a fairly large workflow (~5,000 nodes), so I set up persistence and set nodeStatusOffLoad: true per the docs' recommendation.
(Most recent logs first)
Date,Service,Message
"2023-02-05T03:54:40.581Z","workflow-controller","Marking workflow as pending archiving"
"2023-02-05T03:54:40.581Z","workflow-controller","Marking workflow completed"
"2023-02-05T03:54:40.581Z","workflow-controller","Updated message -> upper: no more rows in this result set"
"2023-02-05T03:54:40.581Z","workflow-controller","Updated phase Running -> Error"
"2023-02-05T03:54:40.581Z","workflow-controller","hydration failed: upper: no more rows in this result set"
"2023-02-05T03:54:40.581Z","workflow-controller","Non-transient error: upper: no more rows in this result set"
"2023-02-05T03:54:40.581Z","workflow-controller","Watch workflows 200"
"2023-02-05T03:54:40.581Z","workflow-controller","Watch cronworkflows 200"
"2023-02-05T03:54:40.581Z","workflow-controller","List workflows 200"
"2023-02-05T03:54:40.581Z","workflow-controller","List cronworkflows 200"
"2023-02-05T03:54:40.581Z","workflow-controller","Starting prometheus metrics server at localhost:9090/metrics"
"2023-02-05T03:54:40.581Z","workflow-controller","W0205 03:54:40.453436 1 shared_informer.go:401] The sharedIndexInformer has started, run more than once is not allowed"
"2023-02-05T03:54:40.581Z","workflow-controller","Started workflow garbage collection"
"2023-02-05T03:54:40.581Z","workflow-controller","Starting CronWorkflow controller"
"2023-02-05T03:54:40.581Z","workflow-controller","Starting workflow garbage collector controller (retentionWorkers 4)"
"2023-02-05T03:54:40.581Z","workflow-controller","Performing archived workflow GC"
"2023-02-05T03:54:40.581Z","workflow-controller","Performing periodic GC"
"2023-02-05T03:54:40.581Z","workflow-controller","Watch workflows 200"
"2023-02-05T03:54:40.581Z","workflow-controller","List workflows 200"
"2023-02-05T03:54:40.581Z","workflow-controller","Watch workflowtaskresults 200"
"2023-02-05T03:54:40.581Z","workflow-controller","Watch pods 200"
"2023-02-05T03:54:40.581Z","workflow-controller","List workflowtaskresults 200"
"2023-02-05T03:54:40.581Z","workflow-controller","Watch clusterworkflowtemplates 200"
"2023-02-05T03:54:40.581Z","workflow-controller","List clusterworkflowtemplates 200"
"2023-02-05T03:54:40.581Z","workflow-controller","Create selfsubjectaccessreviews 201"
"2023-02-05T03:54:40.581Z","workflow-controller","Watch configmaps 200"
"2023-02-05T03:54:40.581Z","workflow-controller","Watch workflowtasksets 200"
"2023-02-05T03:54:40.581Z","workflow-controller","Create selfsubjectaccessreviews 201"
"2023-02-05T03:54:40.581Z","workflow-controller","Watch workflowtemplates 200"
"2023-02-05T03:54:40.581Z","workflow-controller","Watch configmaps 200"
"2023-02-05T03:54:40.581Z","workflow-controller","List pods 200"
"2023-02-05T03:54:40.581Z","workflow-controller","List configmaps 200"
"2023-02-05T03:54:40.581Z","workflow-controller","List workflowtemplates 200"
"2023-02-05T03:54:40.581Z","workflow-controller","List configmaps 200"
"2023-02-05T03:54:40.581Z","workflow-controller","List workflowtasksets 200"
"2023-02-05T03:54:40.581Z","workflow-controller","Create selfsubjectaccessreviews 201"
"2023-02-05T03:54:40.581Z","workflow-controller","Watch configmaps 200"
"2023-02-05T03:54:40.581Z","workflow-controller","Manager initialized successfully"
Confirming that the workflow-controller is trying to use persistence, and offload node status:
Date,Service,Message
"2023-02-05T03:54:30.574Z","workflow-controller","applying database change"
"2023-02-05T03:54:30.574Z","workflow-controller","Migrating database schema"
"2023-02-05T03:54:30.574Z","workflow-controller","Persistence Session created successfully"
"2023-02-05T03:54:30.574Z","workflow-controller","Get secrets 200"
"2023-02-05T03:54:30.574Z","workflow-controller","Get secrets 200"
"2023-02-05T03:54:30.574Z","workflow-controller","Creating DB session"
"2023-02-05T03:54:30.574Z","workflow-controller","Persistence configuration enabled"
# …
"2023-02-05T03:54:30.574Z","workflow-controller","Workflow archiving is enabled"
"2023-02-05T03:54:30.574Z","workflow-controller","Node status offloading is enabled"
"2023-02-05T03:54:30.574Z","workflow-controller","Node status offloading config"