argo-workflows

Large Workflow hydration failed: upper: no more rows in this result set

Open liujiong1982 opened this issue 2 years ago • 3 comments

Version

Argo Version: v2.12.0

Problem

I am running a large workflow with more than 130 nodes in Argo Workflows. At some point, the workflow controller reports "hydration failed: upper: no more rows in this result set" and changes the workflow phase from Running to Error.

Log

time="2022-06-28T09:46:54.965Z" level=info msg="Alloc=68215 TotalAlloc=68006896 Sys=290846 NumGC=2238 Goroutines=344"
time="2022-06-28T09:46:58.511Z" level=info msg="Performing periodic workflow GC"
time="2022-06-28T09:46:58.514Z" level=info msg="Zero old offloads, nothing to do"
time="2022-06-28T09:47:47.823Z" level=error msg="hydration failed: upper: no more rows in this result set" namespace=ns-test transientErr=false workflow=largeworkflow
time="2022-06-28T09:47:47.823Z" level=info msg="Updated phase Running -> Error" namespace=ns-test workflow=largeworkflow
time="2022-06-28T09:47:47.823Z" level=info msg="Updated message -> upper: no more rows in this result set" namespace=ns-test workflow=largeworkflow
time="2022-06-28T09:47:47.823Z" level=info msg="Marking workflow completed" namespace=ns-test workflow=largeworkflow
time="2022-06-28T09:47:48.038Z" level=info msg="Workflow update successful" namespace=ns-test phase=Error resourceVersion=382738852 workflow=largeworkflow

Workflow Status

The workflow status is below:

"status": {
  "conditions": [
    {
      "status": "True",
      "type": "Completed"
    }
  ],
  "finishedAt": "2022-06-28T09:47:47Z",
  "message": "upper: no more rows in this result set",
  "offloadNodeStatusVersion": "fnv:2642547617",
  "phase": "Error",
  "progress": "0/0",
  "startedAt": "2022-06-28T09:12:31Z"
}
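With node status offloading, the `offloadNodeStatusVersion` field above is the workflow object's pointer to its node map, which lives in the persistence database and is looked up by this version string. The following is a minimal sketch of how such a content-derived version string could be computed. It assumes a 32-bit FNV-1 hash (the variant Go's `hash/fnv.New32` implements) over a JSON encoding of the node map, prefixed with `fnv:`. The exact hash variant and serialization the controller uses are assumptions here, and `node_status_version` is a hypothetical helper, so this will not necessarily reproduce `fnv:2642547617`.

```python
import json

FNV32_OFFSET_BASIS = 0x811C9DC5
FNV32_PRIME = 0x01000193


def fnv1_32(data: bytes) -> int:
    """32-bit FNV-1: for each byte, multiply by the prime, then XOR the byte in."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h = (h * FNV32_PRIME) & 0xFFFFFFFF
        h ^= byte
    return h


def node_status_version(nodes: dict) -> str:
    """Hypothetical helper: derive an 'fnv:<n>' version string from a node map.

    sort_keys keeps the encoding deterministic, so identical node maps
    always yield the same version (an assumption of this sketch).
    """
    encoded = json.dumps(nodes, sort_keys=True, separators=(",", ":")).encode()
    return f"fnv:{fnv1_32(encoded)}"


nodes = {"largeworkflow": {"phase": "Running"}}
print(node_status_version(nodes))
```

In this scheme, any change to the node map produces a new version, so each status update writes a new offloaded row and older rows become candidates for garbage collection.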

liujiong1982, Jun 28 '22

Could this be caused by the GC deleting the offloaded workflow from the database?

liujiong1982, Jun 28 '22

We no longer support v2.12. Can you upgrade to v3.2 and try again? You're right, the offloaded workflow record is deleted.

sarabala1979, Jun 28 '22
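The failure mode confirmed in this exchange can be sketched as follows. The live workflow keeps only a version pointer, the controller hydrates it by looking up the offloaded row for that `(uid, version)` pair, and if garbage collection has already removed the row, the lookup finds nothing. The sketch uses an in-memory SQLite table. The table name, columns, TTL rule, and helper names (`offload`, `hydrate`, `gc_offloads`, `reconcile`) are hypothetical and only illustrate the mechanism. `NoMoreRows` stands in for the `upper: no more rows in this result set` error in the logs. As in the logs (`transientErr=false`), it is treated as non-transient, so the workflow moves from Running to Error instead of being retried.

```python
import json
import sqlite3


class NoMoreRows(Exception):
    """Stand-in for the driver's 'no more rows in this result set' error."""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per (workflow uid, node status version).
    conn.execute(
        "CREATE TABLE offloads (uid TEXT, version TEXT, nodes TEXT,"
        " updated_at REAL, PRIMARY KEY (uid, version))"
    )
    return conn


def offload(conn, uid, version, nodes, now):
    """Store a node map under its (uid, version) key."""
    conn.execute(
        "INSERT INTO offloads VALUES (?, ?, ?, ?)",
        (uid, version, json.dumps(nodes), now),
    )


def hydrate(conn, uid, version) -> dict:
    """Fetch the offloaded node map that the workflow status points at."""
    row = conn.execute(
        "SELECT nodes FROM offloads WHERE uid = ? AND version = ?", (uid, version)
    ).fetchone()
    if row is None:
        raise NoMoreRows("no more rows in this result set")
    return json.loads(row[0])


def gc_offloads(conn, now, ttl, live_versions) -> int:
    """Delete rows older than ttl unless listed in live_versions.

    live_versions is the GC's view of which (uid, version) pairs are still
    referenced. If that view is stale or incomplete, a row a running
    workflow still needs is deleted: the race suspected in this issue.
    """
    old = conn.execute(
        "SELECT uid, version FROM offloads WHERE updated_at < ?", (now - ttl,)
    ).fetchall()
    deleted = 0
    for uid, version in old:
        if (uid, version) not in live_versions:
            conn.execute(
                "DELETE FROM offloads WHERE uid = ? AND version = ?", (uid, version)
            )
            deleted += 1
    return deleted


def reconcile(conn, wf: dict) -> dict:
    """One controller pass: hydrate, or fail the workflow on a non-transient error."""
    try:
        wf["nodes"] = hydrate(conn, wf["uid"], wf["offloadNodeStatusVersion"])
    except NoMoreRows as err:
        # Not retried: the phase goes straight to Error, as in the logs.
        wf["phase"] = "Error"
        wf["message"] = f"upper: {err}"
    return wf


# Usage: the GC's view of live workflows is missing the current version.
conn = open_db()
wf = {"uid": "u1", "offloadNodeStatusVersion": "fnv:1", "phase": "Running"}
offload(conn, "u1", "fnv:1", {"n1": {"phase": "Running"}}, now=0.0)
gc_offloads(conn, now=600.0, ttl=300.0, live_versions=set())
print(reconcile(conn, wf)["phase"])
```

The key point the sketch illustrates is that the workflow object and its offloaded row can drift apart: nothing in the hydration path can recover a row that has already been deleted, so correctness depends entirely on the GC never removing a version that a live workflow still references.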

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.

stale[bot], Jul 31 '22

This issue has been closed due to inactivity. Feel free to re-open if you still encounter this issue.

stale[bot], Aug 12 '22

@sarabala1979 we're seeing the same error as reported here, running v3.3.10.

We have a fairly large workflow (~5,000 nodes), so I set up persistence and set nodeStatusOffLoad: true, as the docs recommend.

(Most recent logs first)

Date,Service,Message
"2023-02-05T03:54:40.581Z","workflow-controller","Marking workflow as pending archiving"
"2023-02-05T03:54:40.581Z","workflow-controller","Marking workflow completed"
"2023-02-05T03:54:40.581Z","workflow-controller","Updated message -> upper: no more rows in this result set"
"2023-02-05T03:54:40.581Z","workflow-controller","Updated phase Running -> Error"
"2023-02-05T03:54:40.581Z","workflow-controller","hydration failed: upper: no more rows in this result set"
"2023-02-05T03:54:40.581Z","workflow-controller","Non-transient error: upper: no more rows in this result set"
"2023-02-05T03:54:40.581Z","workflow-controller","Watch workflows 200"
"2023-02-05T03:54:40.581Z","workflow-controller","Watch cronworkflows 200"
"2023-02-05T03:54:40.581Z","workflow-controller","List workflows 200"
"2023-02-05T03:54:40.581Z","workflow-controller","List cronworkflows 200"
"2023-02-05T03:54:40.581Z","workflow-controller","Starting prometheus metrics server at localhost:9090/metrics"
"2023-02-05T03:54:40.581Z","workflow-controller","W0205 03:54:40.453436 1 shared_informer.go:401] The sharedIndexInformer has started, run more than once is not allowed"
"2023-02-05T03:54:40.581Z","workflow-controller","Started workflow garbage collection"
"2023-02-05T03:54:40.581Z","workflow-controller","Starting CronWorkflow controller"
"2023-02-05T03:54:40.581Z","workflow-controller","Starting workflow garbage collector controller (retentionWorkers 4)"
"2023-02-05T03:54:40.581Z","workflow-controller","Performing archived workflow GC"
"2023-02-05T03:54:40.581Z","workflow-controller","Performing periodic GC"
"2023-02-05T03:54:40.581Z","workflow-controller","Watch workflows 200"
"2023-02-05T03:54:40.581Z","workflow-controller","List workflows 200"
"2023-02-05T03:54:40.581Z","workflow-controller","Watch workflowtaskresults 200"
"2023-02-05T03:54:40.581Z","workflow-controller","Watch pods 200"
"2023-02-05T03:54:40.581Z","workflow-controller","List workflowtaskresults 200"
"2023-02-05T03:54:40.581Z","workflow-controller","Watch clusterworkflowtemplates 200"
"2023-02-05T03:54:40.581Z","workflow-controller","List clusterworkflowtemplates 200"
"2023-02-05T03:54:40.581Z","workflow-controller","Create selfsubjectaccessreviews 201"
"2023-02-05T03:54:40.581Z","workflow-controller","Watch configmaps 200"
"2023-02-05T03:54:40.581Z","workflow-controller","Watch workflowtasksets 200"
"2023-02-05T03:54:40.581Z","workflow-controller","Create selfsubjectaccessreviews 201"
"2023-02-05T03:54:40.581Z","workflow-controller","Watch workflowtemplates 200"
"2023-02-05T03:54:40.581Z","workflow-controller","Watch configmaps 200"
"2023-02-05T03:54:40.581Z","workflow-controller","List pods 200"
"2023-02-05T03:54:40.581Z","workflow-controller","List configmaps 200"
"2023-02-05T03:54:40.581Z","workflow-controller","List workflowtemplates 200"
"2023-02-05T03:54:40.581Z","workflow-controller","List configmaps 200"
"2023-02-05T03:54:40.581Z","workflow-controller","List workflowtasksets 200"
"2023-02-05T03:54:40.581Z","workflow-controller","Create selfsubjectaccessreviews 201"
"2023-02-05T03:54:40.581Z","workflow-controller","Watch configmaps 200"
"2023-02-05T03:54:40.581Z","workflow-controller","Manager initialized successfully"

These logs confirm that the workflow-controller is configured to use persistence and node status offloading:

Date,Service,Message
"2023-02-05T03:54:30.574Z","workflow-controller","applying database change"
"2023-02-05T03:54:30.574Z","workflow-controller","Migrating database schema"
"2023-02-05T03:54:30.574Z","workflow-controller","Persistence Session created successfully"
"2023-02-05T03:54:30.574Z","workflow-controller","Get secrets 200"
"2023-02-05T03:54:30.574Z","workflow-controller","Get secrets 200"
"2023-02-05T03:54:30.574Z","workflow-controller","Creating DB session"
"2023-02-05T03:54:30.574Z","workflow-controller","Persistence configuration enabled"
# …
"2023-02-05T03:54:30.574Z","workflow-controller","Workflow archiving is enabled"
"2023-02-05T03:54:30.574Z","workflow-controller","Node status offloading is enabled"
"2023-02-05T03:54:30.574Z","workflow-controller","Node status offloading config"

goodgravy, Feb 08 '23