argo-workflows

Offloading large workflows raises db syntax errors that prevent the offloading

Yarin-Shitrit opened this issue • 4 comments

Pre-requisites

  • [X] I have double-checked my configuration
  • [ ] I can confirm the issue exists when I tested with :latest
  • [X] I have searched existing issues and could not find a match for this bug
  • [x] I'd like to contribute the fix myself (see contributing guide)

What happened/what did you expect to happen?

I enabled nodeStatusOffLoad for the workflow-controller and configured PostgreSQL under the persistence section of the workflow-controller-configmap. The database connection works, since the controller logs show the migrations running. But when the controller reaches the offloading step, it fails with this error:

workflow is longer than maximum allowed size. compressed size 1053022 > maxSize 1048576. Tried to offload but encountered an error: pq: syntax error at or near '(' 
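The numbers in this message can be sanity-checked: maxSize is exactly 1 MiB (1024 * 1024 = 1048576 bytes), so the compressed workflow exceeds the limit by only 4446 bytes. The shell sketch below gives a rough local estimate for a JSON file of node status. The helper names are hypothetical, and it assumes gzip compression followed by base64 encoding, so treat it as an approximation rather than the controller's exact size calculation.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Argo): roughly estimate the gzip+base64
# size of a JSON file and compare it with the 1 MiB limit from the log.
set -euo pipefail

MAX_SIZE=$((1024 * 1024)) # 1048576 bytes, the maxSize in the error message

# Print the size in bytes of the file after gzip compression and base64
# encoding (line breaks inserted by base64 are stripped before counting).
compressed_size() {
  gzip -nc < "$1" | base64 | tr -d '\n' | wc -c | tr -d ' '
}

# Print the estimate; return 1 if it exceeds MAX_SIZE.
check_size() {
  local size
  size="$(compressed_size "$1")"
  if ((size > MAX_SIZE)); then
    echo "over limit: $size > $MAX_SIZE (by $((size - MAX_SIZE)) bytes)"
    return 1
  fi
  echo "within limit: $size <= $MAX_SIZE"
}

# The values from the controller log: 1053022 - 1048576 = 4446 bytes over.
echo "reported overshoot: $((1053022 - MAX_SIZE)) bytes"
```

For example, check_size nodes.json on a hypothetical dump of a workflow's node status prints the estimate and fails when it is above the limit.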

I also tried MySQL instead of PostgreSQL but got a different syntax error:

workflow is longer than maximum allowed size. compressed size 1053022 > maxSize 1048576. Tried to offload but encountered an error: ERROR 1064: You have an error in your SQL syntax; check the manual that corresponds to your MYSQL server version for the right syntax to use near '('clustername', 'namespace', 'uid', 'nodes', 'version') VALUES (?' at line 2.
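For anyone reproducing this, here is a sketch of the kind of persistence settings described above, modeled on the example workflow-controller-configmap in the Argo docs. The host, database, table and Secret names are placeholders, not the reporter's actual values; for MySQL the example config uses an analogous mysql block in place of postgresql.

```shell
#!/usr/bin/env bash
# Sketch: write a workflow-controller-configmap that enables node status
# offloading to PostgreSQL. Host, database, table and Secret names are
# placeholder assumptions, not values taken from this issue.
set -euo pipefail

cat > workflow-controller-configmap.yaml <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
  namespace: argo
data:
  persistence: |
    # store node status in the database instead of the Workflow object
    nodeStatusOffLoad: true
    clusterName: default
    postgresql:
      host: postgres.argo.svc    # placeholder host
      port: 5432
      database: postgres
      tableName: argo_workflows  # table name as in the upstream example
      userNameSecret:
        name: argo-postgres-config
        key: username
      passwordSecret:
        name: argo-postgres-config
        key: password
EOF

# Apply with: kubectl apply -f workflow-controller-configmap.yaml
```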

I am trying to get my Argo Workflows installation to run very large workflows. To test this, I generated a sample DAG whose demo steps simply print a message, and tried to run 700 of them with no dependencies between the tasks.

Version

v3.4.6

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

I simply ran a workflow that prints some messages, nothing complex, and loaded my DAG with 700 steps like this.

Logs from the workflow controller

kubectl logs -n argo deploy/workflow-controller | grep ${workflow}

Logs from in your workflow's wait container

kubectl logs -n argo -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded
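When gathering the controller logs requested above, a small filter can narrow them to the size-limit and offload failures for one workflow. This is a hypothetical helper, not part of the Argo CLI; it simply chains two greps over whatever log text it receives on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: from controller logs on stdin, keep only the lines that
# mention the given workflow name and the size-limit / offload failure.
set -euo pipefail

offload_errors() {
  local workflow="$1"
  # "|| true" so that finding no matching lines is not treated as an error.
  grep -F -- "$workflow" | grep -E 'maximum allowed size|[Oo]ffload' || true
}
```

For example, kubectl logs -n argo deploy/workflow-controller | offload_errors large-dag-xxxxx, where large-dag-xxxxx stands for the generated workflow name.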

Yarin-Shitrit, Apr 16 '24

> • [x] I can confirm the issue exists when I tested with :latest
>
> v3.4.6

You're on a pretty old version, so I'd suggest trying :latest, or at least v3.4.16. Please fill out the issue template accurately; it asks for :latest usage very intentionally.

agilgur5, Apr 16 '24

Looking at the diff v3.4.6..v3.4.16, #10887 in particular seems related (although the error is different).

agilgur5, Apr 19 '24

I'll be able to test :latest by Sunday. What other information can I provide?

Yarin-Shitrit, Apr 20 '24

This issue has been automatically marked as stale because it has not had recent activity and needs more information. It will be closed if no further activity occurs.

github-actions[bot], May 05 '24

This issue has been closed due to inactivity and lack of information. If you still encounter this issue, please add the requested information and re-open.

github-actions[bot], May 19 '24