conductor: One hour after running the workflow, the Conductor CPU is almost exhausted and performance drops sharply.

Describe the bug A clear and concise description of what the bug is.

Details
Conductor version: 3.11.0
Persistence implementation: Redis
Queue implementation: Redis
Lock: Redis
Workflow definition:
{
  "createTime": 1663828820563,
  "updateTime": 1663936724464,
  "name": "rcc_push_car_cruising_range_mileage_workflow",
  "description": "xxx",
  "version": 3,
  "tasks": [
    {
      "name": "preCheckInstruction",
      "taskReferenceName": "preCheck",
      "inputParameters": {
        "carId": "${workflow.input.carId}",
        "vinId": "${workflow.input.vinId}",
        "instructionCode": "${workflow.input.instructionCode}",
        "instructionValue": "${workflow.input.instructionValue}",
        "instructionInvoker": "${workflow.input.instructionInvoker}",
        "sync": "${workflow.input.sync}",
        "timeout": "${workflow.input.timeout}",
        "timestamp": "${workflow.input.timestamp}",
        "fashionId": "${workflow.input.fashionId}",
        "operator": "${workflow.input.operator}",
        "orderId": "${workflow.input.orderId}",
        "requestId": "${workflow.input.requestId}"
      },
      "type": "SIMPLE",
      "startDelay": 0,
      "optional": false,
      "asyncComplete": false
    },
    {
      "name": "decide_task",
      "taskReferenceName": "preCheck_decide",
      "inputParameters": {
        "switchCaseValue": "${preCheck.output.canExecuteFlag}"
      },
      "type": "SWITCH",
      "decisionCases": {
        "can_execute": [
          {
            "name": "rcc_getDeviceNetworkStatus",
            "taskReferenceName": "getDeviceNetworkStatus",
            "inputParameters": {
              "requestId": "${preCheck.output.requestId}",
              "carId": "${preCheck.output.carId}"
            },
            "type": "SIMPLE",
            "startDelay": 0,
            "optional": false,
            "asyncComplete": false
          },
          {
            "name": "network_result_decision",
            "taskReferenceName": "network_result_decision",
            "inputParameters": {
              "case_value_param": "${getDeviceNetworkStatus.output.networkStatus}"
            },
            "type": "SWITCH",
            "decisionCases": {
              "online": [
                {
                  "name": "rcc_sendActionInstruction",
                  "taskReferenceName": "sendActionInstruction",
                  "inputParameters": {
                    "carId": "${preCheck.output.carId}",
                    "vinId": "${preCheck.output.vinId}",
                    "instructionCode": "${workflow.input.instructionCode}",
                    "instructionValue": "${workflow.input.instructionValue}",
                    "requestId": "${preCheck.output.requestId}",
                    "requestType": "${workflow.input.requestType}"
                  },
                  "type": "SIMPLE",
                  "startDelay": 0,
                  "optional": false,
                  "asyncComplete": false
                }
              ]
            },
            "defaultCase": [
              {
                "name": "vehicle_network_not_expected",
                "taskReferenceName": "vehicle_network_not_expected",
                "inputParameters": {
                  "terminationStatus": "FAILED",
                  "terminationReason": "vehicle network not expected",
                  "workflowOutput": {
                    "code": 100602,
                    "message": "vehicle network not expected"
                  }
                },
                "type": "TERMINATE",
                "startDelay": 0,
                "optional": false,
                "asyncComplete": false
              }
            ],
            "startDelay": 0,
            "optional": false,
            "asyncComplete": false,
            "evaluatorType": "value-param",
            "expression": "case_value_param"
          }
        ]
      },
      "defaultCase": [
        {
          "name": "dont_execute",
          "taskReferenceName": "dont_execute",
          "inputParameters": {
            "terminationStatus": "FAILED",
            "terminationReason": "dont execute",
            "workflowOutput": {
              "code": 100603,
              "message": "dont execute"
            }
          },
          "type": "TERMINATE",
          "startDelay": 0,
          "optional": false,
          "asyncComplete": false
        }
      ],
      "startDelay": 0,
      "optional": false,
      "asyncComplete": false,
      "evaluatorType": "value-param",
      "expression": "switchCaseValue"
    }
  ],
  "inputParameters": [],
  "outputParameters": {
    "body": "${getResultByRequestId.output.response.body}"
  },
  "schemaVersion": 2,
  "restartable": true,
  "workflowStatusListenerEnabled": false,
  "ownerEmail": "[email protected]",
  "timeoutPolicy": "ALERT_ONLY",
  "timeoutSeconds": 0,
  "variables": {},
  "inputTemplate": {}
}
Task definition:
Event handler definition:
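For readers following the branching in the definition above, here is a minimal sketch of how a SWITCH task with "evaluatorType": "value-param" picks its branch: "expression" names an input parameter, that parameter's resolved value is used as the key into "decisionCases", and "defaultCase" runs when no key matches. The helper names `select_switch_branch` and `branch_names` are hypothetical and this is not Conductor's actual implementation; the dictionary is a trimmed copy of the `network_result_decision` task.

```python
from typing import Any


def select_switch_branch(task: dict[str, Any],
                         resolved_inputs: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the task list a value-param SWITCH would schedule (illustrative only)."""
    if task.get("evaluatorType") != "value-param":
        raise ValueError("this sketch only handles the value-param evaluator")
    # With value-param, "expression" is the NAME of an input parameter;
    # the parameter's resolved value is the case key.
    case_key = resolved_inputs.get(task["expression"])
    # Fall back to defaultCase when the key matches no decision case.
    return task["decisionCases"].get(str(case_key), task.get("defaultCase", []))


# Trimmed copy of the network_result_decision task from the definition above.
network_switch = {
    "evaluatorType": "value-param",
    "expression": "case_value_param",
    "decisionCases": {"online": [{"taskReferenceName": "sendActionInstruction"}]},
    "defaultCase": [{"taskReferenceName": "vehicle_network_not_expected"}],
}


def branch_names(resolved_inputs: dict[str, Any]) -> list[str]:
    """Reference names of the tasks chosen for the given resolved inputs."""
    return [t["taskReferenceName"]
            for t in select_switch_branch(network_switch, resolved_inputs)]
```

With this sketch, a network status of "online" selects `sendActionInstruction`, and any other value falls through to the TERMINATE task `vehicle_network_not_expected`.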

To Reproduce
Steps to reproduce the behavior:
1. Deployment architecture: Conductor is not clustered; a single instance is deployed in a Docker container with 4 cores and 8 GB of memory.
2. Start a workflow every 30 seconds. For the first 78 minutes, each run takes 300~500 ms and performance is stable.
3. After 78 minutes of running workflows, the Conductor CPU is almost exhausted and performance drops sharply.
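The reproduction steps above could be scripted roughly as follows. This is a sketch under assumptions, not the reporter's actual harness: the base URL `http://localhost:8080/api` and the function names are hypothetical, and the start call assumes Conductor's `POST /api/workflow/{name}` endpoint with the workflow input as the JSON body. The issue does not say exactly what the 300~500 ms figure measures; this sketch times only the start request.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:8080/api"  # assumption: local single-instance server
WORKFLOW_NAME = "rcc_push_car_cruising_range_mileage_workflow"


def start_workflow(workflow_input: dict) -> str:
    """Start one workflow over HTTP and return the response body (the workflow id)."""
    request = urllib.request.Request(
        f"{BASE_URL}/workflow/{WORKFLOW_NAME}",
        data=json.dumps(workflow_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


def measure_starts(starter: Callable[[dict], str],
                   workflow_input: dict,
                   count: int,
                   interval_s: float = 30.0,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> list[float]:
    """Call starter `count` times, pausing interval_s between calls; return latencies in ms."""
    latencies_ms = []
    for i in range(count):
        begin = clock()
        starter(workflow_input)
        latencies_ms.append((clock() - begin) * 1000.0)
        if i < count - 1:
            sleep(interval_s)
    return latencies_ms


def out_of_range(latencies_ms: list[float], upper_ms: float = 500.0) -> list[int]:
    """Indexes of calls slower than the 300~500 ms band reported as normal."""
    return [i for i, ms in enumerate(latencies_ms) if ms > upper_ms]


# Example (needs a running server): 180 starts at 30 s intervals covers 90 minutes.
# slow = out_of_range(measure_starts(start_workflow, {"carId": "demo"}, 180))
```

Passing the starter, clock, and sleep functions as parameters keeps the timing loop testable without a live server. Logging container CPU usage alongside these latencies would show whether the slowdown and the CPU spike begin at the same time.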

Expected behavior
Performance remains stable, with each run continuing to take 300~500 ms.

Screenshots costlong

Additional context Add any other context about the problem here.

Sep 25 '22 16:09 lianjunwei

Hello @lianjunwei, can you elaborate on this?

But after running a workflow for 78 minutes, the conductor CPU is almost exhausted and the performance drops sharply.

Is the CPU usage increasing after 78 minutes of starting a workflow every 30 seconds?

Sep 27 '22 18:09 aravindanr

@aravindanr Yes. After deploying in cluster mode (3 nodes), this problem no longer occurs. CPU utilization is now high only when the QPS is high (QPS = 50).

Sep 30 '22 07:09 lianjunwei

This issue is stale, because it has been open for 45 days with no activity. Remove the stale label or comment, or this will be closed in 7 days.

Nov 25 '22 00:11 github-actions[bot]

This issue was closed, because it has been stalled for 7 days with no activity.

Dec 03 '22 00:12 github-actions[bot]
