One hour after starting to run the workflow, Conductor's CPU is almost exhausted and performance drops sharply.

lianjunwei opened this issue 2 years ago • 2 comments

Details
Conductor version: 3.11.0
Persistence implementation: Redis
Queue implementation: Redis
Lock: Redis
Workflow definition:
{
  "createTime": 1663828820563,
  "updateTime": 1663936724464,
  "name": "rcc_push_car_cruising_range_mileage_workflow",
  "description": "xxx",
  "version": 3,
  "tasks": [
    {
      "name": "preCheckInstruction",
      "taskReferenceName": "preCheck",
      "inputParameters": {
        "carId": "${workflow.input.carId}",
        "vinId": "${workflow.input.vinId}",
        "instructionCode": "${workflow.input.instructionCode}",
        "instructionValue": "${workflow.input.instructionValue}",
        "instructionInvoker": "${workflow.input.instructionInvoker}",
        "sync": "${workflow.input.sync}",
        "timeout": "${workflow.input.timeout}",
        "timestamp": "${workflow.input.timestamp}",
        "fashionId": "${workflow.input.fashionId}",
        "operator": "${workflow.input.operator}",
        "orderId": "${workflow.input.orderId}",
        "requestId": "${workflow.input.requestId}"
      },
      "type": "SIMPLE",
      "startDelay": 0,
      "optional": false,
      "asyncComplete": false
    },
    {
      "name": "decide_task",
      "taskReferenceName": "preCheck_decide",
      "inputParameters": {
        "switchCaseValue": "${preCheck.output.canExecuteFlag}"
      },
      "type": "SWITCH",
      "decisionCases": {
        "can_execute": [
          {
            "name": "rcc_getDeviceNetworkStatus",
            "taskReferenceName": "getDeviceNetworkStatus",
            "inputParameters": {
              "requestId": "${preCheck.output.requestId}",
              "carId": "${preCheck.output.carId}"
            },
            "type": "SIMPLE",
            "startDelay": 0,
            "optional": false,
            "asyncComplete": false
          },
          {
            "name": "network_result_decision",
            "taskReferenceName": "network_result_decision",
            "inputParameters": {
              "case_value_param": "${getDeviceNetworkStatus.output.networkStatus}"
            },
            "type": "SWITCH",
            "decisionCases": {
              "online": [
                {
                  "name": "rcc_sendActionInstruction",
                  "taskReferenceName": "sendActionInstruction",
                  "inputParameters": {
                    "carId": "${preCheck.output.carId}",
                    "vinId": "${preCheck.output.vinId}",
                    "instructionCode": "${workflow.input.instructionCode}",
                    "instructionValue": "${workflow.input.instructionValue}",
                    "requestId": "${preCheck.output.requestId}",
                    "requestType": "${workflow.input.requestType}"
                  },
                  "type": "SIMPLE",
                  "startDelay": 0,
                  "optional": false,
                  "asyncComplete": false
                }
              ]
            },
            "defaultCase": [
              {
                "name": "vehicle_network_not_expected",
                "taskReferenceName": "vehicle_network_not_expected",
                "inputParameters": {
                  "terminationStatus": "FAILED",
                  "terminationReason": "vehicle network not expected",
                  "workflowOutput": {
                    "code": 100602,
                    "message": "vehicle network not expected"
                  }
                },
                "type": "TERMINATE",
                "startDelay": 0,
                "optional": false,
                "asyncComplete": false
              }
            ],
            "startDelay": 0,
            "optional": false,
            "asyncComplete": false,
            "evaluatorType": "value-param",
            "expression": "case_value_param"
          }
        ]
      },
      "defaultCase": [
        {
          "name": "dont_execute",
          "taskReferenceName": "dont_execute",
          "inputParameters": {
            "terminationStatus": "FAILED",
            "terminationReason": "dont execute",
            "workflowOutput": {
              "code": 100603,
              "message": "dont execute"
            }
          },
          "type": "TERMINATE",
          "startDelay": 0,
          "optional": false,
          "asyncComplete": false
        }
      ],
      "startDelay": 0,
      "optional": false,
      "asyncComplete": false,
      "evaluatorType": "value-param",
      "expression": "switchCaseValue"
    }
  ],
  "inputParameters": [],
  "outputParameters": {
    "body": "${getResultByRequestId.output.response.body}"
  },
  "schemaVersion": 2,
  "restartable": true,
  "workflowStatusListenerEnabled": false,
  "ownerEmail": "[email protected]",
  "timeoutPolicy": "ALERT_ONLY",
  "timeoutSeconds": 0,
  "variables": {},
  "inputTemplate": {}
}
Task definition:
Event handler definition:
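The routing encoded in the workflow definition above can be traced with a small Python model. This is a simplified sketch, not Conductor's implementation: it assumes a value-param SWITCH takes the value of the input parameter named in `expression`, looks it up among the `decisionCases` keys (compared as strings here), and falls back to `defaultCase` when nothing matches. The helpers `value_param_switch` and `trace` are hypothetical, and `trace` lists only the non-SWITCH tasks that would run.

```python
from typing import Any, Dict, List, Optional


def value_param_switch(task_input: Dict[str, Any], expression: str,
                       decision_cases: Dict[str, List[str]],
                       default_case: List[str]) -> List[str]:
    """Simplified value-param SWITCH: the input parameter named by
    `expression` selects a case; unmatched values use the default case."""
    value = task_input.get(expression)
    key = "null" if value is None else str(value)
    return decision_cases.get(key, default_case)


def trace(can_execute_flag: Optional[str], network_status: Optional[str]) -> List[str]:
    """Return the taskReferenceNames of the non-SWITCH tasks that would run."""
    path = ["preCheck"]
    # Outer SWITCH (preCheck_decide) on ${preCheck.output.canExecuteFlag}
    branch = value_param_switch(
        {"switchCaseValue": can_execute_flag}, "switchCaseValue",
        {"can_execute": ["getDeviceNetworkStatus", "network_result_decision"]},
        ["dont_execute"])  # TERMINATE with code 100603
    for ref in branch:
        if ref == "network_result_decision":
            # Inner SWITCH on ${getDeviceNetworkStatus.output.networkStatus}
            path += value_param_switch(
                {"case_value_param": network_status}, "case_value_param",
                {"online": ["sendActionInstruction"]},
                ["vehicle_network_not_expected"])  # TERMINATE with code 100602
        else:
            path.append(ref)
    return path
```

For example, `trace("can_execute", "online")` follows the happy path to `sendActionInstruction`, while any other network status ends in the `vehicle_network_not_expected` TERMINATE task.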

To Reproduce
Steps to reproduce the behavior:
1. Deployment architecture: Conductor is not clustered; a single instance is deployed in a Docker container with 4 CPU cores and 8 GB of memory.
2. Start a workflow every 30 seconds. For the first 78 minutes, each workflow takes 300 to 500 ms and performance is stable.
3. After 78 minutes of running workflows this way, Conductor's CPU is almost exhausted and performance drops sharply.
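The reproduction schedule in step 2 could be replayed with a small load generator like the one below. It is a sketch under several assumptions: the server listens at the hypothetical `BASE_URL` `http://localhost:8080/api`, workflows are started with `POST /api/workflow/{name}` (which returns the new workflow id as text), status is read with `GET /api/workflow/{id}`, and workers for the SIMPLE tasks are already running so workflows can finish. The names `start_workflow`, `get_status`, `time_one_run` and `run_load` are illustrative, and latency is measured from the start request until the workflow reaches a terminal state.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, List

# Assumption: a single local Conductor server exposing its REST API under /api.
BASE_URL = "http://localhost:8080/api"
TERMINAL_STATES = {"COMPLETED", "FAILED", "TERMINATED", "TIMED_OUT"}


def start_workflow(name: str, workflow_input: Dict[str, Any]) -> str:
    """POST /api/workflow/{name}; the server replies with the new workflow id."""
    req = urllib.request.Request(
        f"{BASE_URL}/workflow/{name}",
        data=json.dumps(workflow_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8").strip()


def get_status(workflow_id: str) -> str:
    """GET /api/workflow/{id} and return the workflow's status field."""
    url = f"{BASE_URL}/workflow/{workflow_id}?includeTasks=false"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["status"]


def time_one_run(start: Callable[[], str], status: Callable[[str], str],
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep,
                 poll_s: float = 0.05) -> float:
    """Start one workflow and return the seconds until it reaches a terminal state."""
    t0 = clock()
    workflow_id = start()
    while status(workflow_id) not in TERMINAL_STATES:
        sleep(poll_s)
    return clock() - t0


def run_load(start: Callable[[], str], status: Callable[[str], str],
             interval_s: float = 30.0, runs: int = 200,
             clock: Callable[[], float] = time.monotonic,
             sleep: Callable[[float], None] = time.sleep) -> List[float]:
    """Start a workflow every `interval_s` seconds and record each run's latency."""
    latencies: List[float] = []
    next_start = clock()
    for i in range(runs):
        latencies.append(time_one_run(start, status, clock, sleep))
        print(f"run {i}: {latencies[-1] * 1000:.0f} ms")
        next_start += interval_s
        # Wait for the next slot; a run longer than interval_s delays the schedule.
        sleep(max(0.0, next_start - clock()))
    return latencies
```

A call such as `run_load(lambda: start_workflow("rcc_push_car_cruising_range_mileage_workflow", {"carId": "car-1"}), get_status)` would cover 200 × 30 s = 100 minutes, past the 78-minute mark; the input map here is a placeholder. The clock and sleep functions are injectable so the scheduling logic can be checked without a server.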

Expected behavior: Performance remains stable, and each workflow continues to take 300 to 500 ms.
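To locate the point where the expected 300 to 500 ms band is lost, recorded per-run latencies could be scanned with a rolling median. This is a hypothetical helper, not part of Conductor: it assumes one run every `interval_s` seconds, latencies given in milliseconds, and a window size chosen for illustration; the median keeps one slow outlier from being reported as degradation.

```python
from statistics import median
from typing import List, Optional


def first_degraded_minute(latencies_ms: List[float], interval_s: float = 30.0,
                          window: int = 10, limit_ms: float = 500.0) -> Optional[float]:
    """Return the elapsed minutes (nominal start of the run that tipped it) at which
    the median of the last `window` runs first exceeds `limit_ms`, or None."""
    for end in range(window, len(latencies_ms) + 1):
        if median(latencies_ms[end - window:end]) > limit_ms:
            # Run index end - 1 was started at (end - 1) * interval_s seconds.
            return (end - 1) * interval_s / 60.0
    return None
```

Because the median needs a majority of slow runs in the window, the reported minute lags the real onset by up to about half a window (here roughly 2.5 minutes).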

Screenshots: [image: costlong]

lianjunwei, Sep 25 '22

Hello @lianjunwei, can you elaborate on this?

But after running a workflow for 78 minutes, the conductor CPU is almost exhausted and the performance drops sharply.

Is the CPU usage increasing after 78 minutes of starting a workflow every 30 seconds?

aravindanr, Sep 27 '22

@aravindanr Yes. After deploying in cluster mode (3 nodes), this problem does not occur. CPU utilization is high only under heavy load (QPS = 50).
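For scale, a rough comparison of the two loads mentioned in this thread, assuming QPS counts workflow starts per second (the reply does not say):

```latex
\text{single node: } \frac{1\ \text{workflow}}{30\ \text{s}} \approx 0.033\ \text{workflows/s},
\qquad \frac{78 \times 60\ \text{s}}{30\ \text{s}} = 156\ \text{workflows started before the slowdown}

\text{cluster: } \frac{50\ \text{s}^{-1}}{(1/30)\ \text{s}^{-1}} = 1500 \times \text{the single-node rate}
```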

lianjunwei, Sep 30 '22

This issue is stale, because it has been open for 45 days with no activity. Remove the stale label or comment, or this will be closed in 7 days.

github-actions[bot], Nov 25 '22

This issue was closed, because it has been stalled for 7 days with no activity.

github-actions[bot], Dec 03 '22