conductor
About an hour after the workflow starts running, the Conductor CPU is almost exhausted and performance drops sharply.
Describe the bug
Details
Conductor version: 3.11.0
Persistence implementation: Redis
Queue implementation: Redis
Lock: Redis
Workflow definition:
{
  "createTime": 1663828820563,
  "updateTime": 1663936724464,
  "name": "rcc_push_car_cruising_range_mileage_workflow",
  "description": "xxx",
  "version": 3,
  "tasks": [
    {
      "name": "preCheckInstruction",
      "taskReferenceName": "preCheck",
      "inputParameters": {
        "carId": "${workflow.input.carId}",
        "vinId": "${workflow.input.vinId}",
        "instructionCode": "${workflow.input.instructionCode}",
        "instructionValue": "${workflow.input.instructionValue}",
        "instructionInvoker": "${workflow.input.instructionInvoker}",
        "sync": "${workflow.input.sync}",
        "timeout": "${workflow.input.timeout}",
        "timestamp": "${workflow.input.timestamp}",
        "fashionId": "${workflow.input.fashionId}",
        "operator": "${workflow.input.operator}",
        "orderId": "${workflow.input.orderId}",
        "requestId": "${workflow.input.requestId}"
      },
      "type": "SIMPLE",
      "startDelay": 0,
      "optional": false,
      "asyncComplete": false
    },
    {
      "name": "decide_task",
      "taskReferenceName": "preCheck_decide",
      "inputParameters": {
        "switchCaseValue": "${preCheck.output.canExecuteFlag}"
      },
      "type": "SWITCH",
      "decisionCases": {
        "can_execute": [
          {
            "name": "rcc_getDeviceNetworkStatus",
            "taskReferenceName": "getDeviceNetworkStatus",
            "inputParameters": {
              "requestId": "${preCheck.output.requestId}",
              "carId": "${preCheck.output.carId}"
            },
            "type": "SIMPLE",
            "startDelay": 0,
            "optional": false,
            "asyncComplete": false
          },
          {
            "name": "network_result_decision",
            "taskReferenceName": "network_result_decision",
            "inputParameters": {
              "case_value_param": "${getDeviceNetworkStatus.output.networkStatus}"
            },
            "type": "SWITCH",
            "decisionCases": {
              "online": [
                {
                  "name": "rcc_sendActionInstruction",
                  "taskReferenceName": "sendActionInstruction",
                  "inputParameters": {
                    "carId": "${preCheck.output.carId}",
                    "vinId": "${preCheck.output.vinId}",
                    "instructionCode": "${workflow.input.instructionCode}",
                    "instructionValue": "${workflow.input.instructionValue}",
                    "requestId": "${preCheck.output.requestId}",
                    "requestType": "${workflow.input.requestType}"
                  },
                  "type": "SIMPLE",
                  "startDelay": 0,
                  "optional": false,
                  "asyncComplete": false
                }
              ]
            },
            "defaultCase": [
              {
                "name": "vehicle_network_not_expected",
                "taskReferenceName": "vehicle_network_not_expected",
                "inputParameters": {
                  "terminationStatus": "FAILED",
                  "terminationReason": "vehicle network not expected",
                  "workflowOutput": {
                    "code": 100602,
                    "message": "vehicle network not expected"
                  }
                },
                "type": "TERMINATE",
                "startDelay": 0,
                "optional": false,
                "asyncComplete": false
              }
            ],
            "startDelay": 0,
            "optional": false,
            "asyncComplete": false,
            "evaluatorType": "value-param",
            "expression": "case_value_param"
          }
        ]
      },
      "defaultCase": [
        {
          "name": "dont_execute",
          "taskReferenceName": "dont_execute",
          "inputParameters": {
            "terminationStatus": "FAILED",
            "terminationReason": "dont execute",
            "workflowOutput": {
              "code": 100603,
              "message": "dont execute"
            }
          },
          "type": "TERMINATE",
          "startDelay": 0,
          "optional": false,
          "asyncComplete": false
        }
      ],
      "startDelay": 0,
      "optional": false,
      "asyncComplete": false,
      "evaluatorType": "value-param",
      "expression": "switchCaseValue"
    }
  ],
  "inputParameters": [],
  "outputParameters": {
    "body": "${getResultByRequestId.output.response.body}"
  },
  "schemaVersion": 2,
  "restartable": true,
  "workflowStatusListenerEnabled": false,
  "ownerEmail": "[email protected]",
  "timeoutPolicy": "ALERT_ONLY",
  "timeoutSeconds": 0,
  "variables": {},
  "inputTemplate": {}
}
Task definition:
Event handler definition:
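For anyone triaging the workflow definition above, a small sanity check can help rule out naming mistakes before looking at CPU usage. The following is only a sketch: the helper names (`iter_tasks`, `iter_strings`, `unresolved_refs`) are hypothetical, the `example` dict is a trimmed stand-in for the full definition, and the check covers only `${ref.…}` expressions in `inputParameters` and `outputParameters`. It says nothing about whether an unresolved reference is related to the reported CPU behavior.

```python
import re
from typing import Any, Dict, Iterator, List, Set

# Captures the first path segment of a "${segment.rest}" expression.
REF_PATTERN = re.compile(r"\$\{([A-Za-z0-9_]+)\.")


def iter_tasks(tasks: List[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield every task, descending into SWITCH branches and default cases."""
    for task in tasks:
        yield task
        for branch in task.get("decisionCases", {}).values():
            yield from iter_tasks(branch)
        yield from iter_tasks(task.get("defaultCase", []))


def iter_strings(value: Any) -> Iterator[str]:
    """Yield every string nested inside dicts and lists."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def unresolved_refs(workflow_def: Dict[str, Any]) -> Set[str]:
    """Return referenced names that match no taskReferenceName (or 'workflow')."""
    tasks = list(iter_tasks(workflow_def.get("tasks", [])))
    defined = {task["taskReferenceName"] for task in tasks}
    defined.add("workflow")  # "${workflow.input.x}" is always available
    used: Set[str] = set()
    for task in tasks:
        for text in iter_strings(task.get("inputParameters", {})):
            used.update(REF_PATTERN.findall(text))
    for text in iter_strings(workflow_def.get("outputParameters", {})):
        used.update(REF_PATTERN.findall(text))
    return used - defined


# Trimmed stand-in for the definition in this issue (illustration only).
example = {
    "tasks": [
        {
            "name": "preCheckInstruction",
            "taskReferenceName": "preCheck",
            "inputParameters": {"carId": "${workflow.input.carId}"},
            "type": "SIMPLE",
        },
        {
            "name": "decide_task",
            "taskReferenceName": "preCheck_decide",
            "inputParameters": {"switchCaseValue": "${preCheck.output.canExecuteFlag}"},
            "type": "SWITCH",
            "decisionCases": {"can_execute": []},
            "defaultCase": [],
        },
    ],
    "outputParameters": {"body": "${getResultByRequestId.output.response.body}"},
}

print(unresolved_refs(example))
```

On this trimmed example the check reports `getResultByRequestId`, which `outputParameters` references although no task in the posted definition uses that reference name.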
To Reproduce
Steps to reproduce the behavior:
1. Deployment architecture: Conductor is not clustered; only a single instance is deployed, in a Docker container with 4 cores and 8 GB of memory.
2. Start a workflow every 30 seconds. For the first 78 minutes, each run takes 300~500 ms and performance is stable.
3. After workflows have been running for 78 minutes, the Conductor CPU is almost exhausted and performance drops sharply.
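The reproduction steps above could be scripted roughly as follows. This is a minimal sketch, not the reporter's actual test harness: the base URL `http://localhost:8080/api`, the input payload fields, and the helper names `start_workflow` and `run_load` are assumptions for illustration. It also measures only the latency of the start request (`POST /workflow/{name}`), which may differ from the 300~500 ms figure if that figure refers to end-to-end workflow duration.

```python
import json
import time
import urllib.request
from typing import Callable, Dict, List

# Assumed values for illustration; adjust to the actual deployment.
CONDUCTOR_URL = "http://localhost:8080/api"
WORKFLOW_NAME = "rcc_push_car_cruising_range_mileage_workflow"


def start_workflow(payload: Dict[str, object]) -> str:
    """Start one workflow via POST /workflow/{name}; return the response body."""
    request = urllib.request.Request(
        f"{CONDUCTOR_URL}/workflow/{WORKFLOW_NAME}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


def run_load(
    starter: Callable[[Dict[str, object]], str],
    iterations: int,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[float]:
    """Call `starter` once per interval; return the latency of each call in ms."""
    latencies_ms: List[float] = []
    for i in range(iterations):
        # Hypothetical input; the real workflow expects more fields.
        payload = {"requestId": f"req-{i}", "carId": "car-1"}
        begin = time.perf_counter()
        starter(payload)
        elapsed_ms = (time.perf_counter() - begin) * 1000.0
        latencies_ms.append(elapsed_ms)
        print(f"run {i}: {elapsed_ms:.0f} ms")
        if i < iterations - 1:
            # Keep the start-to-start spacing close to interval_s.
            sleep(max(0.0, interval_s - elapsed_ms / 1000.0))
    return latencies_ms
```

Calling `run_load(start_workflow, iterations=160)` would cover roughly 80 minutes at one start every 30 seconds, which is just past the 78-minute mark described above. Logging CPU usage of the container alongside the printed latencies would make the degradation point easier to pin down.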
Expected behavior
Performance remains stable, with each run continuing to take 300~500 ms.
Screenshots
Additional context
Hello @lianjunwei, can you elaborate on this?
But after running a workflow for 78 minutes , the conductor CPU is almost exhausted and the performance drops sharply.
Is the CPU usage increasing after 78 minutes of starting a workflow every 30 seconds?
@aravindanr Yes. After deploying in cluster mode (3 nodes), this problem no longer occurs. CPU utilization is high only when the QPS is high (QPS = 50).
This issue is stale, because it has been open for 45 days with no activity. Remove the stale label or comment, or this will be closed in 7 days.
This issue was closed, because it has been stalled for 7 days with no activity.