How to improve the throughput of Conductor?
Describe the bug
Workflow latency degrades sharply as load increases, and the gap between one task finishing and the next task starting grows very large.
Details
Conductor version: 3.11.0
Persistence implementation: Redis
Queue implementation: Redis
Lock: Redis
To Reproduce
Conditions:
- Conductor cluster: 3 instances, each with 8 cores and 16 GB of memory.
- Worker service: 1 cluster (2 instances, each with 8 cores and 8 GB of memory).
I found in the load test that when the QPS for a single workflow is 10, it performs well and a run completes in about 1 s. At 20 QPS, however, a run takes more than 1 minute, and at 30 QPS more than 2 minutes. Please point me to any predefined solution or design. Thanks in advance!
I also found that the delay between the completion of one task and the start of the next is particularly large. How can I reduce it?
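For scale, a quick Little's-law check (L = λ·W) on the numbers above: 20 QPS at ~60 s of latency implies roughly 20 × 60 = 1200 workflows in flight at once, which points to a growing backlog (arrivals outrunning processing capacity) rather than each workflow simply running slower.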
Expected behavior
Latency stays low under high throughput (e.g., when starting 100 workflows/s).
The following is my configuration:

```
spring.application.name=conductor
springdoc.api-docs.path=/api-docs

conductor.db.type=redis_standalone
conductor.redis.hosts=10.00.00.00:3380:us-east-1c:666666
conductor.redis.dataCenterRegion=us-east-1
conductor.redis.availabilityZone=us-east-1c
conductor.redis.maxConnectionsPerHost=1000
conductor.redis.taskDefCacheRefreshInterval=1
#conductor.redis.eventExecutionPersistenceTTL
conductor.redis.workflowNamespacePrefix=conductor-stable
conductor.redis.queueNamespacePrefix=conductor-queue-stable
conductor.redis.queuesNonQuorumPort=22122
queues.dynomite.threads=20

conductor.indexing.enabled=true
conductor.elasticsearch.url=10.00.00.00:9260
conductor.elasticsearch.indexName=conductor-stable
conductor.elasticsearch.version=7
conductor.elasticsearch.asyncWorkerQueueSize=100
conductor.elasticsearch.asyncMaxPoolSize=12
conductor.elasticsearch.asyncBufferFlushTimeout=10
conductor.elasticsearch.username=es
conductor.elasticsearch.password=tsp1024
conductor.default-event-queue.type=conductor

conductor.app.workflowExecutionLockEnabled=true
conductor.workflow-execution-lock.type=redis
conductor.app.lockLeaseTime=60000
conductor.app.lockTimeToTry=50
conductor.redis-lock.serverType=single
conductor.redis-lock.serverAddress=redis://10.00.00.00:3380
conductor.redis-lock.serverPassword=abcabcabc88999
conductor.redis-lock.namespace=conductor_stable
logging.config=classpath:log4j2.xml

conductor.workflow-monitor.enabled=true
# Disable default
conductor.workflow-reconciler.enabled=false
conductor.workflow-repair-service.enabled=false

conductor.app.systemTaskWorkerPollInterval=1
conductor.app.systemTaskMaxPollCount=10
conductor.app.systemTaskWorkerThreadCount=10

conductor.app.maxTaskOutputPayloadSizeThreshold=102400
conductor.app.maxTaskInputPayloadSizeThreshold=102400
conductor.app.taskOutputPayloadSizeThreshold=102400
conductor.app.taskInputPayloadSizeThreshold=

conductor.app.sweeperThreadCount=10
conductor.sweep-frequency.millis=1
```
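For context on the load pattern, here is a rough sketch of a generator that starts workflows at a fixed QPS using the Conductor Java client (assuming the v3 `conductor-client`); the server URL, the workflow name `my_workflow`, and the input payload are placeholders, not values from this issue:

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import com.netflix.conductor.client.http.WorkflowClient;
import com.netflix.conductor.common.metadata.workflow.StartWorkflowRequest;

public class WorkflowLoadGenerator {
    public static void main(String[] args) {
        WorkflowClient client = new WorkflowClient();
        client.setRootURI("http://conductor-host:8080/api/"); // placeholder URL

        int qps = 30; // target start rate from the test above
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(4);

        // Fire one startWorkflow call every 1/qps seconds.
        scheduler.scheduleAtFixedRate(() -> {
            StartWorkflowRequest request = new StartWorkflowRequest();
            request.setName("my_workflow"); // hypothetical workflow name
            request.setVersion(1);
            request.setInput(Map.of("deviceId", "test-device")); // placeholder input
            String workflowId = client.startWorkflow(request);
            System.out.println("started " + workflowId);
        }, 0, 1_000_000L / qps, TimeUnit.MICROSECONDS);
    }
}
```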
> I found in the load test that when the QPS for a single workflow is 10, it performs well and a run completes in about 1 s. At 20 QPS, however, a run takes more than 1 minute, and at 30 QPS more than 2 minutes.
I presume that you are referring to the `rcc_getDeviceNetworkStatus` task. If it is a SIMPLE task, please check that you are increasing the worker threads (or instances) as you increase the number of workflows (see the sketch after this list). If it is a system task, consider tweaking the values of:

`conductor.app.systemTaskMaxPollCount`
`conductor.app.systemTaskWorkerThreadCount`
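For the SIMPLE-task case, this is a minimal sketch of what "increasing worker threads" looks like with the Conductor Java client (again assuming the v3 `conductor-client`); the thread count, server URL, and worker body are illustrative, not values from this issue:

```java
import java.util.List;

import com.netflix.conductor.client.automator.TaskRunnerConfigurer;
import com.netflix.conductor.client.http.TaskClient;
import com.netflix.conductor.client.worker.Worker;
import com.netflix.conductor.common.metadata.tasks.Task;
import com.netflix.conductor.common.metadata.tasks.TaskResult;

public class NetworkStatusWorker implements Worker {

    @Override
    public String getTaskDefName() {
        return "rcc_getDeviceNetworkStatus";
    }

    @Override
    public TaskResult execute(Task task) {
        TaskResult result = new TaskResult(task);
        // ... do the actual device lookup here ...
        result.setStatus(TaskResult.Status.COMPLETED);
        return result;
    }

    public static void main(String[] args) {
        TaskClient taskClient = new TaskClient();
        taskClient.setRootURI("http://conductor-host:8080/api/"); // placeholder URL

        // With ~1 s of work per task, sustaining 30 workflows/s needs at least
        // ~30 concurrent executions across all worker instances; size accordingly.
        TaskRunnerConfigurer configurer = new TaskRunnerConfigurer
                .Builder(taskClient, List.of(new NetworkStatusWorker()))
                .withThreadCount(40) // illustrative value; scale with load
                .build();
        configurer.init(); // starts polling and executing
    }
}
```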
This issue is stale because it has been open for 45 days with no activity. Remove the stale label or comment, or this will be closed in 7 days.
This issue was closed because it has been stalled for 7 days with no activity.