
How to improve the throughput of Conductor?

Open lianjunwei opened this issue 1 year ago • 3 comments


**Details**

- Conductor version: 3.11.0
- Persistence implementation: Redis
- Queue implementation: Redis
- Lock: Redis
- Workflow definition:
- Task definition:
- Event handler definition:

**To Reproduce**

Steps to reproduce the behavior:

- Conductor cluster: 3 instances, each with 8 cores and 16 GB of memory
- Worker service: 1 cluster (2 instances, each with 8 cores and 8 GB of memory)

I found in the load test that when the QPS of a single workflow is 10, it performs well and a run takes about 1 s. At 20 QPS, however, a run takes more than 1 minute, and at 30 QPS it takes more than 2 minutes. Please help me find any predefined solution/design. Thanks in advance!

I also found in the load test that the delay between the completion of one task and the start of the next is particularly large. How can I optimize it?

**Expected behavior** Low latency under high throughput (e.g., starting 100 workflows/s).
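As a rough sanity check for sizing (my own back-of-the-envelope reasoning, not from the Conductor docs): by Little's law, sustaining a given workflow start rate needs at least rate × per-task latency concurrent worker threads across the worker cluster. A minimal sketch:

```python
import math

def required_worker_threads(target_qps: float, avg_task_seconds: float) -> int:
    """Little's law: concurrency needed = arrival rate * time each task is in flight."""
    return math.ceil(target_qps * avg_task_seconds)

# 100 workflow starts/s with ~1 s tasks needs at least 100 concurrent worker threads
print(required_worker_threads(100, 1.0))  # → 100
```

With only 2 worker instances, this suggests checking whether the worker pool itself is the bottleneck before tuning the server.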


lianjunwei, Oct 12 '22

The following is my configuration:

```properties
spring.application.name=conductor
springdoc.api-docs.path=/api-docs

conductor.db.type=redis_standalone
conductor.redis.hosts=10.00.00.00:3380:us-east-1c:666666
conductor.redis.dataCenterRegion=us-east-1
conductor.redis.availabilityZone=us-east-1c
conductor.redis.maxConnectionsPerHost=1000
conductor.redis.taskDefCacheRefreshInterval=1
#conductor.redis.eventExecutionPersistenceTTL
conductor.redis.workflowNamespacePrefix=conductor-stable
conductor.redis.queueNamespacePrefix=conductor-queue-stable
conductor.redis.queuesNonQuorumPort=22122
queues.dynomite.threads=20

conductor.indexing.enabled=true

conductor.elasticsearch.url=10.00.00.00:9260
conductor.elasticsearch.indexName=conductor-stable
conductor.elasticsearch.version=7
conductor.elasticsearch.asyncWorkerQueueSize=100
conductor.elasticsearch.asyncMaxPoolSize=12
conductor.elasticsearch.asyncBufferFlushTimeout=10
conductor.elasticsearch.username=es
conductor.elasticsearch.password=tsp1024
conductor.default-event-queue.type=conductor

conductor.app.workflowExecutionLockEnabled=true
conductor.workflow-execution-lock.type=redis
conductor.app.lockLeaseTime=60000
conductor.app.lockTimeToTry=50
conductor.redis-lock.serverType=single
conductor.redis-lock.serverAddress=redis://10.00.00.00:3380
conductor.redis-lock.serverPassword=abcabcabc88999
conductor.redis-lock.namespace=conductor_stable
logging.config=classpath:log4j2.xml

conductor.workflow-monitor.enabled=true
# Disable default
conductor.workflow-reconciler.enabled=false
conductor.workflow-repair-service.enabled=false

conductor.app.systemTaskWorkerPollInterval=1
conductor.app.systemTaskMaxPollCount=10
conductor.app.systemTaskWorkerThreadCount=10

conductor.app.maxTaskOutputPayloadSizeThreshold=102400
conductor.app.maxTaskInputPayloadSizeThreshold=102400
conductor.app.taskOutputPayloadSizeThreshold=102400
conductor.app.taskInputPayloadSizeThreshold=

conductor.app.sweeperThreadCount=10
conductor.sweep-frequency.millis=1
```

lianjunwei, Oct 13 '22

> I found in the pressure test that when the QPS of a single workflow is 10, it performs well and takes about 1 s to run. However, at 20 it took more than 1 minute, and when the QPS is 30, it takes more than 2 minutes.

I presume that you are referring to the rcc_getDeviceNetworkStatus task. If it is a SIMPLE task, please check whether you are increasing the worker threads (or instances) as you increase the number of workflows. If it is a system task, consider tweaking the values for:

- `conductor.app.systemTaskMaxPollCount`
- `conductor.app.systemTaskWorkerThreadCount`
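For instance, raising both well above the current value of 10 might look like the fragment below. These numbers are purely illustrative and should be validated under your own load test:

```properties
# Illustrative values only; tune against your own load profile.
conductor.app.systemTaskMaxPollCount=50
conductor.app.systemTaskWorkerThreadCount=50
conductor.app.systemTaskWorkerPollInterval=1
```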

aravindanr, Oct 17 '22