Conductor Redis Sentinel configuration not queueing work
Discussed in https://github.com/Netflix/conductor/discussions/3061
Originally posted by ZergRushJoe June 22, 2022: I have been trying to move our Conductor instance over to Redis Sentinel. My current config is:
# Servers.
conductor.grpc-server.enabled=false
# Database persistence type.
conductor.db.type=redis_sentinel
conductor.redis.hosts=conductor-redis-node-0.conductor-redis-headless.conductor-playground.svc.cluster.local:26379:cluster:**********;conductor-redis-node-1.conductor-redis-headless.conductor-playground.svc.cluster.local:26379:cluster;conductor-redis-node-2.conductor-redis-headless.conductor-playground.svc.cluster.local:26379:cluster
conductor.redis.clusterName=mymaster
# Namespace for the keys stored in Dynomite/Redis
conductor.redis.workflowNamespacePrefix=conductor
# Namespace prefix for the dyno queues
conductor.redis.queueNamespacePrefix=conductor_queues
# Hikari pool sizes are -1 by default and prevent startup
spring.datasource.hikari.maximum-pool-size=10
spring.datasource.hikari.minimum-idle=2
# Elastic search instance indexing is enabled.
conductor.indexing.enabled=true
# Transport address to elasticsearch
conductor.elasticsearch.url=http://conductor-elasticsearch.conductor-playground.svc.cluster.local:9200
# Name of the elasticsearch cluster
conductor.elasticsearch.indexName=conductor
# Yellow means the main cluster node is up and running for elasticsearch
conductor.elasticsearch.clusterHealthColor=yellow
The connection string in `conductor.redis.hosts` above is a semicolon-separated list of entries formatted as `host:port:rack[:password]`.
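As a sanity check on that format, here is a hypothetical parser (not part of Conductor; the host names are made up) that splits the `conductor.redis.hosts` value into its components:

```python
# Hypothetical helper illustrating the conductor.redis.hosts format:
# semicolon-separated entries of host:port:rack[:password].
def parse_redis_hosts(hosts: str):
    nodes = []
    for entry in hosts.split(";"):
        parts = entry.split(":")
        host, port, rack = parts[0], int(parts[1]), parts[2]
        # Only entries that carry a fourth field have a password,
        # as in the first entry of the config above.
        password = parts[3] if len(parts) > 3 else None
        nodes.append({"host": host, "port": port, "rack": rack, "password": password})
    return nodes

nodes = parse_redis_hosts(
    "redis-a.example:26379:cluster:secret;redis-b.example:26379:cluster"
)
print(nodes[0]["port"], nodes[1]["password"])  # 26379 None
```

A malformed entry (wrong separator, missing rack) would surface here as an index error, which makes this a quick way to rule out a typo in the host string.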
I have been getting this INFO log from Conductor itself:
554563 [pool-26-thread-1] INFO com.netflix.dyno.queues.redis.RedisDynoQueue [] - processUnacks() will NOT be atomic.
614067 [pool-21-thread-1] INFO com.netflix.dyno.queues.redis.RedisDynoQueue [] - processUnacks() will NOT be atomic.
614567 [pool-27-thread-1] INFO com.netflix.dyno.queues.redis.RedisDynoQueue [] - processUnacks() will NOT be atomic.
614567 [pool-25-thread-1] INFO com.netflix.dyno.queues.redis.RedisDynoQueue [] - processUnacks() will NOT be atomic.
614567 [pool-26-thread-1] INFO com.netflix.dyno.queues.redis.RedisDynoQueue [] - processUnacks() will NOT be atomic.
The instance does not seem to be queueing work; system tasks just remain in progress forever. Does anyone have any ideas on what I'm doing wrong?
This is on Conductor version 3.10.3.
Debug logs:
292333 [scheduled-task-pool-2] DEBUG com.netflix.conductor.core.reconciliation.WorkflowReconciler [] - Sweeper processed from the decider queue
292341 [pool-18-thread-1] DEBUG com.netflix.conductor.core.execution.tasks.SystemTaskWorker [] - Polling queue:HTTP, got 0 tasks
292341 [pool-16-thread-1] DEBUG com.netflix.conductor.core.execution.tasks.SystemTaskWorker [] - Polling queue:SUB_WORKFLOW, got 0 tasks
292341 [pool-17-thread-1] DEBUG com.netflix.conductor.core.execution.tasks.SystemTaskWorker [] - Polling queue:START_WORKFLOW, got 0 tasks
292391 [pool-18-thread-1] DEBUG com.netflix.conductor.core.execution.tasks.SystemTaskWorker [] - Polling queue: HTTP with 1 slots acquired
292391 [pool-16-thread-1] DEBUG com.netflix.conductor.core.execution.tasks.SystemTaskWorker [] - Polling queue: SUB_WORKFLOW with 1 slots acquired
292391 [pool-17-thread-1] DEBUG com.netflix.conductor.core.execution.tasks.SystemTaskWorker [] - Polling queue: START_WORKFLOW with 1 slots acquired
292591 [pool-18-thread-1] DEBUG com.netflix.conductor.core.execution.tasks.SystemTaskWorker [] - Polling queue:HTTP, got 0 tasks
292592 [pool-16-thread-1] DEBUG com.netflix.conductor.core.execution.tasks.SystemTaskWorker [] - Polling queue:SUB_WORKFLOW, got 0 tasks
292592 [pool-17-thread-1] DEBUG com.netflix.conductor.core.execution.tasks.SystemTaskWorker [] - Polling queue:START_WORKFLOW, got 0 tasks
292642 [pool-18-thread-1] DEBUG com.netflix.conductor.core.execution.tasks.SystemTaskWorker [] - Polling queue: HTTP with 1 slots acquired
292642 [pool-16-thread-1] DEBUG com.netflix.conductor.core.execution.tasks.SystemTaskWorker [] - Polling queue: SUB_WORKFLOW with 1 slots acquired
292642 [pool-17-thread-1] DEBUG com.netflix.conductor.core.execution.tasks.SystemTaskWorker [] - Polling queue: START_WORKFLOW with 1 slots acquired
292842 [pool-16-thread-1] DEBUG com.netflix.conductor.core.execution.tasks.SystemTaskWorker [] - Polling queue:SUB_WORKFLOW, got 0 tasks
292842 [pool-18-thread-1] DEBUG com.netflix.conductor.core.execution.tasks.SystemTaskWorker [] - Polling queue:HTTP, got 0 tasks
292842 [pool-17-thread-1] DEBUG com.netflix.conductor.core.execution.tasks.SystemTaskWorker [] - Polling queue:START_WORKFLOW, got 0 tasks
292892 [pool-16-thread-1] DEBUG com.netflix.conductor.core.execution.tasks.SystemTaskWorker [] - Polling queue: SUB_WORKFLOW with 1 slots acquired
292893 [pool-18-thread-1] DEBUG com.netflix.conductor.core.execution.tasks.SystemTaskWorker [] - Polling queue: HTTP with 1 slots acquired
292893 [pool-17-thread-1] DEBUG com.netflix.conductor.core.execution.tasks.SystemTaskWorker [] - Polling queue: START_WORKFLOW with 1 slots acquired
293093 [pool-16-thread-1] DEBUG com.netflix.conductor.core.execution.tasks.SystemTaskWorker [] - Polling queue:SUB_WORKFLOW, got 0 tasks
293093 [pool-17-thread-1] DEBUG com.netflix.conductor.core.execution.tasks.SystemTaskWorker [] - Polling queue:START_WORKFLOW, got 0 tasks
293094 [pool-18-thread-1] DEBUG com.netflix.conductor.core.execution.tasks.SystemTaskWorker [] - Polling queue:HTTP, got 0 tasks
293144 [pool-16-thread-1] DEBUG com.netflix.conductor.core.execution.tasks.SystemTaskWorker [] - Polling queue: SUB_WORKFLOW with 1 slots acquired
293144 [pool-17-thread-1] DEBUG com.netflix.conductor.core.execution.tasks.SystemTaskWorker [] - Polling queue: START_WORKFLOW with 1 slots acquired
293144 [pool-18-thread-1] DEBUG com.netflix.conductor.core.execution.tasks.SystemTaskWorker [] - Polling queue: HTTP with 1 slots acquired
293344 [pool-16-thread-1] DEBUG com.netflix.conductor.core.execution.tasks.SystemTaskWorker [] - Polling queue:SUB_WORKFLOW, got 0 tasks
293344 [pool-18-thread-1] DEBUG com.netflix.conductor.core.execution.tasks.SystemTaskWorker [] - Polling queue:HTTP, got 0 tasks
293344 [pool-17-thread-1] DEBUG com.netflix.conductor.core.execution.tasks.SystemTaskWorker [] - Polling queue:START_WORKFLOW, got 0 tasks
293394 [pool-18-thread-1] DEBUG com.netflix.conductor.core.execution.tasks.SystemTaskWorker [] - Polling queue: HTTP with 1 slots acquired
293394 [pool-16-thread-1] DEBUG com.netflix.conductor.core.execution.tasks.SystemTaskWorker [] - Polling queue: SUB_WORKFLOW with 1 slots acquired
I added Redis locking to see if that was the problem, but got new warnings instead:
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.fasterxml.jackson.module.afterburner.util.MyClassLoader (jar:file:/app/libs/conductor-server-3.11.0-SNAPSHOT-boot.jar!/BOOT-INF/lib/jackson-module-afterburner-2.13.3.jar!/) to method java.lang.ClassLoader.findLoadedClass(java.lang.String)
WARNING: Please consider reporting this to the maintainers of com.fasterxml.jackson.module.afterburner.util.MyClassLoader
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
72045 [pool-21-thread-1] INFO com.netflix.dyno.queues.redis.RedisDynoQueue [] - processUnacks() will NOT be atomic.
72515 [pool-25-thread-1] INFO com.netflix.dyno.queues.redis.RedisDynoQueue [] - processUnacks() will NOT be atomic.
72515 [pool-26-thread-1] INFO com.netflix.dyno.queues.redis.RedisDynoQueue [] - processUnacks() will NOT be atomic.
72515 [pool-27-thread-1] INFO com.netflix.dyno.queues.redis.RedisDynoQueue [] - processUnacks() will NOT be atomic.
92957 [pool-29-thread-1] INFO com.netflix.dyno.queues.redis.RedisDynoQueue [] - processUnacks() will NOT be atomic.
The lock settings:
#Redis cluster settings for locking module
conductor.redis-lock.serverType=sentinel
#Comma separated list of server nodes
conductor.redis-lock.serverAddress={{ $redisLockConnectionString }}
#Redis sentinel master name
conductor.redis-lock.serverMasterName=mymaster
conductor.redis-lock.namespace=conductor_locks
conductor.redis-lock.serverPassword={{ $redisPassword }}
# Namespace for the keys stored in Dynomite/Redis
conductor.redis.workflowNamespacePrefix=conductor
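One thing worth double-checking in the settings above: the lock module is backed by Redisson, which generally expects each address to carry an explicit `redis://` scheme. A sketch of what the sentinel lock config might look like with the scheme included (host names are hypothetical; the templated values above are intentionally left untouched):

```properties
# Sketch only: Redisson-backed lock module pointed at sentinels.
# Redisson typically rejects addresses without the redis:// scheme.
conductor.redis-lock.serverType=sentinel
conductor.redis-lock.serverAddress=redis://sentinel-0.example:26379,redis://sentinel-1.example:26379
conductor.redis-lock.serverMasterName=mymaster
```

If the rendered `$redisLockConnectionString` produces bare `host:port` pairs, adding the scheme is a cheap thing to try before digging deeper.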
@ZergRushJoe I don't see any issues with the setup, and your logs do not point to any errors either. The system task poller seems to be polling the queues actively but does not dequeue any tasks. Would you be able to log in to your Redis instance and check whether the messages are being populated? Additionally, could you also post your workflow definition?
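When checking Redis, the dyno-queues keys to inspect follow the pattern visible in the key listing below (e.g. `queues.production.general.QUEUE.HTTP.1`). This hypothetical helper (the prefix/stack/shard names are illustrative and depend on your configuration) builds the two key names typically involved per task type:

```python
# Hypothetical helper that builds dyno-queues key names to inspect with
# redis-cli, following the pattern seen in the key listing in this thread.
def queue_keys(queue_prefix: str, stack: str, shard: str, task_type: str, shard_id: int = 1):
    base = f"{queue_prefix}.{stack}.{shard}"
    return {
        # Sorted set of pending message IDs; check with ZCARD / ZRANGE.
        "queue_zset": f"{base}.QUEUE.{task_type}.{shard_id}",
        # Hash of message payloads; check with HLEN / HGETALL.
        "message_hash": f"{base}.MESSAGE.{task_type}",
    }

keys = queue_keys("queues", "production", "general", "HTTP")
print(keys["queue_zset"])  # queues.production.general.QUEUE.HTTP.1
```

If `ZCARD` on the queue sorted set is nonzero while the pollers report "got 0 tasks", the messages are landing in Redis but the dequeue side is the problem; if it is zero while the message hash grows, enqueueing itself is broken.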
I am experiencing the exact same behavior on redis_standalone. I checked the Redis keys and I can see them, but polling finds nothing:
10.197.33.28:6379> KEYS *
- "workflows.production.general.WORKFLOW_DEF.send-batch-messages"
- "workflows.production.general.CORR_ID_TO_WORKFLOWS.63a8dc39-d460-4534-88f2-168e51b9d8f7-0"
- "workflows.production.general.SCHEDULED_TASKS.838cdf35-ff3a-4eb6-b6c3-83915c1d8ab7"
- "workflows.production.general.WORKFLOW_DEF_TO_WORKFLOWS.send-batch-messages.20220821"
- "workflows.production.general.WORKFLOW.838cdf35-ff3a-4eb6-b6c3-83915c1d8ab7"
- "workflows.production.general.WORKFLOW_TO_TASKS.838cdf35-ff3a-4eb6-b6c3-83915c1d8ab7"
- "workflows.production.general.TASK_DEFS"
- "queues.production.general.QUEUE._deciderQueue.1"
- "workflows.production.general.IN_PROGRESS_TASKS.clear-blacklisted-recipients"
- "queues.production.general.MESSAGE.HTTP"
- "workflows.production.general.TASK.eb64a922-b38e-46a1-b125-bde02abd1eaf"
- "workflows.production.general.PENDING_WORKFLOWS.send-batch-messages"
- "queues.production.general.QUEUE.HTTP.1"
- "queues.production.general.MESSAGE._deciderQueue"
- "workflows.production.general.WORKFLOW_DEF_NAMES"
- "workflows.production.general.WORKFLOW_DEF.finalize-with-retries-and-publish"
Same in my Redis cluster: work gets into Redis, but nothing is ever dequeued.
This issue is stale, because it has been open for 45 days with no activity. Remove the stale label or comment, or this will be closed in 7 days.
This issue was closed, because it has been stalled for 7 days with no activity.