keda-docs
better documentation around scaling strategy
I'm using ScaledJob and I'm confused about the scaling strategies and how they differ.
My ScaledJob is triggered from an Azure Service Bus Queue and is configured like so:
job:
  paused: "false"
  activeDeadlineSeconds: 600
  pollingInterval: 30
  minReplicaCount: 0
  maxReplicaCount: 3
  successfulJobsHistoryLimit: 10
  failedJobsHistoryLimit: 10
  scalingStrategy: "eager"
trigger:
  queueName: some-queue-name
  messageCount: "1"
  auth: my-cluster-trigger-auth
My goal is a ScaledJob that is triggered to run when messages land on the queue, with up to three Jobs running in parallel. My job:
- gets a message from the queue and locks it (at least, that's what my engineers are telling me)
- processes the message to completion
- "completes" the message (so it's no longer in the queue)
- exits cleanly
On the off chance that processing fails or the pod dies, the lock will (eventually) expire and a different job will be started to process the message again. If no job can ever process the message, we'll hit the max delivery count and the message will be dead-lettered.
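The job lifecycle described above (lock, process, complete, with lock expiry and dead-lettering after repeated failures) can be sketched with a toy in-memory queue. This is only an illustration of peek-lock semantics as described here, not the Azure Service Bus SDK: `ToyQueue`, `Message`, `run_job`, `lock_seconds`, and `max_delivery_count` are all hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    body: str
    delivery_count: int = 0
    locked_until: float = 0.0  # time at which the current lock expires


@dataclass
class ToyQueue:
    """In-memory stand-in for a peek-lock queue (hypothetical, not the Azure SDK)."""
    lock_seconds: float = 60.0
    max_delivery_count: int = 3
    messages: list = field(default_factory=list)
    dead_letters: list = field(default_factory=list)

    def send(self, body):
        self.messages.append(Message(body))

    def receive(self, now):
        """Lock and return the first available message, or None if there is none."""
        for msg in list(self.messages):
            if msg.locked_until > now:
                continue  # still locked by another job
            if msg.delivery_count >= self.max_delivery_count:
                # Too many delivery attempts: move to the dead-letter list.
                self.messages.remove(msg)
                self.dead_letters.append(msg)
                continue
            msg.delivery_count += 1
            msg.locked_until = now + self.lock_seconds
            return msg
        return None

    def complete(self, msg):
        """Remove a successfully processed message from the queue."""
        self.messages.remove(msg)


def run_job(queue, now, handler):
    """One job run: receive, process, complete. Returns True if work was done."""
    msg = queue.receive(now)
    if msg is None:
        return False  # nothing available; a real job would block and wait here
    handler(msg.body)  # if this raises, the message stays locked until expiry
    queue.complete(msg)
    return True
```

Note that in this model a locked message is still in `messages` until it is completed, so anything that counts queue length by looking at `messages` would keep seeing it while the first job is working.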
With both the accurate and eager strategies, when I drop a message on the queue, I see a job start within 30 seconds (as expected). Again, my understanding is that the message is locked, but:
- thirty seconds later, after the next poll, another job starts up and tries to pull a message from the queue and just sits idle while blocking and waiting for a message
- another thirty seconds later, another job starts up and again, just sits idle blocking while waiting for a message
Meanwhile, the only job actually doing any work is the first job, but now I'm at three running jobs: one processing a message and the other two just sitting around waiting. Eventually either a message comes in and one of those two idle jobs grabs it, or no messages come in and the job hits activeDeadlineSeconds and appears as a Failed job.
I see the same behavior when using accurate, except that after the idle jobs time out, more jobs are started, so there always appear to be three running jobs, even overnight while nothing is in the queue: every ten minutes one job "Fails" and another job starts. With eager, once the idle jobs time out, new ones are not created while the queue is empty.
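To make the observed behavior concrete, here is a rough model of how many jobs each strategy would start per poll. The formulas are an assumption based on a reading of the KEDA scaling strategy docs and may not match every KEDA version; the function names (`max_scale`, `jobs_to_start`, `simulate`) are hypothetical. It also assumes the scaler keeps counting the one locked message and that no job exits during the simulated polls.

```python
import math


def max_scale(queue_length, target_per_job, max_replicas):
    """Jobs the queue length alone would call for, capped at max_replicas."""
    return min(math.ceil(queue_length / target_per_job), max_replicas)


def jobs_to_start(strategy, queue_length, running, pending=0,
                  target_per_job=1, max_replicas=3):
    """Number of new jobs created on one poll (assumed formulas, see above)."""
    scale = max_scale(queue_length, target_per_job, max_replicas)
    if strategy == "default":
        n = scale - running
    elif strategy == "accurate":
        if scale + running > max_replicas:
            n = max_replicas - running
        else:
            n = scale - pending
    elif strategy == "eager":
        n = min(max_replicas - running - pending, scale)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return max(n, 0)  # never start a negative number of jobs


def simulate(strategy, polls=4, queue_length=1):
    """Running job count after each poll, assuming the single message stays
    counted by the scaler (locked, not yet completed) and no job exits."""
    running = 0
    history = []
    for _ in range(polls):
        running += jobs_to_start(strategy, queue_length, running)
        history.append(running)
    return history


for name in ("default", "accurate", "eager"):
    print(name, simulate(name))
```

Under these assumptions, `accurate` and `eager` both climb to three running jobs over three polls, which matches the behavior described above, while `default` stays at one job. Whether the Azure Service Bus scaler really includes locked messages in its count is exactly the open question, so treat this as a model to test against, not an explanation.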
Also, in the docs for scaling strategy, I see:
accurate: If the scaler returns queueLength (number of items in the queue) that does not include the number of locked messages, this strategy is recommended. Azure Storage Queue is one example. You can use this strategy if you delete a message once your app consumes it.
So my questions are:
- How exactly does one confirm whether the scaler behaves this way (that is, whether its queueLength excludes locked messages)?
- Why do those jobs get started long after the first job actually pulled the message and started processing it?