KEDA does not create a pod after the 2nd message in the SQS queue
Report
I have a KEDA + SQS + EKS setup. When the 1st message arrives in the SQS queue, KEDA creates the 1st pod. When the 2nd message arrives, KEDA does not create a 2nd pod. If I then send a 3rd message, KEDA creates a pod.
```yaml
# https://keda.sh/docs/2.13/concepts/scaling-jobs/
apiVersion: v1
kind: Secret
metadata:
  name: keda-sqs-auth
  namespace: backend
type: Opaque
data:
  #awsRoleArn: "xxxxx" #echo -n "arn:aws:iam::xxx:role/keda-uat" | base64
  AWS_ACCESS_KEY_ID: xxxxx # Required.
  AWS_SECRET_ACCESS_KEY: xxxxx # Required.
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-trigger-auth-aws-credentials
  namespace: backend
spec:
  secretTargetRef:
    - parameter: awsAccessKeyID # Required.
      name: keda-sqs-auth # Required.
      key: AWS_ACCESS_KEY_ID # Required.
    - parameter: awsSecretAccessKey # Required.
      name: keda-sqs-auth # Required.
      key: AWS_SECRET_ACCESS_KEY # Required.
---
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: unified-sqs-queue-scaledjob
  namespace: backend
spec:
  jobTargetRef:
    #parallelism: 2 # max number of desired pods
    #completions: 1 # desired number of successfully finished pods
    #activeDeadlineSeconds: 3600 # Specifies the duration in seconds relative to the startTime that the job may be active before the system tries to terminate it; value must be positive integer
    backoffLimit: 0 # Specifies the number of retries before marking this job failed. Defaults to 6
    activeDeadlineSeconds: 16200 #900
    template:
      metadata:
        labels:
          app: unified
        annotations:
          # Add toleration for GPU SKU, preventing scheduling on nodes with the specified GPU SKU.
          scheduler.alpha.kubernetes.io/tolerate-until-node-unschedulable: "true"
      spec:
        restartPolicy: Never # Prevent pods from restarting
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: nodegroup ##k get nodes --show-labels
                      operator: In
                      values:
                        - gpu
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - unified
                topologyKey: kubernetes.io/hostname
        tolerations:
          # Tolerate nodes with GPU SKU.
          - key: "dedicated"
            operator: "Equal"
            value: "gpupool" #gpupool-apppool
            effect: "NoSchedule"
        serviceAccountName: s3irsa
        terminationGracePeriodSeconds: 600 # time in seconds before terminating the pod gracefully after it receives a completion message
        containers:
          - name: unified
            image: xxx.dkr.ecr.ap-south-1.amazonaws.com/xx-unified:keda
            imagePullPolicy: Always
            env:
              - name: ALLOW_EMPTY_PASSWORD
                value: "yes"
            volumeMounts:
              - name: aws
                mountPath: /training
            resources:
              # requests:
              # cpu: 7000m
              memory: 20000Mi
              limits:
                cpu: 2500m
                memory: 20000Mi
            ports:
              - containerPort: 5000
                protocol: TCP
                name: unified
        volumes:
          - name: aws
            persistentVolumeClaim:
              #claimName: uat-training
              #claimName: s3-uatdatabs
              claimName: uat-efs
  pollingInterval: 30 # How often KEDA will check the SQS queue
  minReplicaCount: 0 # Minimum number of jobs that KEDA can create
  #maxReplicaCount: 1 # Maximum number of jobs that KEDA can create
  successfulJobsHistoryLimit: 2 # Number of successful jobs to keep
  failedJobsHistoryLimit: 2 # Number of failed jobs to keep
  scalingStrategy:
    strategy: "accurate" #"default" # Scaling strategy (default, custom, or accurate)
    pendingPodConditions:
      - "Pending"
      - "ContainerCreating"
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.ap-south-1.amazonaws.com/xxxx/xx-unifiedservice.fifo
        queueLength: "1"
        awsRegion: "ap-south-1"
        scaleOnInFlight: "false"
      authenticationRef:
        name: keda-trigger-auth-aws-credentials # Ensure this references your actual AWS credentials stored in K8s secrets
```
Expected Behavior
After the 2nd message arrives in the SQS queue, KEDA should create a 2nd pod.
Actual Behavior
KEDA creates a pod for the 1st message but not for the 2nd. A new pod is created only after a 3rd message is sent.
Steps to Reproduce the Problem
- Send the 1st message to the SQS queue.
- Check that a pod is created.
- Send the 2nd message to the SQS queue.
- Check whether a 2nd pod is created (it should be, but it is not).
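
The steps above could be scripted roughly as follows. This is only a sketch: the helper name `send_and_inspect` and the `group_id` default are hypothetical, and `sqs` is assumed to be a boto3 SQS client. Because the queue is a `.fifo` queue, each message needs a `MessageGroupId`; a random `MessageDeduplicationId` is used here on the assumption that content-based deduplication may not be enabled. The returned counts are approximate values reported by SQS.

```python
import uuid


def send_and_inspect(sqs, queue_url, body, group_id="keda-repro"):
    """Send one message to a FIFO queue, then return (visible, in_flight) counts.

    `sqs` is expected to behave like a boto3 SQS client.
    """
    sqs.send_message(
        QueueUrl=queue_url,
        MessageBody=body,
        MessageGroupId=group_id,                   # required for FIFO queues
        MessageDeduplicationId=str(uuid.uuid4()),  # unique ID so repeated bodies are not deduplicated
    )
    attrs = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=[
            "ApproximateNumberOfMessages",            # visible messages
            "ApproximateNumberOfMessagesNotVisible",  # in-flight messages
        ],
    )["Attributes"]
    return (
        int(attrs["ApproximateNumberOfMessages"]),
        int(attrs["ApproximateNumberOfMessagesNotVisible"]),
    )
```

With boto3 installed, the client would be created with `boto3.client("sqs", region_name="ap-south-1")` and the function called once per step, checking `kubectl get jobs -n backend` after each KEDA polling interval (30 seconds in this setup).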
Logs from KEDA operator
manjur@MacBook-Pro keda % kubectl logs -f keda-operator-7f5d566f89-2fk22
2024/06/21 11:58:45 maxprocs: Updating GOMAXPROCS=1: determined from CPU quota
2024-06-21T11:58:45Z INFO setup Starting manager
2024-06-21T11:58:45Z INFO setup KEDA Version: 2.12.1
2024-06-21T11:58:45Z INFO setup Git Commit: dc76ca70f19c22e8f0c806f84d95256d771f3dc9
2024-06-21T11:58:45Z INFO setup Go Version: go1.20.8
2024-06-21T11:58:45Z INFO setup Go OS/Arch: linux/amd64
2024-06-21T11:58:45Z INFO setup Running on Kubernetes 1.28+ {"version": "v1.28.9-eks-036c24b"}
2024-06-21T11:58:45Z INFO starting server {"kind": "health probe", "addr": "[::]:8081"}
I0621 11:58:45.933781 1 leaderelection.go:250] attempting to acquire leader lease keda-uat/operator.keda.sh...
2024-06-21T11:58:45Z INFO controller-runtime.metrics Starting metrics server
2024-06-21T11:58:45Z INFO controller-runtime.metrics Serving metrics server {"bindAddress": ":8080", "secure": false}
I0621 11:59:23.015266 1 leaderelection.go:260] successfully acquired lease keda-uat/operator.keda.sh
2024-06-21T11:59:23Z INFO Starting EventSource {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "source": "kind source: *v1alpha1.ScaledObject"}
2024-06-21T11:59:23Z INFO Starting EventSource {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "source": "kind source: *v2.HorizontalPodAutoscaler"}
2024-06-21T11:59:23Z INFO Starting Controller {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject"}
2024-06-21T11:59:23Z INFO Starting EventSource {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication", "source": "kind source: *v1alpha1.TriggerAuthentication"}
2024-06-21T11:59:23Z INFO Starting Controller {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication"}
2024-06-21T11:59:23Z INFO Starting EventSource {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "source": "kind source: *v1alpha1.ScaledJob"}
2024-06-21T11:59:23Z INFO Starting Controller {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob"}
2024-06-21T11:59:23Z INFO Starting EventSource {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication", "source": "kind source: *v1alpha1.ClusterTriggerAuthentication"}
2024-06-21T11:59:23Z INFO Starting Controller {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication"}
2024-06-21T11:59:23Z INFO Starting EventSource {"controller": "cert-rotator", "source": "kind source: *v1.Secret"}
2024-06-21T11:59:23Z INFO Starting EventSource {"controller": "cert-rotator", "source": "kind source: *unstructured.Unstructured"}
2024-06-21T11:59:23Z INFO Starting EventSource {"controller": "cert-rotator", "source": "kind source: *unstructured.Unstructured"}
2024-06-21T11:59:23Z INFO Starting Controller {"controller": "cert-rotator"}
2024-06-21T11:59:23Z INFO cert-rotation starting cert rotator controller
2024-06-21T11:59:23Z INFO cert-rotation no cert refresh needed
2024-06-21T11:59:23Z INFO cert-rotation certs are ready in /certs
2024-06-21T11:59:23Z INFO Starting workers {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication", "worker count": 1}
2024-06-21T11:59:23Z INFO Starting workers {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "worker count": 5}
2024-06-21T11:59:23Z INFO Starting workers {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication", "worker count": 1}
2024-06-21T11:59:23Z INFO Starting workers {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "worker count": 1}
2024-06-21T11:59:23Z INFO Reconciling ScaledJob {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "ScaledJob": {"name":"unified-sqs-queue-scaledjob","namespace":"backend"}, "namespace": "backend", "name": "unified-sqs-queue-scaledjob", "reconcileID": "42e024b8-00aa-4f40-8f0a-96959528d2d0"}
2024-06-21T11:59:23Z INFO Starting workers {"controller": "cert-rotator", "worker count": 1}
2024-06-21T11:59:23Z INFO cert-rotation no cert refresh needed
2024-06-21T11:59:23Z INFO cert-rotation Ensuring CA cert {"name": "keda-admission", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration", "name": "keda-admission", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration"}
2024-06-21T11:59:23Z INFO cert-rotation Ensuring CA cert {"name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService", "name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService"}
2024-06-21T11:59:23Z INFO cert-rotation no cert refresh needed
2024-06-21T11:59:23Z INFO cert-rotation Ensuring CA cert {"name": "keda-admission", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration", "name": "keda-admission", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration"}
2024-06-21T11:59:23Z INFO cert-rotation Ensuring CA cert {"name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService", "name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService"}
2024-06-21T11:59:23Z INFO RolloutStrategy: immediate, Deleting jobs owned by the previous version of the scaledJob {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "ScaledJob": {"name":"unified-sqs-queue-scaledjob","namespace":"backend"}, "namespace": "backend", "name": "unified-sqs-queue-scaledjob", "reconcileID": "42e024b8-00aa-4f40-8f0a-96959528d2d0", "numJobsToDelete": 3}
2024-06-21T11:59:23Z INFO Initializing Scaling logic according to ScaledJob Specification {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "ScaledJob": {"name":"unified-sqs-queue-scaledjob","namespace":"backend"}, "namespace": "backend", "name": "unified-sqs-queue-scaledjob", "reconcileID": "42e024b8-00aa-4f40-8f0a-96959528d2d0"}
2024-06-21T11:59:23Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of running Jobs": 0}
2024-06-21T11:59:23Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of pending Jobs ": 0}
2024-06-21T11:59:23Z INFO scaleexecutor Creating jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Effective number of max jobs": 1}
2024-06-21T11:59:23Z INFO scaleexecutor Creating jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of jobs": 1}
2024-06-21T11:59:23Z INFO scaleexecutor Created jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of jobs": 1}
2024-06-21T11:59:24Z INFO cert-rotation CA certs are injected to webhooks
2024-06-21T11:59:24Z INFO grpc_server Starting Metrics Service gRPC Server {"address": ":9666"}
2024-06-21T11:59:53Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of running Jobs": 1}
2024-06-21T11:59:53Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of pending Jobs ": 1}
2024-06-21T11:59:53Z INFO scaleexecutor Creating jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Effective number of max jobs": 0}
2024-06-21T11:59:53Z INFO scaleexecutor Creating jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of jobs": 0}
2024-06-21T11:59:53Z INFO scaleexecutor Created jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of jobs": 0}
2024-06-21T12:00:23Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of running Jobs": 1}
2024-06-21T12:00:23Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of pending Jobs ": 1}
2024-06-21T12:00:23Z INFO scaleexecutor Creating jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Effective number of max jobs": 0}
2024-06-21T12:00:23Z INFO scaleexecutor Creating jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of jobs": 0}
2024-06-21T12:00:23Z INFO scaleexecutor Created jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of jobs": 0}
2024-06-21T12:00:53Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of running Jobs": 1}
2024-06-21T12:00:53Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of pending Jobs ": 1}
2024-06-21T12:00:53Z INFO scaleexecutor Creating jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Effective number of max jobs": 0}
2024-06-21T12:00:53Z INFO scaleexecutor Creating jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of jobs": 0}
2024-06-21T12:00:53Z INFO scaleexecutor Created jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of jobs": 0}
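
The repeated `"Number of pending Jobs ": 1` and `"Effective number of max jobs": 0` lines look consistent with how the `accurate` strategy is described in the KEDA ScaledJob documentation, where the pending job count is subtracted from the queue-derived scale. The sketch below is my reading of that documented rule, not the operator's actual code. The function name is hypothetical, the default of 100 for `maxReplicaCount` is taken from the docs, and the number of visible messages at each poll is an assumption because the log does not show it. One possible explanation for the job staying "pending" is that `pendingPodConditions` is documented with Pod condition types such as `Ready` and `PodScheduled`, whereas `Pending` is a pod phase and `ContainerCreating` is a container waiting reason, so the running job may never satisfy them.

```python
import math


def accurate_effective_max_jobs(queue_length, target_per_job,
                                running_jobs, pending_jobs,
                                max_replica_count=100):
    """Approximate the documented 'accurate' ScaledJob scaling rule."""
    # Jobs needed for the visible messages, capped at maxReplicaCount.
    max_scale = min(max_replica_count, math.ceil(queue_length / target_per_job))
    if max_scale + running_jobs > max_replica_count:
        effective = max_replica_count - running_jobs
    else:
        # Pending jobs are assumed to already cover part of the queue.
        effective = max_scale - pending_jobs
    return max(0, effective)  # assumption: a negative result means no new jobs


# Counts from the log (running=1, pending=1); one visible message is assumed.
print(accurate_effective_max_jobs(1, 1, running_jobs=1, pending_jobs=1))  # 0
# A further visible message (assumed) is enough to create one job.
print(accurate_effective_max_jobs(2, 1, running_jobs=1, pending_jobs=1))  # 1
# If the running job were not counted as pending, one message would suffice.
print(accurate_effective_max_jobs(1, 1, running_jobs=1, pending_jobs=0))  # 1
```

Under these assumptions the 2nd message yields zero new jobs and the 3rd yields one, which matches the behavior reported above. Changing `pendingPodConditions` to real condition types, or removing it, would be one thing to try.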
KEDA Version
2.12.1
Kubernetes Version
1.28
Platform
Amazon Web Services
Scaler Details
AWS SQS
Anything else?
No response