
KEDA not creating pod after 2nd message in SQS queue

Open manjurshaikh1988 opened this issue 2 weeks ago • 0 comments

Report

I have a KEDA + SQS + EKS setup. When the 1st message arrives in the SQS queue, KEDA creates the 1st pod. When the 2nd message arrives, KEDA does not create a 2nd pod. If I then send a 3rd message, KEDA creates a pod.

`# https://keda.sh/docs/2.13/concepts/scaling-jobs/

apiVersion: v1
kind: Secret
metadata:
  name: keda-sqs-auth
  namespace: backend
type: Opaque
data:
  #awsRoleArn: "xxxxx " #echo -n "arn:aws:iam::xxx:role/keda-uat" | base64
  AWS_ACCESS_KEY_ID: xxxxx # Required.
  AWS_SECRET_ACCESS_KEY: xxxxx # Required.
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-trigger-auth-aws-credentials
  namespace: backend
spec:
  secretTargetRef:
    - parameter: awsAccessKeyID # Required.
      name: keda-sqs-auth # Required.
      key: AWS_ACCESS_KEY_ID # Required.
    - parameter: awsSecretAccessKey # Required.
      name: keda-sqs-auth # Required.
      key: AWS_SECRET_ACCESS_KEY # Required.
---
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: unified-sqs-queue-scaledjob
  namespace: backend
spec:
  jobTargetRef:
    #parallelism: 2 # max number of desired pods
    #completions: 1 # desired number of successfully finished pods
    #activeDeadlineSeconds: 3600 # Specifies the duration in seconds relative to the startTime that the job may be active before the system tries to terminate it; value must be positive integer
    backoffLimit: 0 # Specifies the number of retries before marking this job failed. Defaults to 6
    activeDeadlineSeconds: 16200 #900
    template:
      metadata:
        labels:
          app: unified
        annotations:
          # Add toleration for GPU SKU, preventing scheduling on nodes with the specified GPU SKU.
          scheduler.alpha.kubernetes.io/tolerate-until-node-unschedulable: "true"
      spec:
        restartPolicy: Never # Prevent pods from restarting
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: nodegroup ##k get nodes --show-labels
                      operator: In
                      values:
                        - gpu
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - unified
                topologyKey: kubernetes.io/hostname
        tolerations: # Tolerate nodes with GPU SKU.
          - key: "dedicated"
            operator: "Equal"
            value: "gpupool" #gpupool-apppool
            effect: "NoSchedule"
        serviceAccountName: s3irsa
        terminationGracePeriodSeconds: 600 # time in seconds before terminating the pod gracefully after it receives a completion message
        containers:
          - name: unified
            image: xxx.dkr.ecr.ap-south-1.amazonaws.com/xx-unified:keda
            imagePullPolicy: Always
            env:
              - name: ALLOW_EMPTY_PASSWORD
                value: "yes"
            volumeMounts:
              - name: aws
                mountPath: /training
            resources:
              # requests:
              #   cpu: 7000m
              #   memory: 20000Mi
              limits:
                cpu: 2500m
                memory: 20000Mi
            ports:
              - containerPort: 5000
                protocol: TCP
                name: unified
        volumes:
          - name: aws
            persistentVolumeClaim:
              #claimName: uat-training
              #claimName: s3-uatdatabs
              claimName: uat-efs
  pollingInterval: 30 # How often KEDA will check the SQS queue
  minReplicaCount: 0 # Minimum number of jobs that KEDA can create
  #maxReplicaCount: 1 # Maximum number of jobs that KEDA can create
  successfulJobsHistoryLimit: 2 # Number of successful jobs to keep
  failedJobsHistoryLimit: 2 # Number of failed jobs to keep
  scalingStrategy:
    strategy: "accurate" #"default" # Scaling strategy (default, custom, or accurate)
    pendingPodConditions:
      - "Pending"
      - "ContainerCreating"
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.ap-south-1.amazonaws.com/xxxx/xx-unifiedservice.fifo
        queueLength: "1"
        awsRegion: "ap-south-1"
        scaleOnInFlight: "false"
      authenticationRef:
        name: keda-trigger-auth-aws-credentials # Ensure this references your actual AWS credentials stored in K8s secrets
`
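For anyone reading the `scalingStrategy` and `queueLength` settings in the manifest: the scaling-jobs page linked at its top describes how each strategy turns a queue length into a number of jobs to create. Below is a minimal Python sketch of that arithmetic as I understand it from the docs. It is not KEDA's code; the function names are made up for illustration, and `max_replica_count=100` assumes the documented default applies because `maxReplicaCount` is commented out.

```python
import math


def max_scale(queue_length: int, target_per_job: int, max_replica_count: int = 100) -> int:
    """Jobs wanted for the queue, capped at maxReplicaCount (100 is an assumed default)."""
    return min(max_replica_count, math.ceil(queue_length / target_per_job))


def effective_default(scale: int, running: int) -> int:
    """'default' strategy as described in the docs: wanted jobs minus running jobs."""
    return scale - running


def effective_accurate(scale: int, running: int, pending: int,
                       max_replica_count: int = 100) -> int:
    """'accurate' strategy as described in the docs: subtract pending jobs
    unless the replica cap would be exceeded."""
    if scale + running > max_replica_count:
        return max_replica_count - running
    return scale - pending


# Hypothetical inputs: queueLength "1" from the manifest, 1 running and 1 pending job.
one_visible = effective_accurate(max_scale(1, 1), running=1, pending=1)
two_visible = effective_accurate(max_scale(2, 1), running=1, pending=1)
print(one_visible, two_visible)
```

With 1 running and 1 pending job (the counts the operator log in this report shows), this arithmetic gives no new job for one visible message and one new job for two. That would be consistent with the behavior described if the first job keeps being counted as pending, but it is an observation about the formula, not a confirmed diagnosis.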

Expected Behavior

After the 2nd message arrives in the SQS queue, KEDA should create a 2nd pod.

Actual Behavior

KEDA creates a pod for the 1st message in the SQS queue but does not create a pod for the 2nd message. A new pod is created only after a 3rd message is sent.

Steps to Reproduce the Problem

  1. Send the 1st message to the SQS queue.
  2. Check whether a pod is created.
  3. Send the 2nd message to the SQS queue.
  4. Check whether a 2nd pod is created; it should be.
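The steps above could be scripted roughly as follows. This is a sketch under assumptions, not the script used for this report: it relies on the third-party `boto3` package, the queue URL is copied from the manifest with its elided parts left as-is, and the `repro` group ID and helper names are hypothetical. Because the queue is a `.fifo` queue, each message needs a `MessageGroupId`, and a `MessageDeduplicationId` unless content-based deduplication is enabled on the queue.

```python
import uuid

# Queue URL as given in the manifest; the account ID and queue prefix are elided there.
QUEUE_URL = "https://sqs.ap-south-1.amazonaws.com/xxxx/xx-unifiedservice.fifo"


def build_message(body: str, group_id: str = "repro") -> dict:
    """Build the keyword arguments for SQS send_message on a FIFO queue."""
    return {
        "QueueUrl": QUEUE_URL,
        "MessageBody": body,
        "MessageGroupId": group_id,  # hypothetical group name
        "MessageDeduplicationId": str(uuid.uuid4()),  # unique per message
    }


def send(body: str) -> str:
    """Send one message and return its MessageId (needs boto3 and AWS credentials)."""
    import boto3  # imported here so build_message works without boto3 installed

    sqs = boto3.client("sqs", region_name="ap-south-1")
    return sqs.send_message(**build_message(body))["MessageId"]
```

Calling `send("first")`, checking `kubectl get pods -n backend`, then calling `send("second")` and checking again would walk through steps 1 to 4.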

Logs from KEDA operator

manjur@MacBook-Pro keda % kubectl logs -f keda-operator-7f5d566f89-2fk22 
2024/06/21 11:58:45 maxprocs: Updating GOMAXPROCS=1: determined from CPU quota
2024-06-21T11:58:45Z	INFO	setup	Starting manager
2024-06-21T11:58:45Z	INFO	setup	KEDA Version: 2.12.1
2024-06-21T11:58:45Z	INFO	setup	Git Commit: dc76ca70f19c22e8f0c806f84d95256d771f3dc9
2024-06-21T11:58:45Z	INFO	setup	Go Version: go1.20.8
2024-06-21T11:58:45Z	INFO	setup	Go OS/Arch: linux/amd64
2024-06-21T11:58:45Z	INFO	setup	Running on Kubernetes 1.28+	{"version": "v1.28.9-eks-036c24b"}
2024-06-21T11:58:45Z	INFO	starting server	{"kind": "health probe", "addr": "[::]:8081"}
I0621 11:58:45.933781       1 leaderelection.go:250] attempting to acquire leader lease keda-uat/operator.keda.sh...
2024-06-21T11:58:45Z	INFO	controller-runtime.metrics	Starting metrics server
2024-06-21T11:58:45Z	INFO	controller-runtime.metrics	Serving metrics server	{"bindAddress": ":8080", "secure": false}
I0621 11:59:23.015266       1 leaderelection.go:260] successfully acquired lease keda-uat/operator.keda.sh
2024-06-21T11:59:23Z	INFO	Starting EventSource	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "source": "kind source: *v1alpha1.ScaledObject"}
2024-06-21T11:59:23Z	INFO	Starting EventSource	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "source": "kind source: *v2.HorizontalPodAutoscaler"}
2024-06-21T11:59:23Z	INFO	Starting Controller	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject"}
2024-06-21T11:59:23Z	INFO	Starting EventSource	{"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication", "source": "kind source: *v1alpha1.TriggerAuthentication"}
2024-06-21T11:59:23Z	INFO	Starting Controller	{"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication"}
2024-06-21T11:59:23Z	INFO	Starting EventSource	{"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "source": "kind source: *v1alpha1.ScaledJob"}
2024-06-21T11:59:23Z	INFO	Starting Controller	{"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob"}
2024-06-21T11:59:23Z	INFO	Starting EventSource	{"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication", "source": "kind source: *v1alpha1.ClusterTriggerAuthentication"}
2024-06-21T11:59:23Z	INFO	Starting Controller	{"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication"}
2024-06-21T11:59:23Z	INFO	Starting EventSource	{"controller": "cert-rotator", "source": "kind source: *v1.Secret"}
2024-06-21T11:59:23Z	INFO	Starting EventSource	{"controller": "cert-rotator", "source": "kind source: *unstructured.Unstructured"}
2024-06-21T11:59:23Z	INFO	Starting EventSource	{"controller": "cert-rotator", "source": "kind source: *unstructured.Unstructured"}
2024-06-21T11:59:23Z	INFO	Starting Controller	{"controller": "cert-rotator"}
2024-06-21T11:59:23Z	INFO	cert-rotation	starting cert rotator controller
2024-06-21T11:59:23Z	INFO	cert-rotation	no cert refresh needed
2024-06-21T11:59:23Z	INFO	cert-rotation	certs are ready in /certs
2024-06-21T11:59:23Z	INFO	Starting workers	{"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication", "worker count": 1}
2024-06-21T11:59:23Z	INFO	Starting workers	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "worker count": 5}
2024-06-21T11:59:23Z	INFO	Starting workers	{"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication", "worker count": 1}
2024-06-21T11:59:23Z	INFO	Starting workers	{"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "worker count": 1}
2024-06-21T11:59:23Z	INFO	Reconciling ScaledJob	{"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "ScaledJob": {"name":"unified-sqs-queue-scaledjob","namespace":"backend"}, "namespace": "backend", "name": "unified-sqs-queue-scaledjob", "reconcileID": "42e024b8-00aa-4f40-8f0a-96959528d2d0"}
2024-06-21T11:59:23Z	INFO	Starting workers	{"controller": "cert-rotator", "worker count": 1}
2024-06-21T11:59:23Z	INFO	cert-rotation	no cert refresh needed
2024-06-21T11:59:23Z	INFO	cert-rotation	Ensuring CA cert	{"name": "keda-admission", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration", "name": "keda-admission", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration"}
2024-06-21T11:59:23Z	INFO	cert-rotation	Ensuring CA cert	{"name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService", "name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService"}
2024-06-21T11:59:23Z	INFO	cert-rotation	no cert refresh needed
2024-06-21T11:59:23Z	INFO	cert-rotation	Ensuring CA cert	{"name": "keda-admission", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration", "name": "keda-admission", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration"}
2024-06-21T11:59:23Z	INFO	cert-rotation	Ensuring CA cert	{"name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService", "name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService"}
2024-06-21T11:59:23Z	INFO	RolloutStrategy: immediate, Deleting jobs owned by the previous version of the scaledJob	{"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "ScaledJob": {"name":"unified-sqs-queue-scaledjob","namespace":"backend"}, "namespace": "backend", "name": "unified-sqs-queue-scaledjob", "reconcileID": "42e024b8-00aa-4f40-8f0a-96959528d2d0", "numJobsToDelete": 3}
2024-06-21T11:59:23Z	INFO	Initializing Scaling logic according to ScaledJob Specification	{"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "ScaledJob": {"name":"unified-sqs-queue-scaledjob","namespace":"backend"}, "namespace": "backend", "name": "unified-sqs-queue-scaledjob", "reconcileID": "42e024b8-00aa-4f40-8f0a-96959528d2d0"}
2024-06-21T11:59:23Z	INFO	scaleexecutor	Scaling Jobs	{"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of running Jobs": 0}
2024-06-21T11:59:23Z	INFO	scaleexecutor	Scaling Jobs	{"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of pending Jobs ": 0}
2024-06-21T11:59:23Z	INFO	scaleexecutor	Creating jobs	{"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Effective number of max jobs": 1}
2024-06-21T11:59:23Z	INFO	scaleexecutor	Creating jobs	{"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of jobs": 1}
2024-06-21T11:59:23Z	INFO	scaleexecutor	Created jobs	{"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of jobs": 1}
2024-06-21T11:59:24Z	INFO	cert-rotation	CA certs are injected to webhooks
2024-06-21T11:59:24Z	INFO	grpc_server	Starting Metrics Service gRPC Server	{"address": ":9666"}
2024-06-21T11:59:53Z	INFO	scaleexecutor	Scaling Jobs	{"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of running Jobs": 1}
2024-06-21T11:59:53Z	INFO	scaleexecutor	Scaling Jobs	{"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of pending Jobs ": 1}
2024-06-21T11:59:53Z	INFO	scaleexecutor	Creating jobs	{"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Effective number of max jobs": 0}
2024-06-21T11:59:53Z	INFO	scaleexecutor	Creating jobs	{"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of jobs": 0}
2024-06-21T11:59:53Z	INFO	scaleexecutor	Created jobs	{"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of jobs": 0}
2024-06-21T12:00:23Z	INFO	scaleexecutor	Scaling Jobs	{"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of running Jobs": 1}
2024-06-21T12:00:23Z	INFO	scaleexecutor	Scaling Jobs	{"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of pending Jobs ": 1}
2024-06-21T12:00:23Z	INFO	scaleexecutor	Creating jobs	{"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Effective number of max jobs": 0}
2024-06-21T12:00:23Z	INFO	scaleexecutor	Creating jobs	{"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of jobs": 0}
2024-06-21T12:00:23Z	INFO	scaleexecutor	Created jobs	{"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of jobs": 0}
2024-06-21T12:00:53Z	INFO	scaleexecutor	Scaling Jobs	{"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of running Jobs": 1}
2024-06-21T12:00:53Z	INFO	scaleexecutor	Scaling Jobs	{"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of pending Jobs ": 1}
2024-06-21T12:00:53Z	INFO	scaleexecutor	Creating jobs	{"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Effective number of max jobs": 0}
2024-06-21T12:00:53Z	INFO	scaleexecutor	Creating jobs	{"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of jobs": 0}
2024-06-21T12:00:53Z	INFO	scaleexecutor	Created jobs	{"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of jobs": 0}
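
The `scaleexecutor` lines in the log above repeat every polling interval, so it can help to collapse them into one row per poll. A small Python sketch for that, assuming each line ends with a JSON object as in this log; the `summarize` helper and the short key names are made up for illustration.

```python
import json
from collections import defaultdict

# Log keys of interest mapped to short names (the short names are hypothetical).
WANTED = {
    "Number of running Jobs": "running",
    "Number of pending Jobs": "pending",
    "Effective number of max jobs": "effective_max",
}


def summarize(lines):
    """Group scaleexecutor counters by timestamp: {timestamp: {name: value}}."""
    polls = defaultdict(dict)
    for line in lines:
        if "scaleexecutor" not in line or "{" not in line:
            continue
        timestamp = line.split()[0]
        fields = json.loads(line[line.index("{"):])
        for key, value in fields.items():
            # The pending key is logged with a trailing space, so strip before matching.
            name = WANTED.get(key.strip())
            if name is not None:
                polls[timestamp][name] = value
    return dict(polls)
```

Feeding it the lines of the log above (for example from a saved `kubectl logs` output file) gives one entry per poll with the running, pending, and effective-max counts side by side.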

KEDA Version

2.12.1

Kubernetes Version

1.28

Platform

Amazon Web Services

Scaler Details

AWS SQS

Anything else?

No response

manjurshaikh1988, Jun 21 '24 12:06