docker-selenium
[🐛 Bug]: Nodes couldn't active when enabling autoscaling and deployed on EKS
What happened?
When autoscaling is enabled in the Helm chart, it does not work properly.
I'm using the Selenium Grid Helm chart on EKS. It works with autoscaling disabled, but when I enable autoscaling, I don't see any active nodes in Selenium Grid.
Command used to start Selenium Grid with Docker (or Kubernetes)
value.yml for helm charts
hub:
  serviceType: NodePort
autoscaling:
  enabled: true
ingress:
  enabled: true
  nginx: !
  annotations:
    "kubernetes.io/ingress.class": "alb"
    "alb.ingress.kubernetes.io/scheme": "internal"
    "alb.ingress.kubernetes.io/group.name": "alb-name"
    "alb.ingress.kubernetes.io/group.order": "300"
    "alb.ingress.kubernetes.io/listen-ports": "[{\"HTTPS\":443}, {\"HTTP\":80}]"
    "alb.ingress.kubernetes.io/ssl-redirect": "443"
    "alb.ingress.kubernetes.io/healthcheck-port": "8080"
    "alb.ingress.kubernetes.io/certificate-arn": "certificate-arn"
Relevant log output
kubectl logs keda-operator-bf9546dd-km68s
...
2024-04-28T18:48:30Z ERROR cert-rotation Webhook not found. Unable to update certificate. {"name": "keda-admission", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration", "error": "ValidatingWebhookConfiguration.admissionregistration.k8s.io \"keda-admission\" not found"}
github.com/open-policy-agent/cert-controller/pkg/rotator.(*ReconcileWH).ensureCerts
/workspace/vendor/github.com/open-policy-agent/cert-controller/pkg/rotator/rotator.go:816
github.com/open-policy-agent/cert-controller/pkg/rotator.(*ReconcileWH).Reconcile
/workspace/vendor/github.com/open-policy-agent/cert-controller/pkg/rotator/rotator.go:785
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227
2024-04-28T18:48:30Z INFO cert-rotation Ensuring CA cert {"name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService", "name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService"}
2024-04-28T18:48:30Z INFO cert-rotation no cert refresh needed
2024-04-28T18:48:30Z ERROR cert-rotation Webhook not found. Unable to update certificate. {"name": "keda-admission", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration", "error": "ValidatingWebhookConfiguration.admissionregistration.k8s.io \"keda-admission\" not found"}
github.com/open-policy-agent/cert-controller/pkg/rotator.(*ReconcileWH).ensureCerts
/workspace/vendor/github.com/open-policy-agent/cert-controller/pkg/rotator/rotator.go:816
github.com/open-policy-agent/cert-controller/pkg/rotator.(*ReconcileWH).Reconcile
/workspace/vendor/github.com/open-policy-agent/cert-controller/pkg/rotator/rotator.go:785
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227
2024-04-28T18:48:30Z INFO cert-rotation Ensuring CA cert {"name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService", "name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService"}
2024-04-28T18:48:32Z INFO cert-rotation CA certs are injected to webhooks
...
2024-04-28T18:48:42Z ERROR scaleexecutor failed to patch Objects {"scaledJob.Name": "selenium-chrome-node", "scaledJob.Namespace": "selenium", "error": "client rate limiter Wait returned an error: context canceled"}
github.com/kedacore/keda/v2/pkg/status.TransformObject
/workspace/pkg/status/status.go:195
github.com/kedacore/keda/v2/pkg/scaling/executor.(*scaleExecutor).setCondition
/workspace/pkg/scaling/executor/scale_executor.go:106
github.com/kedacore/keda/v2/pkg/scaling/executor.(*scaleExecutor).setActiveCondition
/workspace/pkg/scaling/executor/scale_executor.go:120
github.com/kedacore/keda/v2/pkg/scaling/executor.(*scaleExecutor).RequestJobScale
/workspace/pkg/scaling/executor/scale_jobs.go:76
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
/workspace/pkg/scaling/scale_handler.go:263
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
/workspace/pkg/scaling/scale_handler.go:182
2024-04-28T18:48:42Z ERROR scaleexecutor Error setting active condition when triggers are not active {"scaledJob.Name": "selenium-chrome-node", "scaledJob.Namespace": "selenium", "error": "client rate limiter Wait returned an error: context canceled"}
github.com/kedacore/keda/v2/pkg/scaling/executor.(*scaleExecutor).RequestJobScale
/workspace/pkg/scaling/executor/scale_jobs.go:77
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
/workspace/pkg/scaling/scale_handler.go:263
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
/workspace/pkg/scaling/scale_handler.go:182
...
2024-04-28T18:48:44Z ERROR scaleexecutor failed to patch Objects {"scaledJob.Name": "selenium-edge-node", "scaledJob.Namespace": "selenium", "error": "client rate limiter Wait returned an error: context canceled"}
github.com/kedacore/keda/v2/pkg/status.TransformObject
/workspace/pkg/status/status.go:195
github.com/kedacore/keda/v2/pkg/scaling/executor.(*scaleExecutor).setCondition
/workspace/pkg/scaling/executor/scale_executor.go:106
github.com/kedacore/keda/v2/pkg/scaling/executor.(*scaleExecutor).setActiveCondition
/workspace/pkg/scaling/executor/scale_executor.go:120
github.com/kedacore/keda/v2/pkg/scaling/executor.(*scaleExecutor).RequestJobScale
/workspace/pkg/scaling/executor/scale_jobs.go:76
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
/workspace/pkg/scaling/scale_handler.go:263
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
/workspace/pkg/scaling/scale_handler.go:182
2024-04-28T18:48:44Z ERROR scaleexecutor Error setting active condition when triggers are not active {"scaledJob.Name": "selenium-edge-node", "scaledJob.Namespace": "selenium", "error": "client rate limiter Wait returned an error: context canceled"}
github.com/kedacore/keda/v2/pkg/scaling/executor.(*scaleExecutor).RequestJobScale
/workspace/pkg/scaling/executor/scale_jobs.go:77
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
/workspace/pkg/scaling/scale_handler.go:263
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
/workspace/pkg/scaling/scale_handler.go:182
...
2024-04-28T18:48:45Z ERROR scaleexecutor failed to patch Objects {"scaledJob.Name": "selenium-firefox-node", "scaledJob.Namespace": "selenium", "error": "client rate limiter Wait returned an error: context canceled"}
github.com/kedacore/keda/v2/pkg/status.TransformObject
/workspace/pkg/status/status.go:195
github.com/kedacore/keda/v2/pkg/scaling/executor.(*scaleExecutor).setCondition
/workspace/pkg/scaling/executor/scale_executor.go:106
github.com/kedacore/keda/v2/pkg/scaling/executor.(*scaleExecutor).setActiveCondition
/workspace/pkg/scaling/executor/scale_executor.go:120
github.com/kedacore/keda/v2/pkg/scaling/executor.(*scaleExecutor).RequestJobScale
/workspace/pkg/scaling/executor/scale_jobs.go:76
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
/workspace/pkg/scaling/scale_handler.go:263
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
/workspace/pkg/scaling/scale_handler.go:182
2024-04-28T18:48:45Z ERROR scaleexecutor Error setting active condition when triggers are not active {"scaledJob.Name": "selenium-firefox-node", "scaledJob.Namespace": "selenium", "error": "client rate limiter Wait returned an error: context canceled"}
github.com/kedacore/keda/v2/pkg/scaling/executor.(*scaleExecutor).RequestJobScale
/workspace/pkg/scaling/executor/scale_jobs.go:77
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
/workspace/pkg/scaling/scale_handler.go:263
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
/workspace/pkg/scaling/scale_handler.go:182
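To make a log like the one above easier to scan, here is a rough Python sketch that keeps only the ERROR entries and groups them by ScaledJob. It assumes the line layout seen in this output (timestamp, level, logger name, message, optional trailing JSON object). The names `parse_line` and `errors_by_scaled_job` are illustrative and not part of KEDA.

```python
import json
import re
from collections import defaultdict

# Layout assumed from the log above:
# "<timestamp> <LEVEL> <logger> <message> {optional JSON object}"
LINE_RE = re.compile(r"^(\S+)\s+([A-Z]+)\s+(\S+)\s+(.*?)\s*(\{.*\})?$")


def parse_line(line):
    """Return a dict for a structured log line, or None for stack-trace lines."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    timestamp, level, logger, message, payload = match.groups()
    try:
        fields = json.loads(payload) if payload else {}
    except json.JSONDecodeError:
        # Keep the entry even if the trailing object is not valid JSON.
        fields = {}
    return {
        "timestamp": timestamp,
        "level": level,
        "logger": logger,
        "message": message,
        "fields": fields,
    }


def errors_by_scaled_job(lines):
    """Group distinct ERROR texts by ScaledJob name (or resource name)."""
    grouped = defaultdict(set)
    for line in lines:
        entry = parse_line(line)
        if entry is None or entry["level"] != "ERROR":
            continue
        fields = entry["fields"]
        name = fields.get("scaledJob.Name", fields.get("name", "unknown"))
        grouped[name].add(fields.get("error", entry["message"]))
    return {name: sorted(errors) for name, errors in grouped.items()}
```

Feeding it the saved output of `kubectl logs` (one string per line) gives one entry per ScaledJob, which makes it clear that the chrome, edge, and firefox jobs all report the same `context canceled` error.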
Operating System
Kubernetes, EKS
Docker Selenium version (image tag)
4.20.0-20240425
Selenium Grid chart version (chart version)
0.30.0
@fazizsoltani, thank you for creating this issue. We will troubleshoot it as soon as we can.
Did this work before, and did it break only after upgrading to a new chart version?
No, I had problems with previous versions too.
I saw something in the logs that could be related to certificates or the SSL connection:
2024-04-28T18:48:30Z INFO cert-rotation Ensuring CA cert {"name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService", "name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService"}
2024-04-28T18:48:32Z INFO cert-rotation CA certs are injected to webhooks
I also see that the Hub is configured with NodePort:
hub:
  serviceType: NodePort
Can you run kubectl describe scaledJob to show the details of one node ScaledJob? I want to see the triggers section:
triggers:
  - type: selenium-grid
    metadata:
      ...
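One way to check what that trigger can actually reach is to send the Grid's GraphQL endpoint a request from inside the cluster. The Python sketch below builds such a request with the standard library only. The query fields follow the Selenium Grid GraphQL schema as I understand it, so treat them as an assumption. The function names are hypothetical, and the URL and credentials must come from your own trigger metadata.

```python
import base64
import json
import urllib.request

# Assumed query against the Grid GraphQL schema; adjust if your Grid version differs.
GRID_QUERY = "{ grid { nodeCount, sessionCount, maxSession, sessionQueueSize } }"


def build_graphql_request(url, username=None, password=None):
    """Build a POST request for the Grid GraphQL endpoint, optionally with basic auth."""
    body = json.dumps({"query": GRID_QUERY}).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Content-Type", "application/json")
    if username is not None and password is not None:
        raw = f"{username}:{password}".encode("utf-8")
        token = base64.b64encode(raw).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


def fetch_grid_status(url, username=None, password=None, timeout=10):
    """Send the request and return the 'grid' object from the response."""
    request = build_graphql_request(url, username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["data"]["grid"]
```

Calling `fetch_grid_status` with the URL from the trigger metadata, from a pod in the same cluster, shows whether the endpoint answers at all and whether any sessions are queued. A connection error or an authentication failure here would point at the URL or credentials rather than at KEDA itself.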
kubectl describe scaledJob selenium-chrome-node
Namespace: selenium
Labels: app=selenium-chrome-node
app.kubernetes.io/component=selenium-grid-4.20.0-20240425
app.kubernetes.io/instance=selenium
app.kubernetes.io/managed-by=helm
app.kubernetes.io/name=selenium-chrome-node
app.kubernetes.io/version=4.20.0-20240425
component.autoscaling=true
helm.sh/chart=selenium-grid-0.30.0
Annotations: helm.sh/hook: post-install,post-upgrade,post-rollback,pre-delete
API Version: keda.sh/v1alpha1
Kind: ScaledJob
Metadata:
Creation Timestamp: 2024-04-28T18:48:42Z
Finalizers:
finalizer.keda.sh
Generation: 3
Managed Fields:
API Version: keda.sh/v1alpha1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:finalizers:
.:
v:"finalizer.keda.sh":
f:spec:
f:rollout:
Manager: keda
Operation: Update
Time: 2024-04-28T18:48:42Z
API Version: keda.sh/v1alpha1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.:
f:helm.sh/hook:
f:labels:
.:
f:app:
f:app.kubernetes.io/component:
f:app.kubernetes.io/instance:
f:app.kubernetes.io/managed-by:
f:app.kubernetes.io/name:
f:app.kubernetes.io/version:
f:component.autoscaling:
f:helm.sh/chart:
f:spec:
.:
f:failedJobsHistoryLimit:
f:jobTargetRef:
.:
f:backoffLimit:
f:completions:
f:parallelism:
f:template:
.:
f:metadata:
.:
f:annotations:
.:
f:checksum/event-bus-configmap:
f:checksum/logging-configmap:
f:checksum/node-configmap:
f:checksum/server-configmap:
f:labels:
.:
f:app:
f:app.kubernetes.io/component:
f:app.kubernetes.io/instance:
f:app.kubernetes.io/managed-by:
f:app.kubernetes.io/name:
f:app.kubernetes.io/version:
f:helm.sh/chart:
f:spec:
.:
f:containers:
f:restartPolicy:
f:serviceAccount:
f:serviceAccountName:
f:terminationGracePeriodSeconds:
f:volumes:
f:maxReplicaCount:
f:minReplicaCount:
f:pollingInterval:
f:scalingStrategy:
.:
f:strategy:
f:successfulJobsHistoryLimit:
f:triggers:
Manager: terraform-provider-helm_v2.11.0_x5
Operation: Update
Time: 2024-04-28T18:48:42Z
API Version: keda.sh/v1alpha1
Fields Type: FieldsV1
fieldsV1:
f:status:
.:
f:conditions:
Manager: keda
Operation: Update
Subresource: status
Time: 2024-04-28T18:48:53Z
API Version: keda.sh/v1alpha1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
f:kubectl.kubernetes.io/last-applied-configuration:
Manager: kubectl-client-side-apply
Operation: Update
Time: 2024-04-28T18:48:53Z
Resource Version: 108763431
UID: 0b1ddbc0-0e66-4849-9e35-5de9dca3179a
Spec:
Failed Jobs History Limit: 0
Job Target Ref:
Backoff Limit: 0
Completions: 1
Parallelism: 1
Template:
Metadata:
Annotations:
checksum/event-bus-configmap: 4e264bd45e78bf454c38
checksum/logging-configmap: c7f18f9e715bc62bca7234
checksum/node-configmap: bd257694e2cfebd395a9
checksum/server-configmap: 4af2ca96bbaebd763d5
Labels:
App: selenium-chrome-node
app.kubernetes.io/component: selenium-grid-4.20.0-20240425
app.kubernetes.io/instance: selenium
app.kubernetes.io/managed-by: helm
app.kubernetes.io/name: selenium-chrome-node
app.kubernetes.io/version: 4.20.0-20240425
helm.sh/chart: selenium-grid-0.30.0
Spec:
Containers:
Env:
Name: SE_OTEL_SERVICE_NAME
Value: selenium-chrome-node
Name: SE_NODE_PORT
Value: 5555
Name: SE_NODE_REGISTER_PERIOD
Value: 60
Name: SE_NODE_REGISTER_CYCLE
Value: 5
Env From:
Config Map Ref:
Name: selenium-event-bus
Config Map Ref:
Name: selenium-node-config
Config Map Ref:
Name: selenium-logging-config
Config Map Ref:
Name: selenium-server-config
Secret Ref:
Name: selenium-secrets
Image: selenium/node-chrome:4.20.0-20240425
Image Pull Policy: IfNotPresent
Lifecycle:
Pre Stop:
Exec:
Command:
bash
-c
/opt/selenium/nodePreStop.sh
Name: selenium-chrome-node
Ports:
Container Port: 5555
Protocol: TCP
Resources:
Limits:
Cpu: 1
Memory: 1Gi
Requests:
Cpu: 1
Memory: 1Gi
Startup Probe:
Exec:
Command:
bash
-c
/opt/selenium/nodeProbe.sh Startup
Failure Threshold: 12
Period Seconds: 5
Success Threshold: 1
Timeout Seconds: 60
Volume Mounts:
Mount Path: /dev/shm
Name: dshm
Mount Path: /opt/selenium/nodePreStop.sh
Name: selenium-node-config
Sub Path: nodePreStop.sh
Mount Path: /opt/selenium/nodeProbe.sh
Name: selenium-node-config
Sub Path: nodeProbe.sh
Restart Policy: Never
Service Account: selenium-serviceaccount
Service Account Name: selenium-serviceaccount
Termination Grace Period Seconds: 30
Volumes:
Config Map:
Default Mode: 493
Name: selenium-node-config
Name: selenium-node-config
Empty Dir:
Medium: Memory
Size Limit: 1Gi
Name: dshm
Max Replica Count: 8
Min Replica Count: 0
Polling Interval: 10
Rollout:
Scaling Strategy:
Strategy: accurate
Successful Jobs History Limit: 0
Triggers:
Metadata:
Browser Name: chrome
Platform Name: linux
Session Browser Name: chrome
Trigger Index: 0
Unsafe Ssl: true
URL: http://admin:[email protected]:4444/graphql
Type: selenium-grid
Status:
Conditions:
Message: ScaledJob is defined correctly and is ready to scaling
Reason: ScaledJobReady
Status: True
Type: Ready
Message: Scaling is not performed because triggers are not active
Reason: ScalerNotActive
Status: False
Type: Active
Status: Unknown
Type: Fallback
Status: Unknown
Type: Paused
Events: <none>
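The Status block above is the part worth reading first: Ready is True, but Active is False with reason ScalerNotActive. As a rough sketch, the same conditions can be read from the JSON form of the object, for example the output of `kubectl get scaledjob selenium-chrome-node -n selenium -o json`. The helper names below are hypothetical, and the field names assume the usual Kubernetes condition layout (`type`, `status`, `reason`, `message`).

```python
def summarize_conditions(scaled_job):
    """Map each condition type to its status, reason, and message."""
    conditions = scaled_job.get("status", {}).get("conditions", [])
    summary = {}
    for condition in conditions:
        summary[condition["type"]] = {
            "status": condition.get("status", "Unknown"),
            "reason": condition.get("reason", ""),
            "message": condition.get("message", ""),
        }
    return summary


def is_scaling(scaled_job):
    """True only when the ScaledJob is both Ready and Active."""
    summary = summarize_conditions(scaled_job)
    ready = summary.get("Ready", {}).get("status") == "True"
    active = summary.get("Active", {}).get("status") == "True"
    return ready and active
```

For the object shown here, `is_scaling` would return False because the Active condition is False, which matches the message that scaling is not performed because the triggers are not active.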