docker-selenium
[🐛 Bug]: Selenium-grid Autoscaling setup
What happened?
I am setting up Selenium Grid autoscaling on our AKS cluster using deployment files. Currently we have selenium-hub and selenium-node-chrome; autoscaling is not enabled yet. We have two selenium-node-chrome pods, and when we run three threads against them, two are executed and one stays in the queue and fails after some time. That is expected, since autoscaling is not enabled and the number of concurrent sessions is set to "1". I am having a hard time understanding how to set up autoscaling using KEDA. Is there any clear documentation on how autoscaling can be set up?
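For reference, the per-node session limit mentioned above is usually controlled through an environment variable on the node container. A minimal sketch, assuming the docker-selenium `SE_NODE_MAX_SESSIONS` variable; it is not set in the manifests posted later in this thread, so the image default presumably applies there:

```yaml
# Fragment of the selenium-node-chrome container spec (sketch, not a full manifest).
# Assumption: SE_NODE_MAX_SESSIONS is the docker-selenium setting for concurrent sessions per node.
env:
- name: SE_NODE_MAX_SESSIONS
  value: "1"   # one browser session per node pod, so a third test needs a third pod
```

With this limit, two node pods can serve only two sessions at a time, which matches the queued third thread described above.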
Command used to start Selenium Grid with Docker (or Kubernetes)
N/A
Relevant log output
N/A
Operating System
Linux
Docker Selenium version (image tag)
4.18.0
Selenium Grid chart version (chart version)
No response
@katukna, thank you for creating this issue. We will troubleshoot it as soon as we can.
Hi, are you deploying on the AKS cluster using your own YAML manifest files? Could you check the YAML manifests in this release for any clues? https://github.com/SeleniumHQ/docker-selenium/releases/tag/4.23.1-20240813
Yes @VietND96, I was using my own YAML files. I have gone through the README, but I didn't find much information on what node-chrome autoscaling is based on. Is it based on the number of queued sessions shown in the selenium-hub UI?
I have Deployment and Service files for selenium-hub and selenium-node-chrome, plus a ScaledObject. The YAML files are attached below.
selenium-hub.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: selenium-hub
  name: selenium-hub
  namespace: selenium-grid
spec:
  replicas: 1
  selector:
    matchLabels:
      app: selenium-hub
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: selenium-hub
    spec:
      containers:
      - image: selenium/hub:4.23.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /wd/hub/status
            port: 4444
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: selenium-hub
        ports:
        - containerPort: 4444
          protocol: TCP
        - containerPort: 4443
          protocol: TCP
        - containerPort: 4442
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /wd/hub/status
            port: 4444
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resources:
          limits:
            cpu: 500m
            memory: 1000Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
selenium-hub-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: selenium-hub
  name: selenium-hub
  namespace: selenium-grid
spec:
  ports:
  - name: port0
    port: 4444
    protocol: TCP
    targetPort: 4444
  - name: port1
    port: 4443
    protocol: TCP
    targetPort: 4443
  - name: port2
    port: 4442
    protocol: TCP
    targetPort: 4442
  - name: node
    port: 5555
    protocol: TCP
    targetPort: 5555
  - name: port3
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: selenium-hub
  sessionAffinity: None
  type: ClusterIP
selenium-node-chrome-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: selenium-node-chrome
  name: selenium-node-chrome
  namespace: selenium-grid
spec:
  replicas: 2
  selector:
    matchLabels:
      app: selenium-node-chrome
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: selenium-node-chrome
    spec:
      containers:
      - env:
        - name: SE_EVENT_BUS_HOST
          value: selenium-hub
        - name: SE_EVENT_BUS_SUBSCRIBE_PORT
          value: "4443"
        - name: SE_EVENT_BUS_PUBLISH_PORT
          value: "4442"
        image: selenium/node-chrome:4.23.1
        imagePullPolicy: IfNotPresent
        name: selenium-node-chrome
        ports:
        - containerPort: 5555
          protocol: TCP
        resources:
          limits:
            cpu: 500m
            memory: 1000Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /dev/shm
          name: dshm
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - emptyDir:
          medium: Memory
selenium-node-chrome-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    name: selenium-node-chrome
  name: selenium-node-chrome
  namespace: selenium-grid
spec:
  ports:
  - name: nodeport
    port: 5555
    protocol: TCP
    targetPort: 5555
  - name: node-port-grid
    port: 4444
    protocol: TCP
    targetPort: 4444
  - name: no-vnc
    port: 7900
    protocol: TCP
    targetPort: 7900
  selector:
    app: selenium-node-chrome
  sessionAffinity: None
  type: ClusterIP
Chrome-ScaledObject.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  namespace: selenium-grid
  name: chrome-scale-deployment
  labels:
    deploymentName: selenium-node-chrome
spec:
  minReplicaCount: 2
  maxReplicaCount: 5
  scaleTargetRef:
    name: selenium-node-chrome
  triggers:
  - type: selenium-grid
    metadata:
      url: 'https://selenium-hub.example.com:4444/graphql'
      browserName: 'chrome'
      unsafeSsl: 'true'
@katukna Have you resolved your issue or found a solution? Please share, as I'm encountering this issue on my AKS cluster. Thanks.
Not yet, @edsherwin. Let me know if you find a way to do it.
In Chrome-ScaledObject.yaml, can you try updating metadata.url to point to the hub Service (e.g. svc_name.namespace) instead of the public DNS name or load balancer IP with https? For example:
triggers:
- type: selenium-grid
  metadata:
    url: 'http://selenium-hub.selenium-grid:4444/graphql'
    browserName: 'chrome'
    unsafeSsl: 'true'
There were a few recent fixes to the KEDA scaler logic for autoscaling. You can refer to https://github.com/SeleniumHQ/docker-selenium/tree/trunk/.keda to preview the fixes and verify.
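Applied to the ScaledObject posted earlier in the thread, the URL change suggested above might look like the sketch below. Every field except `url` is copied from the original Chrome-ScaledObject.yaml. It assumes the hub Service is named `selenium-hub` in the `selenium-grid` namespace, as in the posted manifests. As far as I understand, KEDA's selenium-grid scaler polls this GraphQL endpoint for queued and active sessions matching `browserName`, so the endpoint has to be reachable from inside the cluster.

```yaml
# Sketch only: the original ScaledObject with the trigger URL pointed at the
# in-cluster hub Service instead of the public HTTPS endpoint.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  namespace: selenium-grid
  name: chrome-scale-deployment
  labels:
    deploymentName: selenium-node-chrome
spec:
  minReplicaCount: 2
  maxReplicaCount: 5
  scaleTargetRef:
    name: selenium-node-chrome   # the node Deployment posted above
  triggers:
  - type: selenium-grid
    metadata:
      # <service>.<namespace>:<port>; assumes Service "selenium-hub"
      # in namespace "selenium-grid", as in the manifests above.
      url: 'http://selenium-hub.selenium-grid:4444/graphql'
      browserName: 'chrome'
      unsafeSsl: 'true'   # kept from the original; only matters for https URLs
```

If the scaler can reach the hub, a third queued chrome session would be expected to bring up a third node pod, up to the `maxReplicaCount` of 5.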