mybinder.org-deploy
Streamline deployment of GESIS stage server
This is related to https://github.com/jupyterhub/mybinder.org-deploy/issues/2797
The configuration in the ansible folder works, and so does the GitLab CI pipeline in .gitlab-ci.yml.
I'm trying to complete the Kubernetes cluster configuration in the Helm chart.
@manics @sgibson91 @minrk could you help me understand which Helm chart configuration is being loaded by mistake? The binder pod crashes with the following log:
Loading /etc/binderhub/config/values.yaml
Loading extra config: 01-eventlog
Loading extra config: 01-template-variables
Loading extra config: 02-badge-base-url
Loading extra config: 02-event-loop-metric
[BinderHub] starting!
[BinderHub] WARNING | BinderHub.build_node_selector is deprecated, use KubernetesBuildExecutor.node_selector
[BinderHub] WARNING | BinderHub.build_docker_host is deprecated, use KubernetesBuildExecutor.docker_host
[W 240906 15:36:29 _metadata:139] Compute Engine Metadata server unavailable on attempt 1 of 3. Reason: timed out
[W 240906 15:36:32 _metadata:139] Compute Engine Metadata server unavailable on attempt 2 of 3. Reason: timed out
[W 240906 15:36:35 _metadata:139] Compute Engine Metadata server unavailable on attempt 3 of 3. Reason: timed out
[W 240906 15:36:35 _default:338] Authentication failed using Compute Engine authentication due to unavailable metadata server.
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/usr/local/lib/python3.11/site-packages/binderhub/__main__.py", line 4, in <module>
    main()
  File "/usr/local/lib/python3.11/site-packages/traitlets/config/application.py", line 1074, in launch_instance
    app.initialize(argv)
  File "/usr/local/lib/python3.11/site-packages/binderhub/app.py", line 913, in initialize
    self.event_log = EventLog(parent=self)
                     ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/binderhub/events.py", line 51, in __init__
    self.handlers = self.handlers_maker(self)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 18, in _make_eventsink_handler
  File "/usr/local/lib/python3.11/site-packages/google/cloud/logging_v2/client.py", line 122, in __init__
    super(Client, self).__init__(
  File "/usr/local/lib/python3.11/site-packages/google/cloud/client/__init__.py", line 320, in __init__
    _ClientProjectMixin.__init__(self, project=project, credentials=credentials)
  File "/usr/local/lib/python3.11/site-packages/google/cloud/client/__init__.py", line 268, in __init__
    project = self._determine_default(project)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/google/cloud/client/__init__.py", line 287, in _determine_default
    return _determine_default_project(project)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/google/cloud/_helpers/__init__.py", line 152, in _determine_default_project
    _, project = google.auth.default()
                 ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/google/auth/_default.py", line 691, in default
    raise exceptions.DefaultCredentialsError(_CLOUD_SDK_MISSING_CREDENTIALS)
google.auth.exceptions.DefaultCredentialsError: Your default credentials were not found. To set up Application Default Credentials, see https://cloud.google.com/docs/authentication/external/set-up-adc for more information.
GESIS runs the BinderHub server on bare metal.
Thanks @manics for the reply and comments. I was able to disable the attempt to contact Google Cloud with https://github.com/jupyterhub/mybinder.org-deploy/blob/19406517c562d779999f98e70a2f33eaa662dde1/config/curvenote.yaml#L252-L253
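The traceback shows the event sink handler (which appears to come from the 01-eventlog extra config) constructing a Google Cloud Logging client, and google.auth.default() only giving up after three Compute Engine metadata timeouts. The config change linked above switches this off. As an alternative, here is a minimal sketch of a handler factory that only contacts Google Cloud when explicitly asked to. The function name make_eventsink_handler, the use of GOOGLE_APPLICATION_CREDENTIALS as the opt-in signal, and the log name binderhub-events are hypothetical, not what mybinder.org-deploy ships.

```python
import logging
import os


def make_eventsink_handler(use_gcp=None):
    """Return a logging handler for BinderHub events (hypothetical sketch).

    Google Cloud Logging is only used when explicitly requested, so a
    bare-metal deployment never waits on the Compute Engine metadata server.
    """
    if use_gcp is None:
        # Assumption: an explicit service-account key file is the opt-in signal.
        use_gcp = bool(os.environ.get("GOOGLE_APPLICATION_CREDENTIALS"))

    if not use_gcp:
        # Bare-metal fallback: write events to stderr instead of Google Cloud.
        return logging.StreamHandler()

    # Imported lazily so google-cloud-logging is only required when it is used.
    import google.cloud.logging
    from google.cloud.logging.handlers import CloudLoggingHandler

    client = google.cloud.logging.Client()
    # Hypothetical log name, for illustration only.
    return CloudLoggingHandler(client, name="binderhub-events")
```

On a bare-metal node without that variable set, make_eventsink_handler() returns a plain StreamHandler and never imports the Google libraries.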
The problem I have now is that almost all persistent volume claims are pending:
kubectl get -n gesis pvc
NAME                                        STATUS    VOLUME         CAPACITY   ACCESS MODES   STORAGECLASS   AGE
binderhub-grafana                           Pending                                                           24h
binderhub-harbor-jobservice                 Pending                                                           4d16h
binderhub-harbor-registry                   Pending                                                           4d16h
binderhub-prometheus-server                 Pending                                            standard       24h
data-binderhub-harbor-redis-0               Bound     alertmanager   5Gi        RWO                           4d16h
data-binderhub-harbor-trivy-0               Pending                                                           4d16h
database-data-binderhub-harbor-database-0   Pending                                                           4d16h
hub-db-dir                                  Pending                                                           24h
I know that I need to declare suitable persistent volumes, but I can't find where they are declared for OVH or Curvenote. @manics, can you point me to the persistent volume declaration? Thanks!
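In the listing above, every pending claim except binderhub-prometheus-server has an empty STORAGECLASS column, and that one asks for standard. Without a default StorageClass or a dynamic provisioner, such claims can only bind to a PersistentVolume created by hand with a matching storage class name, access mode and enough capacity, which is presumably how data-binderhub-harbor-redis-0 ended up bound to the alertmanager volume. A minimal sketch for listing what each pending claim needs, assuming kubectl is on PATH; the helper names pending_claims and load_pvcs are hypothetical.

```python
import json
import subprocess


def pending_claims(pvc_list):
    """Map each Pending PVC name to the storage class it requests (None if unset).

    pvc_list is the parsed JSON from: kubectl get pvc -n NAMESPACE -o json
    """
    result = {}
    for item in pvc_list.get("items", []):
        if item.get("status", {}).get("phase") == "Pending":
            name = item["metadata"]["name"]
            result[name] = item.get("spec", {}).get("storageClassName")
    return result


def load_pvcs(namespace):
    """Run kubectl and return its parsed JSON output (hypothetical helper)."""
    out = subprocess.run(
        ["kubectl", "get", "pvc", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)
```

For example, pending_claims(load_pvcs("gesis")) would report hub-db-dir with no class and binderhub-prometheus-server with standard, which tells you which PersistentVolumes (or which StorageClass) still need to be created.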
I have the main pods running.
kubectl get -n gesis pods
NAME READY STATUS RESTARTS AGE
binder-7c84c576c-2689p 1/1 Running 0 80m
binderhub-cryptnono-c9hrj 2/2 Running 0 128m
binderhub-cryptnono-dgr4g 2/2 Running 0 128m
binderhub-cryptnono-hqpzf 2/2 Running 0 128m
binderhub-cryptnono-pbqlx 2/2 Running 0 128m
binderhub-dind-ntxvs 1/1 Running 0 80m
binderhub-grafana-9d48bc74-qtn4x 1/1 Running 0 62m
binderhub-image-cleaner-6zc9v 1/1 Running 0 80m
binderhub-ingress-nginx-controller-6fdbf98688-j29w2 1/1 Running 0 47m
binderhub-ingress-nginx-defaultbackend-5d698c868-qh5zx 1/1 Running 0 128m
binderhub-kube-state-metrics-8547b9d4dd-rr4tw 1/1 Running 0 128m
binderhub-prometheus-node-exporter-4dv2s 1/1 Running 0 128m
binderhub-prometheus-node-exporter-c8bv7 1/1 Running 0 128m
binderhub-prometheus-node-exporter-gkxcf 1/1 Running 0 128m
binderhub-prometheus-node-exporter-wfk7h 1/1 Running 0 128m
binderhub-prometheus-server-7c59dd5d85-fwbqm 2/2 Running 0 128m
hub-6564cd475f-nxltz 1/1 Running 0 13m
minesweeper-bf58z 0/1 ImagePullBackOff 0 128m
minesweeper-fkjd6 0/1 ImagePullBackOff 0 128m
minesweeper-t2fs8 0/1 ImagePullBackOff 0 128m
proxy-f5b566ddc-j7l9l 1/1 Running 0 80m
proxy-patches-85b5998bdb-9mjw9 1/1 Running 0 128m
static-6f64c6bc8-ndn2t 1/1 Running 0 128m
user-scheduler-55df956bcf-6b4m6 1/1 Running 0 80m
user-scheduler-55df956bcf-db79g 1/1 Running 0 80m
Ingress
The ingress is not working. The goal is for http://notebooks-test.gesis.org to be answered by the NGINX Ingress controller pod. @manics, can you help me?
ping -c 1 notebooks-test.gesis.org
PING notebooks-test.gesis.org (194.95.75.20) 56(84) bytes of data.
64 bytes from svko-css-backup-node.gesis.intra (194.95.75.20): icmp_seq=1 ttl=61 time=2.26 ms
--- notebooks-test.gesis.org ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 2.261/2.261/2.261/0.000 ms
kubectl -n gesis describe ingress binderhub
Name:             binderhub
Labels:           app.kubernetes.io/managed-by=Helm
Namespace:        gesis
Address:          10.100.230.222
Ingress Class:    <none>
Default backend:  <default>
TLS:
  kubelego-tls-binder-binderhub terminates notebooks-test.gesis.org
Rules:
  Host                      Path  Backends
  ----                      ----  --------
  notebooks-test.gesis.org
                            /     binder:80 (10.244.255.21:8585)
Annotations:      kubernetes.io/ingress.class: nginx
                  kubernetes.io/tls-acme: true
                  meta.helm.sh/release-name: binderhub
                  meta.helm.sh/release-namespace: gesis
Events:
  Type    Reason  Age   From                      Message
  ----    ------  ----  ----                      -------
  Normal  Sync    54m   nginx-ingress-controller  Scheduled for sync
  Normal  Sync    54m   nginx-ingress-controller  Scheduled for sync
  Normal  Sync    53m   nginx-ingress-controller  Scheduled for sync
kubectl -n gesis describe service binderhub-ingress-nginx-controller
Name:              binderhub-ingress-nginx-controller
Namespace:         gesis
Labels:            app.kubernetes.io/component=controller
                   app.kubernetes.io/instance=binderhub
                   app.kubernetes.io/managed-by=Helm
                   app.kubernetes.io/name=ingress-nginx
                   app.kubernetes.io/part-of=ingress-nginx
                   app.kubernetes.io/version=1.11.2
                   helm.sh/chart=ingress-nginx-4.11.2
Annotations:       meta.helm.sh/release-name: binderhub
                   meta.helm.sh/release-namespace: gesis
Selector:          app.kubernetes.io/component=controller,app.kubernetes.io/instance=binderhub,app.kubernetes.io/name=ingress-nginx
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.100.230.222
IPs:               10.100.230.222
Port:              http  80/TCP
TargetPort:        http/TCP
Endpoints:         10.244.65.205:80
Port:              https  443/TCP
TargetPort:        https/TCP
Endpoints:         10.244.65.205:443
Session Affinity:  None
Events:            <none>
minesweeper
The image tag is wrong: the pod is trying to pull jupyterhub/mybinder.org-minesweeper:set-by-chartpress, i.e. the chartpress placeholder tag.
kubectl -n gesis describe pod minesweeper-bf58z
Name:             minesweeper-bf58z
Namespace:        gesis
Priority:         0
Service Account:  minesweeper
Node:             svko-css-backup-node/194.95.75.20
Start Time:       Tue, 10 Sep 2024 14:27:52 +0200
Labels:           app=binder
                  component=minesweeper
                  controller-revision-hash=767d8795cc
                  heritage=Helm
                  name=minesweeper
                  pod-template-generation=1
                  release=binderhub
Annotations:      checksum/configmap: 7a857debb16fa8bcb22a5de6418a5ff319c9e06f4cfc010705caec539b9614cc
                  cni.projectcalico.org/containerID: a3415f68c66691989387a7ea9bc5c6dd5cfa8039affee823adfd0a9b8f0b7263
                  cni.projectcalico.org/podIP: 10.244.65.206/32
                  cni.projectcalico.org/podIPs: 10.244.65.206/32
Status:           Pending
IP:               10.244.65.206
IPs:
  IP:             10.244.65.206
Controlled By:    DaemonSet/minesweeper
Containers:
  minesweeper:
    Container ID:
    Image:          jupyterhub/mybinder.org-minesweeper:set-by-chartpress
    Image ID:
    Port:           <none>
    Host Port:      <none>
    Args:
      python
      /srv/minesweeper/minesweeper.py
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  250Mi
    Requests:
      cpu:     100m
      memory:  100Mi
    Environment:
      NODE_NAME:   (v1:spec.nodeName)
      NAMESPACE:   gesis
    Mounts:
      /etc/minesweeper from config (ro)
      /srv/minesweeper from src (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5wbfq (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  src:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      minesweeper-src
    Optional:  false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      minesweeper-config
    Optional:  false
  kube-api-access-5wbfq:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:        Burstable
Node-Selectors:   <none>
Tolerations:      hub.jupyter.org/dedicated=user:NoSchedule
                  hub.jupyter.org_dedicated=user:NoSchedule
                  node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                  node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                  node.kubernetes.io/not-ready:NoExecute op=Exists
                  node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                  node.kubernetes.io/unreachable:NoExecute op=Exists
                  node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type    Reason   Age                     From     Message
  ----    ------   ----                    ----     -------
  Normal  BackOff  4m51s (x545 over 129m)  kubelet  Back-off pulling image "jupyterhub/mybinder.org-minesweeper:set-by-chartpress"
Can you try running an ephemeral pod in the same namespace and exec something like curl -v http://binderhub-ingress-nginx-controller/ from it? That should return a 404 from the NGINX controller's default backend. You might need to add the internal service port. Note that existing pods may be restricted by NetworkPolicies, so it's best to create a new pod. I often use https://gist.github.com/manics/67efaed42d25cc1f830e0d5566652b03, as netshoot includes several useful tools for troubleshooting networks.
Then try curl -v --header 'Host: notebooks-test.gesis.org' http://binderhub-ingress-nginx-controller/ from the pod, which should fool the ingress controller into thinking you've requested notebooks-test.gesis.org.
If that works, the controller and your internal BinderHub/JupyterHub ingress are (probably!) working, and the problem is likely in the path between the external internet and the internal ingress.
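If the debugging pod has Python but not curl, the same Host-header trick can be sketched with the standard library's http.client. The service and host names come from the outputs above; the helper name probe is hypothetical.

```python
import http.client


def probe(service, virtual_host=None, port=80, path="/", timeout=5):
    """GET `path` from `service`, optionally overriding the Host header.

    Returns (status, Server header). Overriding Host makes the NGINX ingress
    controller route the request as if it had been sent to `virtual_host`.
    """
    conn = http.client.HTTPConnection(service, port, timeout=timeout)
    try:
        headers = {"Host": virtual_host} if virtual_host else {}
        # http.client does not add its own Host header when one is supplied.
        conn.request("GET", path, headers=headers)
        resp = conn.getresponse()
        resp.read()
        return resp.status, resp.getheader("Server")
    finally:
        conn.close()
```

From inside the cluster, probe("binderhub-ingress-nginx-controller") would be expected to return 404 from the default backend, while probe("binderhub-ingress-nginx-controller", "notebooks-test.gesis.org") should be routed to the binder service.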
For the chartpress tag problem, you'll first need to run chartpress --skip-build to update the set-by-chartpress placeholders:
https://github.com/jupyterhub/mybinder.org-deploy/blob/19406517c562d779999f98e70a2f33eaa662dde1/.github/workflows/cd.yml#L322-L324
The actual building and pushing of the container images is done in the staging workflow, and since chartpress deterministically generates the tag from the git commit hash, it's fine to rerun it to update the tags.
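Since the placeholder only disappears once chartpress has rewritten the chart, a cheap guard before a helm upgrade is to scan the values for any leftover set-by-chartpress string. A minimal sketch, assuming the values are already loaded into a Python dict (for example from the JSON printed by helm get values binderhub -n gesis --all -o json); the function name find_placeholders is hypothetical.

```python
PLACEHOLDER = "set-by-chartpress"


def find_placeholders(values, path=""):
    """Yield dotted paths of string values that still contain the chartpress placeholder."""
    if isinstance(values, dict):
        for key, value in values.items():
            child = f"{path}.{key}" if path else str(key)
            yield from find_placeholders(value, child)
    elif isinstance(values, list):
        for index, value in enumerate(values):
            yield from find_placeholders(value, f"{path}[{index}]")
    elif isinstance(values, str) and PLACEHOLDER in values:
        yield path
```

Any non-empty result means chartpress --skip-build has not been run on the checkout being deployed, which would lead to ImagePullBackOff errors like the minesweeper one.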
Thanks @manics for the reply. I will look into chartpress. I believe the traffic problem is caused by the load balancer, so I'm looking at MetalLB.
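The service description above shows the ingress controller Service is of type ClusterIP, so 10.100.230.222 is only reachable inside the cluster, while notebooks-test.gesis.org resolves to the node address 194.95.75.20. A bare-metal load balancer such as MetalLB assigns external addresses to Services of type LoadBalancer. A minimal sketch that summarises how a Service is exposed, given the parsed JSON of kubectl get svc NAME -o json; the function name external_exposure and its messages are hypothetical.

```python
def external_exposure(service):
    """Summarise how a Service (parsed from `kubectl get svc NAME -o json`) is exposed."""
    spec = service.get("spec", {})
    svc_type = spec.get("type", "ClusterIP")
    if svc_type == "ClusterIP":
        # Only reachable from inside the cluster network.
        return "cluster-internal only: " + spec.get("clusterIP", "?")
    if svc_type == "NodePort":
        ports = [str(p.get("nodePort")) for p in spec.get("ports", [])]
        return "node ports: " + ", ".join(ports)
    if svc_type == "LoadBalancer":
        ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
        addresses = [i.get("ip") or i.get("hostname") for i in ingress]
        if not addresses:
            # On bare metal this stays empty until something like MetalLB assigns an IP.
            return "load balancer pending: no external address assigned yet"
        return "load balancer: " + ", ".join(addresses)
    return "other type: " + svc_type
```

With the current configuration this would report cluster-internal only: 10.100.230.222; after switching the controller Service to LoadBalancer it should show an address once MetalLB has allocated one.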
@manics, can I get a bit of help with the pre-commit CI? Is there anything I could do so that it reformats the code automatically?
@rgaiacs You can run pre-commit run -a locally and commit/push the result. I think prettier specifically doesn't write in CI, for reasons that are documented somewhere, but I'll need to find the link.
- https://github.com/pre-commit/pre-commit/issues/532
- https://github.com/pre-commit/pre-commit/issues/747
- https://github.com/pre-commit/pre-commit/issues/806
- https://github.com/pre-commit/pre-commit/issues/879
I'm closing this because, after some discussion with @arnim, we think it will be better for us at GESIS to handle the Kubernetes deployment to our bare-metal server in a separate Git repository.
Thanks for all the help!