mybinder.org-deploy

Streamline deployment of GESIS stage server

Open rgaiacs opened this issue 1 year ago • 9 comments

This is related to https://github.com/jupyterhub/mybinder.org-deploy/issues/2797

The configuration in the ansible folder is working and GitLab CI at .gitlab-ci.yml is also working.

I'm trying to complete the Kubernetes cluster configuration in the Helm chart.

rgaiacs avatar Sep 06 '24 15:09 rgaiacs

@manics @sgibson91 @minrk could you help me understand which Helm chart configuration is being loaded by mistake? The binder pod crashes with the following log:

Loading /etc/binderhub/config/values.yaml
Loading extra config: 01-eventlog
Loading extra config: 01-template-variables
Loading extra config: 02-badge-base-url
Loading extra config: 02-event-loop-metric
[BinderHub] starting!
[BinderHub] WARNING | BinderHub.build_node_selector is deprecated, use KubernetesBuildExecutor.node_selector
[BinderHub] WARNING | BinderHub.build_docker_host is deprecated, use KubernetesBuildExecutor.docker_host
[W 240906 15:36:29 _metadata:139] Compute Engine Metadata server unavailable on attempt 1 of 3. Reason: timed out
[W 240906 15:36:32 _metadata:139] Compute Engine Metadata server unavailable on attempt 2 of 3. Reason: timed out
[W 240906 15:36:35 _metadata:139] Compute Engine Metadata server unavailable on attempt 3 of 3. Reason: timed out
[W 240906 15:36:35 _default:338] Authentication failed using Compute Engine authentication due to unavailable metadata server.
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/usr/local/lib/python3.11/site-packages/binderhub/__main__.py", line 4, in <module>
    main()
  File "/usr/local/lib/python3.11/site-packages/traitlets/config/application.py", line 1074, in launch_instance
    app.initialize(argv)
  File "/usr/local/lib/python3.11/site-packages/binderhub/app.py", line 913, in initialize
    self.event_log = EventLog(parent=self)
                     ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/binderhub/events.py", line 51, in __init__
    self.handlers = self.handlers_maker(self)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 18, in _make_eventsink_handler
  File "/usr/local/lib/python3.11/site-packages/google/cloud/logging_v2/client.py", line 122, in __init__
    super(Client, self).__init__(
  File "/usr/local/lib/python3.11/site-packages/google/cloud/client/__init__.py", line 320, in __init__
    _ClientProjectMixin.__init__(self, project=project, credentials=credentials)
  File "/usr/local/lib/python3.11/site-packages/google/cloud/client/__init__.py", line 268, in __init__
    project = self._determine_default(project)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/google/cloud/client/__init__.py", line 287, in _determine_default
    return _determine_default_project(project)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/google/cloud/_helpers/__init__.py", line 152, in _determine_default_project
    _, project = google.auth.default()
                 ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/google/auth/_default.py", line 691, in default
    raise exceptions.DefaultCredentialsError(_CLOUD_SDK_MISSING_CREDENTIALS)
google.auth.exceptions.DefaultCredentialsError: Your default credentials were not found. To set up Application Default Credentials, see https://cloud.google.com/docs/authentication/external/set-up-adc for more information.

GESIS runs the BinderHub server on bare metal.
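The traceback above ends in google.auth.default(), which the Google Cloud Logging client calls to find a project and credentials. On a bare-metal cluster there is no Compute Engine metadata server, so that lookup times out and raises DefaultCredentialsError. A minimal sketch of a check that could be run in a Python shell inside the binder pod to confirm this; the function name gcp_credentials_available is hypothetical and not part of BinderHub or the chart:

```python
def gcp_credentials_available() -> bool:
    """Return True if Application Default Credentials can be resolved.

    Hypothetical helper for diagnosis only; it is not part of BinderHub.
    """
    try:
        import google.auth
        from google.auth.exceptions import DefaultCredentialsError
    except ImportError:
        # google-auth is not installed, so no Google client could be built.
        return False
    try:
        # Same call that fails at the bottom of the traceback above.
        google.auth.default()
    except DefaultCredentialsError:
        return False
    return True
```

If this returns False on the GESIS nodes, any event log handler that builds a Google Cloud Logging client will fail the same way, so that handler has to be disabled or replaced in the GESIS configuration.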

rgaiacs avatar Sep 06 '24 15:09 rgaiacs

Thanks @manics for the reply and comments. I was able to disable the attempt to contact Google Cloud with https://github.com/jupyterhub/mybinder.org-deploy/blob/19406517c562d779999f98e70a2f33eaa662dde1/config/curvenote.yaml#L252-L253

The problem that I have is that all persistent volume claims are pending.

kubectl get -n gesis pvc
NAME                                        STATUS    VOLUME         CAPACITY   ACCESS MODES   STORAGECLASS   AGE
binderhub-grafana                           Pending                                                           24h
binderhub-harbor-jobservice                 Pending                                                           4d16h
binderhub-harbor-registry                   Pending                                                           4d16h
binderhub-prometheus-server                 Pending                                            standard       24h
data-binderhub-harbor-redis-0               Bound     alertmanager   5Gi        RWO                           4d16h
data-binderhub-harbor-trivy-0               Pending                                                           4d16h
database-data-binderhub-harbor-database-0   Pending                                                           4d16h
hub-db-dir                                  Pending                                                           24h

I know that I need to declare a correct persistent volume but I can't find where the persistent volume is declared for OVH or CurveNote. @manics can you point me to the persistent volume declaration? Thanks!
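As a rough aid for tracking this, the table above can be parsed to list the claims that are still Pending together with the storage class they request. A claim with an empty STORAGECLASS normally stays Pending unless the cluster has a default StorageClass or a matching PersistentVolume. This is only a sketch: the function names are hypothetical, and it assumes the fixed-width layout that kubectl prints, slicing each row at the header's column offsets because several cells are empty.

```python
import re
import subprocess


def parse_kubectl_table(text: str) -> list:
    """Parse fixed-width `kubectl get` output into one dict per row."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0]
    # A column name is a run of words separated by single spaces, which
    # keeps "ACCESS MODES" together while splitting on the wider gaps.
    columns = [(m.group(0), m.start())
               for m in re.finditer(r"\S+(?: \S+)*", header)]
    rows = []
    for line in lines[1:]:
        row = {}
        for i, (name, start) in enumerate(columns):
            # The last column runs to the end of the line.
            end = columns[i + 1][1] if i + 1 < len(columns) else None
            row[name] = line[start:end].strip()
        rows.append(row)
    return rows


def pending_claims(text: str) -> list:
    """Return (claim name, storage class) for every Pending claim."""
    return [(row["NAME"], row.get("STORAGECLASS", ""))
            for row in parse_kubectl_table(text)
            if row.get("STATUS") == "Pending"]


def fetch_pvc_table(namespace: str) -> str:
    """Run `kubectl get -n <namespace> pvc` and return its output."""
    result = subprocess.run(
        ["kubectl", "get", "-n", namespace, "pvc"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling pending_claims(fetch_pvc_table("gesis")) on a machine with kubectl access would list the claims that still need a volume.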

rgaiacs avatar Sep 10 '24 07:09 rgaiacs

I have the main pods running.

kubectl get -n gesis pods
NAME                                                     READY   STATUS             RESTARTS   AGE
binder-7c84c576c-2689p                                   1/1     Running            0          80m
binderhub-cryptnono-c9hrj                                2/2     Running            0          128m
binderhub-cryptnono-dgr4g                                2/2     Running            0          128m
binderhub-cryptnono-hqpzf                                2/2     Running            0          128m
binderhub-cryptnono-pbqlx                                2/2     Running            0          128m
binderhub-dind-ntxvs                                     1/1     Running            0          80m
binderhub-grafana-9d48bc74-qtn4x                         1/1     Running            0          62m
binderhub-image-cleaner-6zc9v                            1/1     Running            0          80m
binderhub-ingress-nginx-controller-6fdbf98688-j29w2      1/1     Running            0          47m
binderhub-ingress-nginx-defaultbackend-5d698c868-qh5zx   1/1     Running            0          128m
binderhub-kube-state-metrics-8547b9d4dd-rr4tw            1/1     Running            0          128m
binderhub-prometheus-node-exporter-4dv2s                 1/1     Running            0          128m
binderhub-prometheus-node-exporter-c8bv7                 1/1     Running            0          128m
binderhub-prometheus-node-exporter-gkxcf                 1/1     Running            0          128m
binderhub-prometheus-node-exporter-wfk7h                 1/1     Running            0          128m
binderhub-prometheus-server-7c59dd5d85-fwbqm             2/2     Running            0          128m
hub-6564cd475f-nxltz                                     1/1     Running            0          13m
minesweeper-bf58z                                        0/1     ImagePullBackOff   0          128m
minesweeper-fkjd6                                        0/1     ImagePullBackOff   0          128m
minesweeper-t2fs8                                        0/1     ImagePullBackOff   0          128m
proxy-f5b566ddc-j7l9l                                    1/1     Running            0          80m
proxy-patches-85b5998bdb-9mjw9                           1/1     Running            0          128m
static-6f64c6bc8-ndn2t                                   1/1     Running            0          128m
user-scheduler-55df956bcf-6b4m6                          1/1     Running            0          80m
user-scheduler-55df956bcf-db79g                          1/1     Running            0          80m

Ingress

The ingress is not working. The goal here is to have http://notebooks-test.gesis.org answered by the NGINX Ingress pod. @manics can you help me?

ping -c 1 notebooks-test.gesis.org
PING notebooks-test.gesis.org (194.95.75.20) 56(84) bytes of data.
64 bytes from svko-css-backup-node.gesis.intra (194.95.75.20): icmp_seq=1 ttl=61 time=2.26 ms

--- notebooks-test.gesis.org ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 2.261/2.261/2.261/0.000 ms

kubectl -n gesis describe ingress binderhub
Name:             binderhub
Labels:           app.kubernetes.io/managed-by=Helm
Namespace:        gesis
Address:          10.100.230.222
Ingress Class:    <none>
Default backend:  <default>
TLS:
  kubelego-tls-binder-binderhub terminates notebooks-test.gesis.org
Rules:
  Host                      Path  Backends
  ----                      ----  --------
  notebooks-test.gesis.org  
                            /   binder:80 (10.244.255.21:8585)
Annotations:                kubernetes.io/ingress.class: nginx
                            kubernetes.io/tls-acme: true
                            meta.helm.sh/release-name: binderhub
                            meta.helm.sh/release-namespace: gesis
Events:
  Type    Reason  Age   From                      Message
  ----    ------  ----  ----                      -------
  Normal  Sync    54m   nginx-ingress-controller  Scheduled for sync
  Normal  Sync    54m   nginx-ingress-controller  Scheduled for sync
  Normal  Sync    53m   nginx-ingress-controller  Scheduled for sync

kubectl -n gesis describe service binderhub-ingress-nginx-controller
Name:              binderhub-ingress-nginx-controller
Namespace:         gesis
Labels:            app.kubernetes.io/component=controller
                   app.kubernetes.io/instance=binderhub
                   app.kubernetes.io/managed-by=Helm
                   app.kubernetes.io/name=ingress-nginx
                   app.kubernetes.io/part-of=ingress-nginx
                   app.kubernetes.io/version=1.11.2
                   helm.sh/chart=ingress-nginx-4.11.2
Annotations:       meta.helm.sh/release-name: binderhub
                   meta.helm.sh/release-namespace: gesis
Selector:          app.kubernetes.io/component=controller,app.kubernetes.io/instance=binderhub,app.kubernetes.io/name=ingress-nginx
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.100.230.222
IPs:               10.100.230.222
Port:              http  80/TCP
TargetPort:        http/TCP
Endpoints:         10.244.65.205:80
Port:              https  443/TCP
TargetPort:        https/TCP
Endpoints:         10.244.65.205:443
Session Affinity:  None
Events:            <none>

minesweeper

The image name is wrong. It is trying to pull jupyterhub/mybinder.org-minesweeper:set-by-chartpress.

kubectl -n gesis describe pod minesweeper-bf58z
Name:             minesweeper-bf58z
Namespace:        gesis
Priority:         0
Service Account:  minesweeper
Node:             svko-css-backup-node/194.95.75.20
Start Time:       Tue, 10 Sep 2024 14:27:52 +0200
Labels:           app=binder
                  component=minesweeper
                  controller-revision-hash=767d8795cc
                  heritage=Helm
                  name=minesweeper
                  pod-template-generation=1
                  release=binderhub
Annotations:      checksum/configmap: 7a857debb16fa8bcb22a5de6418a5ff319c9e06f4cfc010705caec539b9614cc
                  cni.projectcalico.org/containerID: a3415f68c66691989387a7ea9bc5c6dd5cfa8039affee823adfd0a9b8f0b7263
                  cni.projectcalico.org/podIP: 10.244.65.206/32
                  cni.projectcalico.org/podIPs: 10.244.65.206/32
Status:           Pending
IP:               10.244.65.206
IPs:
  IP:           10.244.65.206
Controlled By:  DaemonSet/minesweeper
Containers:
  minesweeper:
    Container ID:  
    Image:         jupyterhub/mybinder.org-minesweeper:set-by-chartpress
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Args:
      python
      /srv/minesweeper/minesweeper.py
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  250Mi
    Requests:
      cpu:     100m
      memory:  100Mi
    Environment:
      NODE_NAME:   (v1:spec.nodeName)
      NAMESPACE:  gesis
    Mounts:
      /etc/minesweeper from config (ro)
      /srv/minesweeper from src (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5wbfq (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  src:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      minesweeper-src
    Optional:  false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      minesweeper-config
    Optional:  false
  kube-api-access-5wbfq:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 hub.jupyter.org/dedicated=user:NoSchedule
                             hub.jupyter.org_dedicated=user:NoSchedule
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type    Reason   Age                     From     Message
  ----    ------   ----                    ----     -------
  Normal  BackOff  4m51s (x545 over 129m)  kubelet  Back-off pulling image "jupyterhub/mybinder.org-minesweeper:set-by-chartpress"

rgaiacs avatar Sep 10 '24 14:09 rgaiacs

Can you try running an ephemeral pod in the same namespace, and exec something like curl -v http://binderhub-ingress-nginx-controller/ from the pod? That should return a 404 from the Nginx controller default backend. You might need to add the internal service port. Note the existing pods may be restricted by NetworkPolicies, so best to create a new pod. I often use https://gist.github.com/manics/67efaed42d25cc1f830e0d5566652b03 as netshoot includes several useful tools for troubleshooting networks.

Then try curl -v --header 'Host: notebooks-test.gesis.org' http://binderhub-ingress-nginx-controller/ from the pod which should fool the ingress controller into thinking you've requested notebooks-test.gesis.org.

If that works it means the controller and your internal BinderHub/JupyterHub ingress are (probably!) working, and the problem is likely in the path between the external internet and the internal ingress.
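If curl is not available in the debugging pod but Python is (an assumption about the image), the same two requests can be sketched with the standard library. The helper name probe is hypothetical. It sends an optional Host header, ignores any proxy settings, and does not follow redirects, so the status code comes straight from the ingress controller:

```python
import urllib.error
import urllib.request
from typing import Optional


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Surface redirects as HTTPError instead of following them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def probe(url: str, host: Optional[str] = None, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, optionally overriding Host."""
    headers = {"Host": host} if host else {}
    request = urllib.request.Request(url, headers=headers)
    # An empty ProxyHandler bypasses http_proxy settings, so the request
    # goes directly to the in-cluster service.
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}), _NoRedirect
    )
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 3xx, 4xx and 5xx responses arrive here; the code is what we want.
        return err.code
```

From a pod in the gesis namespace, probe("http://binderhub-ingress-nginx-controller/") should give the default backend's 404, and probe("http://binderhub-ingress-nginx-controller/", host="notebooks-test.gesis.org") should reach BinderHub if the internal routing works.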

manics avatar Sep 10 '24 16:09 manics

For the chartpress tag problem you'll need to first run chartpress --skip-build to update the set-by-chartpress placeholders: https://github.com/jupyterhub/mybinder.org-deploy/blob/19406517c562d779999f98e70a2f33eaa662dde1/.github/workflows/cd.yml#L322-L324

The actual building and pushing of the container images is done in the staging workflow, and since chartpress deterministically generates the tag based on the git commit hash, it's fine to rerun it to update the tags.
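A small check along these lines could catch leftover placeholders before a deploy. It is only a sketch: find_placeholders is a hypothetical helper, and it simply scans text (for example a values file or rendered manifests) for the literal set-by-chartpress tag seen in the minesweeper pod description.

```python
PLACEHOLDER = "set-by-chartpress"


def find_placeholders(text: str) -> list:
    """Return (line number, stripped line) for lines with the placeholder."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if PLACEHOLDER in line
    ]
```

An empty result means every image tag in the scanned text has been filled in; any hit points at an image that would end up in ImagePullBackOff.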

manics avatar Sep 10 '24 16:09 manics

Thanks @manics for the reply. I will look into chartpress. And I believe the problem with traffic is because of the load balancer. I am looking at MetalLB.

rgaiacs avatar Sep 11 '24 15:09 rgaiacs

@manics can I have a bit of help with the pre-commit CI? Anything that I could do for it to reformat the code automatically?

rgaiacs avatar Sep 25 '24 09:09 rgaiacs

@rgaiacs You can run pre-commit run -a locally and commit/push the result. I think prettier specifically doesn't write in CI for reasons that are documented somewhere but I will need to find the link.

sgibson91 avatar Sep 25 '24 09:09 sgibson91

  • https://github.com/pre-commit/pre-commit/issues/532
  • https://github.com/pre-commit/pre-commit/issues/747
  • https://github.com/pre-commit/pre-commit/issues/806
  • https://github.com/pre-commit/pre-commit/issues/879

sgibson91 avatar Sep 25 '24 09:09 sgibson91

I'm closing this because, after some discussion with @arnim, it will be better for us at GESIS to handle the Kubernetes deployment to our bare-metal server in a separate Git repository.

Thanks for all the help!

rgaiacs avatar Jan 07 '25 17:01 rgaiacs