
403 Forbidden XSRF cookie does not match POST argument after updating to the latest helm chart version (3.3.7)

Open matanshk opened this issue 1 year ago • 17 comments

Bug description

We use the z2jh Helm chart on our Kubernetes cluster and upgraded it from 3.1.0 to the latest version (3.3.7). After the upgrade finished, we started getting this error in the UI: "403 Forbidden, XSRF cookie does not match POST argument".

The behavior is inconsistent: some people on the team always hit the error, some hit it only occasionally, and some never do. It happens only in Chrome and Firefox; Safari works fine. Clearing cookies and using incognito mode did not help, and updating the browsers to their newest versions changed nothing.

[Screenshot 2024-05-22 at 4 18 26 PM]

We never saw this issue before the upgrade. I tried downgrading the Helm chart to the earlier patch releases (3.3.6, 3.3.5, 3.3.4, 3.3.3) and still got the same 403 error. When I downgraded to 3.1.0 (the version we ran before the upgrade), the issue disappeared.

In the logs (see Logs below) I can see an "xsrf id mismatch" message right before the 403.

How to reproduce

Actually, we tried our best to work out how to reproduce the issue and trigger it for team members who are not affected, but without success. What I can say is that it happens at the authentication step: it doesn't matter whether you provide a correct or a wrong username and password, you get the 403 error either way.
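Since the failure happens on the login POST regardless of credentials, a small script may make it easier to retry the same request outside a browser. The following is only a sketch under assumptions: `build_login_body`, `try_login`, and `hub_url` are hypothetical names, the form fields `username`, `password`, and `_xsrf` are assumed to match the login form, and the script presents no client certificate, so it would only work with mTLS disabled on the ingress.

```python
import http.cookiejar
import urllib.error
import urllib.parse
import urllib.request


def build_login_body(username: str, password: str, xsrf_token: str) -> bytes:
    """Encode the login form fields, including the _xsrf token, as a POST body."""
    fields = {"username": username, "password": password, "_xsrf": xsrf_token}
    return urllib.parse.urlencode(fields).encode("ascii")


def try_login(hub_url: str, username: str, password: str) -> int:
    """GET the login page, then POST the credentials with the _xsrf cookie value.

    Returns the HTTP status code of the POST (after any redirects).
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    login_url = hub_url.rstrip("/") + "/hub/login?next=%2Fhub%2F"

    # The GET is expected to set the _xsrf cookie (assumed cookie name).
    opener.open(login_url).close()
    xsrf = next((cookie.value for cookie in jar if cookie.name == "_xsrf"), None)
    if xsrf is None:
        raise RuntimeError("the login page did not set an _xsrf cookie")

    # Send the cookie back both as a cookie (via the jar) and as a form field.
    request = urllib.request.Request(
        login_url, data=build_login_body(username, password, xsrf)
    )
    try:
        with opener.open(request) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # A 403 from the hub surfaces here as an HTTPError.
        return err.code
```

Calling something like `try_login("https://jupyterhub.example.host.net", "user4", "<password>")` repeatedly, from the networks of affected and unaffected team members, might show whether the 403 depends on where the request comes from rather than on the browser.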

Expected behaviour

A smooth authentication process, without the 403 Forbidden error.

Actual behaviour

We get a 403 error right after clicking the "Sign in" button.

Your personal set up

We are running on an LKE cluster with Debian 11 worker nodes. We use the NGINX ingress controller with an mTLS certificate for authentication on the ingress (I disabled mTLS for testing and nothing changed), together with the dummy authenticator and a preconfigured password. The issue appeared right after the upgrade from Helm chart version 3.1.0 to 3.3.7.

Configuration
singleuser:
  events: false
  networkPolicy:
    enabled: false
  storage:
    type: dynamic
    extraLabels: {}
    extraVolumes:
      - name: sparkmagic-config
        configMap:
          name: sparkmagic-config
    extraVolumeMounts:
      - name: sparkmagic-config
        mountPath: /opt/.sparkmagic/config.json
        subPath: config.json
    static:
      pvcName:
      subPath: "{username}"
    capacity: 10Gi
    homeMountPath: /home/jovyan
    dynamic:
      storageClass:
      pvcNameTemplate: claim-{username}{servername}
      volumeNameTemplate: volume-{username}{servername}
      storageAccessModes: [ReadWriteOnce]
  extraEnv:
    SPARKMAGIC_CONF_DIR: /opt/.sparkmagic/
    SPARKMAGIC_CONF_FILE: config.json

  image:
    name: <our_custom_jupyterlab_image>
    tag: <our_custom_jupyterlab_image_tag>
    pullPolicy: Always
    pullSecrets: [ "acr-docker-auth" ]
  startTimeout: 300
  cmd: "/opt/entrypoint.sh"

proxy:
  service:
    type: ClusterIP
  chp:
    networkPolicy:
      enabled: false


hub:
  existingSecret: jupyterhub-secret-conf
  networkPolicy:
    enabled: false
  config:
    Authenticator:
      admin_users:
        - user1
        - user2
        - user3
      allowed_users:
        - user4
        - user5
    JupyterHub:
      authenticator_class: dummy

  authenticatePrometheus: true

  extraEnv:
    - name: PROMETHEUS_TOKEN
      valueFrom:
        secretKeyRef:
          name: prometheus-service-token
          key: PROMETHEUS_TOKEN

  extraConfig: 
    prometheus-service.py: |
      # Add a service "prometheus-service" to scrape prometheus metrics
      c.JupyterHub.services = [
          {
              "name": "prometheus-service",
              "api_token": os.environ["PROMETHEUS_TOKEN"]
          },
      ]

      # Add a service role to scrape prometheus metrics
      c.JupyterHub.load_roles = [
          {
              "name": "service-metrics-role",
              "description": "access metrics",
              "scopes": [
                  "read:metrics",
              ],
              "services": [
                  "prometheus-service",
              ],
          }
      ]

ingress:
  enabled: true
  annotations:
    kubernetes.io/ingress.class: "nginx"
    cert-manager.io/cluster-issuer: letsencrypt-production
    nginx.ingress.kubernetes.io/auth-tls-error-page: "http://www.mysite.com/error-cert.html"
    nginx.ingress.kubernetes.io/auth-tls-pass-certificate-to-upstream: "true"
    nginx.ingress.kubernetes.io/auth-tls-secret: "jupyterhub/ca-secret"
    nginx.ingress.kubernetes.io/auth-tls-verify-client: "on"
    nginx.ingress.kubernetes.io/auth-tls-verify-depth: "2"
  ingressClassName:
  pathSuffix:
  pathType: Prefix

  hosts:
    - jupyterhub.example.host.net
  tls:
    - hosts:
        - jupyterhub.example.host.net
      secretName: jupyterhub-production-tls

Logs
[D 2024-05-22 10:37:37.991 JupyterHub _xsrf_utils:155] xsrf id mismatch b'None:K_exHeY0CyJABPsBIDe7n6UIv1_upqmXywnhbOr9FIQ=' != b'None:TC8vH45MqUauWHsXz0zEsrVDFQ-Hzg0Zv3mZzYFnjls='
[I 2024-05-22 10:37:37.992 JupyterHub _xsrf_utils:125] Setting new xsrf cookie for b'None:TC8vH45MqUauWHsXz0zEsrVDFQ-Hzg0Zv3mZzYFnjls=' {'path': '/hub/', 'max_age': 3600}
[W 2024-05-22 10:37:37.992 JupyterHub web:1873] 403 POST /hub/login?next=%2Fhub%2F (10.2.13.129): XSRF cookie does not match POST argument
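
To compare the two ids in the debug line above across many log lines, a small parser along these lines might help. It is a sketch only: `parse_xsrf_mismatch` is a hypothetical helper, and the regular expression assumes the exact message format shown in this log excerpt.

```python
import re
from typing import Optional, Tuple

# Matches the debug message format seen in the log excerpt (an assumption
# about the format in general; adjust if your log lines differ).
XSRF_MISMATCH = re.compile(r"xsrf id mismatch b'([^']*)' != b'([^']*)'")

IdParts = Tuple[str, str]


def parse_xsrf_mismatch(line: str) -> Optional[Tuple[IdParts, IdParts]]:
    """Return ((prefix, digest), (prefix, digest)) for a mismatch line, else None."""
    match = XSRF_MISMATCH.search(line)
    if match is None:
        return None

    def split_id(raw: str) -> IdParts:
        # Each id looks like "None:<digest>"; split on the first colon only.
        prefix, _, digest = raw.partition(":")
        return prefix, digest

    return split_id(match.group(1)), split_id(match.group(2))
```

Applied to the first log line, both ids carry the same `None` prefix and differ only in the digest part. The second id is the one the following log line sets a new cookie for.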

matanshk avatar May 23 '24 12:05 matanshk