Automatic secret generation triggers constant redeploy (ArgoCD)
Scenario
Deploying zero-to-jupyterhub-k8s via ArgoCD
ArgoCD periodically checks the code base for changes. If the code base is a Helm chart, ArgoCD renders it with the `helm template` command.
Because the tokens (for example `proxy.secretToken`) are generated automatically on every render, ArgoCD constantly thinks there is a new set of manifests and applies the latest state, which in turn results in a new deployment. This process repeats endlessly.
The docs note that I can set a default token via an override file. But we don't want that, because it would mean the secret sits in plain text in the code base. We'd like to supply the secret via, for example, the `existingSecret` method.
Proposed change
Be able to set ALL secrets via `existingSecret`.
Who would use this feature?
Everyone who works from a code base and doesn't want to commit secrets.
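For illustration, something along these lines is what we have in mind. The Secret name is made up, and I'm using the chart-managed Secret's `hub.config.*` key naming for illustration; the exact expected layout would be up to the chart:

apiVersion: v1
kind: Secret
metadata:
  name: my-jupyterhub-secrets
stringData:
  # Layout illustrative; values generated once, out of band, never committed
  hub.config.ConfigurableHTTPProxy.auth_token: "<token>"
  hub.config.JupyterHub.cookie_secret: "<token>"

Then referenced from the chart's values:

hub:
  existingSecret: my-jupyterhub-secrets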
This is a great coincidence... Just spent the morning figuring out why my hub kept restarting. Would be nice to have a solution for this.
For reference, using version 1.2.0 of the chart.
`hub.existingSecret` is a reference to a self-managed k8s Secret with configuration. This works for most things, because in the hub pod we can optionally mount the k8s Secret declared in `hub.existingSecret` alongside the Helm-chart-managed k8s Secret, and then use Python to merge the two. But specifically with regard to the configuration `proxy.secretToken`, this is tricky: it is the key the hub and proxy share to let the hub control the proxy via a REST API, so it must be known by both the hub pod and the proxy pod.
So, to support this we would need merging logic in the proxy pod similar to what we have in the hub pod. Currently, the proxy pod reads the k8s Secret managed by the Helm chart like this, and doesn't respect `hub.existingSecret`:
https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/dfeb8ea4a610a5c7d91d21b6d3c2d39e20f2d290/jupyterhub/templates/proxy/deployment.yaml#L116-L124
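(For reference, the linked lines boil down to roughly the following; a paraphrase, not a verbatim copy of the chart:)

# Approximation of the linked snippet: the proxy pod reads the token
# straight from the chart-managed Secret named "hub", with no fallback
# to a Secret named by hub.existingSecret.
env:
  - name: CONFIGPROXY_AUTH_TOKEN
    valueFrom:
      secretKeyRef:
        name: hub
        key: hub.config.ConfigurableHTTPProxy.auth_token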
Proposals on how this could be resolved are welcome, but if the complexity increases too much I'll be very inclined to reject such a proposal, to ensure sustainable maintenance of this Helm chart long term.
Related background knowledge
The `lookup` function
The Helm template `lookup` function is used to inspect the state of the currently installed Helm release in a k8s cluster.
This can be trouble if:
- we try to `lookup` the content of some k8s Secret currently installed in a k8s cluster but not managed by the Helm chart (as decided via labels etc., I think)
- we try to `lookup` the content of some k8s Secret currently installed in a k8s cluster, but using `helm template` instead of `helm install --dry-run`: `helm template` does not connect to a k8s api-server and cannot use the `lookup` function against one
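A minimal sketch of the failure mode (my own example, not chart code):

{{- /* Under `helm install`/`helm upgrade`, lookup returns the live
       Secret; under `helm template` (what ArgoCD runs) there is no
       api-server to ask, so it always returns an empty map. */}}
{{- $hubSecret := lookup "v1" "Secret" .Release.Namespace "hub" }}
{{- if not $hubSecret }}
{{- /* With `helm template` we always land here, so any value that is
       only stable when reused from the cluster gets regenerated on
       every render. */}}
{{- end }}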
The way we generate credentials
Here we generate credentials and stash them into k8s Secret key/value pairs:
https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/dfeb8ea4a610a5c7d91d21b6d3c2d39e20f2d290/jupyterhub/templates/hub/secret.yaml#L28-L41
Here is the Helm logic used to generate the key, relying on the `lookup` function:
https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/dfeb8ea4a610a5c7d91d21b6d3c2d39e20f2d290/jupyterhub/templates/hub/_helpers-passwords.tpl#L34-L47
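In spirit, the helper implements a reuse-or-generate pattern like this simplified sketch (not the chart's exact code; the key and Secret name mirror the ones discussed in this thread):

{{- /* Reuse the value from the installed Secret if lookup finds one,
       otherwise mint a fresh random token. With `helm template` the
       lookup is always empty, so the else-branch always runs. */}}
{{- $key := "hub.config.ConfigurableHTTPProxy.auth_token" }}
{{- $data := (lookup "v1" "Secret" .Release.Namespace "hub").data | default dict }}
{{- if hasKey $data $key }}
{{ $key }}: {{ index $data $key }}
{{- else }}
{{ $key }}: {{ randAlphaNum 64 | b64enc }}
{{- end }}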
I see... and of course ArgoCD uses `helm template`, so it does not support `lookup`. Thanks for the explanation @consideRatio
I was looking a bit into the JupyterHub documentation because I ran into the same issue. So far I have just disabled Auto Sync, so I can sync whenever I want and not as soon as ArgoCD picks up any change in the repository.
I was thinking that I could generate the needed tokens myself (mainly the proxy auth token, but also the cookie secret) and spin up a k8s Secret that would then be used in env vars for both the hub and the proxy.
Thoughts on that idea? I'm not sure whether it would work to just pass "blank" configuration values to avoid them being generated, and to instantiate mine separately, passed as env vars.
Thanks folks!
@IceS2 according to the Kubernetes documentation of `envFrom`:
List of sources to populate environment variables in the container. The keys defined within a source must be a C_IDENTIFIER. All invalid keys will be reported as an event when the container is starting. When a key exists in multiple sources, the value associated with the last source will take precedence. Values defined by an Env with a duplicate key will take precedence. Cannot be updated.
So, if you pass the variable as an `env` then yours will take precedence... but I guess that in that case you are in the same situation as putting it in plain text in `proxy.secretToken`. I'm not sure what will happen with two `envFrom`. In the chart the `extraEnv` is indeed after the one that contains the token, but I am not sure this will really prevent the restarts, as the secret changes will keep triggering them.
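To make the precedence concrete, a hypothetical container spec (names made up, not the chart's):

containers:
  - name: demo
    envFrom:
      - secretRef:
          name: chart-managed-secret    # may define CONFIGPROXY_AUTH_TOKEN
    env:
      - name: CONFIGPROXY_AUTH_TOKEN    # an env entry wins over envFrom
        valueFrom:
          secretKeyRef:
            name: my-own-secret
            key: proxy_token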
I have tried a different approach, using `ignoreDifferences` combined with the sync option `RespectIgnoreDifferences=true` in ArgoCD, and succeeded in preventing the restarts. But it does not really work, because now the proxy and the hub only get the new manifest applied if they have changes of their own (i.e. if you modify a value of hub, the hub restarts but not the proxy, and vice versa), which leaves them with different secrets and thus causes authentication problems.
I think this could possibly be overcome by using some label/annotation that you update every time you change a value in any part of the chart, but I haven't tried to implement it, as I have settled on a simpler mitigation approach.
In the end I opted for SyncWindows, so the Jupyter chart does not get synchronized during office hours.
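For reference, such a deny window lives on the ArgoCD AppProject and looks roughly like this (schedule and application name are made up):

apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: default
spec:
  syncWindows:
    # Block syncs of the JupyterHub app on weekdays from 08:00 for 10h
    - kind: deny
      schedule: "0 8 * * 1-5"
      duration: 10h
      applications:
        - jupyterhub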
P.S. This is our Jupyter app config in Argo in case you want to give it a go:
project: default
<other_stuff>
syncPolicy:
  <other_stuff>
  syncOptions:
    - RespectIgnoreDifferences=true
ignoreDifferences:
  - group: apps
    kind: Deployment
    jsonPointers:
      - /spec/template/metadata/annotations/checksum~1secret
      - /spec/template/metadata/annotations/checksum~1auth-token
  - kind: Secret
    name: hub
    jsonPointers:
      - /data/hub~0config~0ConfigurableHTTPProxy~0auth_token
      - /data/hub~0config~0CryptKeeper~0keys
      - /data/hub~0config~0JupyterHub~0cookie_secret
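(A note on the pointer syntax: `jsonPointers` uses RFC 6901 escaping, where `~1` decodes to `/` and `~0` decodes to `~`, so `checksum~1secret` targets the annotation named `checksum/secret`.)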
Why would the secret keep changing if I pass a dummy value for it, so it is taken from there instead of being generated every time?
I think it's actually different from placing it in plain text in `proxy.secretToken`, since I'll create my own k8s Secret to store it and mount it via `envFrom` 🤔 no?
(I'm not a k8s expert so I might be saying stupid things xD)
@IceS2 if the `envFrom` approach works, it is definitely way better than setting `proxy.secretToken`. I meant that it was the same if you used `env`.
About the restarts: if you set `proxy.secretToken` to a dummy value, the secret will not be autogenerated, so maybe it will work. Just give the combination of `envFrom` and a dummy value for `proxy.secretToken` a try!
Great to see the activity in this repo!
Thanks @consideRatio for the in-depth reply! Makes sense. And @kanor1306, I'll try your proposal.
Hi @kanor1306
We also implemented the `ignoreDifferences` option (without `RespectIgnoreDifferences=true`). But we are not sure about your statement:
But it does not really work, because now the proxy and the hub only get the new manifest applied if they have changes of their own (i.e. if you modify a value of hub, the hub restarts but not the proxy, and vice versa), which leaves them with different secrets and thus causes authentication problems.
We made a change to a definition of the hub. This triggered a redeploy of the hub (not the proxy). But we are still able to log in, spawn notebooks, etc. Is there something we are missing about your specific situation? We are still on the `1.2.0` version of the Helm chart, by the way...
Our `ignoreDifferences` setting in our ArgoCD application definition:
ignoreDifferences:
  - name: hub
    kind: Secret
    jsonPointers:
      - /data/hub.config.ConfigurableHTTPProxy.auth_token
      - /data/hub.config.CryptKeeper.keys
      - /data/hub.config.JupyterHub.cookie_secret
  - name: hub
    kind: Deployment
    group: apps
    jsonPointers:
      - /spec/template/metadata/annotations/checksum~1secret
  - name: proxy
    kind: Deployment
    group: apps
    jsonPointers:
      - /spec/template/metadata/annotations/checksum~1auth-token
@BlueCog, also `1.2.0` here. Using `RespectIgnoreDifferences=true` is really a different use case, as it changes how Argo decides whether to synchronize or not. This is what happens in my case when something is changed in the `values.yaml` of the Jupyter chart:
- Modify something in the top-level `values.yaml`, within the Jupyter chart section (if you modify the values of any other app, this is not a problem).
- This change triggers an ArgoCD sync.
- The `hub` Secret contains the `values.yaml` as one of its key/value pairs. As it has changed, the Secret is regenerated and re-applied. Since we have the `ignoreDifferences` checksums, this does not trigger a restart of the `hub` or the `proxy`.
- What does trigger a restart of the hub is whatever modification we made in the `values.yaml` (assuming it changes the hub deployment template in any way).
- Conclusion: `hub` is running with a regenerated proxy token while `proxy` is running with the old one. Then, issues with authentication.
But again, this is the case when using `RespectIgnoreDifferences=true`.
As far as my understanding of Argo goes, if you are not using `RespectIgnoreDifferences=true`, then `ignoreDifferences` does not do much to protect you from synchronizations due to changes in the repository... This is what I think happens (though I may be missing something here):
- Modify something in the top-level `values.yaml`; it does not really matter whether it is within the Jupyter section or any other.
- This change triggers an ArgoCD sync.
- Argo executes `helm template`, compares the result with the running version (and does not ignore anything, as `RespectIgnoreDifferences=false`), and applies all the resources that differ.
I am guessing that we are in a different use case, maybe in the structure of our Argo Application or something similar, and that is why we have different (although similar) problems.
Hey folks, I've just tested successfully what I mentioned here:
We were already using this Helm chart as part of our own chart, since we needed to add a few other resources. I ended up adding a new secret (fetching it from AWS SSM in our case):
{{ if .Values.innerTokens }}
apiVersion: "kubernetes-client.io/v1"
kind: ExternalSecret
metadata:
  name: {{ .Values.appName }}-inner-tokens
  namespace: {{ .Values.namespace }}
spec:
  backendType: secretsManager
  region: {{ .Values.region }}
  data:
    - name: proxy_token
      key: {{ (.Values.innerTokens).SSM }}
      property: proxy_token
    - name: cookie_secret
      key: {{ (.Values.innerTokens).SSM }}
      property: cookie_secret
{{ end }}
After that I'm using the following configuration:
hub:
  extraEnv:
    CONFIGPROXY_AUTH_TOKEN:
      valueFrom:
        secretKeyRef:
          name: jupyterhub-inner-tokens
          key: proxy_token
    JPY_COOKIE_SECRET:
      valueFrom:
        secretKeyRef:
          name: jupyterhub-inner-tokens
          key: cookie_secret
  config:
    # Default tokens to avoid the proxy service being restarted when the hub is updated
    # https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/2887
    ConfigurableHTTPProxy:
      auth_token: affb56f3f015f1da189aebeb1f9e5731285cbd0443a87426c5368794275ef3c2
    JupyterHub:
      cookie_secret: 8663f67233468eb70667473c5a065c9640bab0ce5ea5bc9292e2f29977cbd0e2
  extraConfig:
    # Small piece of code needed to overwrite the 'cookie_secret'.
    01-set-cookie-secret.py: |
      import os
      c.JupyterHub.cookie_secret = os.getenv('JPY_COOKIE_SECRET', None)
proxy:
  extraEnv:
    CONFIGPROXY_AUTH_TOKEN:
      valueFrom:
        secretKeyRef:
          name: jupyterhub-inner-tokens
          key: proxy_token
The plain-text tokens in the configuration act as placeholders.
The `extraEnv` entries actually overwrite `CONFIGPROXY_AUTH_TOKEN`, because they are defined after the one hardcoded in the Helm chart.
Since the Helm chart doesn't use `JPY_COOKIE_SECRET`, I'm overwriting the configuration using `extraConfig`.
Nice @IceS2!
Then I think we have three scenarios for my initial post:
- create an extra secret and set it via `extraEnv`
- set `ignoreDifferences` in the ArgoCD application
- rework the Helm chart itself
We/my team think scenario 2 is the best for us at the moment. If I understand the documentation correctly, ArgoCD will ignore the secret rotation when comparing the `live` and `git` state. It will only apply the new secrets and annotations when a change is made on the `git` side. This is fine by us: it follows the best practice of secret rotation, without us needing to know (and manage) the secret(s) at all.
A side note: the `ignoreDifferences` setting seems to behave differently in other scenarios; see the posts from me and @kanor1306 on the subject.
Case closed? ;)
Are there any adjustments needed for the latest tag (3.0.0)? Scenario 2 doesn't work for me.
I tried without `RespectIgnoreDifferences=true`:
spec:
  ignoreDifferences:
    - name: hub
      kind: Secret
      group: apps
      jsonPointers:
        - /data/hub~0config~0ConfigurableHTTPProxy~0auth_token
        - /data/hub~0config~0CryptKeeper~0keys
        - /data/hub~0config~0JupyterHub~0cookie_secret
    - name: hub
      kind: Deployment
      group: apps
      jsonPointers:
        - /spec/template/metadata/annotations/checksum~1secret
    - name: proxy
      kind: Deployment
      group: apps
      jsonPointers:
        - /spec/template/metadata/annotations/checksum~1auth-token
@gulldan the group is wrong on the Secret. Here is the correct ignore block:
ignoreDifferences:
  - kind: Secret
    name: hub
    jsonPointers:
      - /data/hub.config.CryptKeeper.keys
      - /data/hub.config.JupyterHub.cookie_secret
      - /data/hub.config.ConfigurableHTTPProxy.auth_token
  - group: apps
    kind: Deployment
    name: hub
    jsonPointers:
      - /spec/template/metadata/annotations/checksum~1secret
  - group: apps
    kind: Deployment
    name: proxy
    jsonPointers:
      - /spec/template/metadata/annotations/checksum~1auth-token