Automatic secret generation triggers constant redeploy (ArgoCD)
Scenario
Deploying zero-to-jupyterhub-k8s via ArgoCD
ArgoCD periodically checks the code base for changes. If the code base is a Helm chart, ArgoCD renders it with the `helm template` command.
Because the tokens (for example `proxy.secretToken`) are generated automatically on every render, ArgoCD constantly thinks there is a new set of manifests and applies the latest state, which in turn results in a new deployment. This process repeats endlessly.
The docs note that I can set a default token via an override file. But we don't want that, because it would mean the secret sits in plain text in the code base. We'd like to supply the secret via, for example, the `existingSecret` method.
Proposed change
Be able to set ALL secrets via `existingSecret`.
Who would use this feature?
Everyone who works from a code base and doesn't want to commit secrets.
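For illustration, something along these lines is what we have in mind. The Secret name is made up, and I'm using the chart-managed Secret's `hub.config.*` key naming for illustration; the exact expected layout would be up to the chart:

apiVersion: v1
kind: Secret
metadata:
  name: my-jupyterhub-secrets
stringData:
  # Layout illustrative; values generated once, out of band, never committed
  hub.config.ConfigurableHTTPProxy.auth_token: "<token>"
  hub.config.JupyterHub.cookie_secret: "<token>"

Then referenced from the chart's values:

hub:
  existingSecret: my-jupyterhub-secrets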
This is a great coincidence... Just spent the morning figuring out why my hub kept restarting. Would be nice to have a solution for this.
For reference, using version 1.2.0 of the chart.
`hub.existingSecret` is a reference to a self-managed k8s Secret with configuration. This works for most things, because in the hub pod we can optionally mount the k8s Secret declared in `hub.existingSecret` alongside the Helm-chart-managed k8s Secret, and then use Python to merge the two. But specifically with regard to the configuration `proxy.secretToken`, this is tricky: it is the key the hub and proxy share to let the hub control the proxy via a REST API, so it must be known by both the hub pod and the proxy pod.
So, to support this we would need merging logic in the proxy pod similar to what we have in the hub pod. Currently, the proxy pod reads the k8s Secret managed by the Helm chart like this, and doesn't respect `hub.existingSecret`:
https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/dfeb8ea4a610a5c7d91d21b6d3c2d39e20f2d290/jupyterhub/templates/proxy/deployment.yaml#L116-L124
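(For reference, the linked lines boil down to roughly the following; a paraphrase, not a verbatim copy of the chart:)

# Approximation of the linked snippet: the proxy pod reads the token
# straight from the chart-managed Secret named "hub", with no fallback
# to a Secret named by hub.existingSecret.
env:
  - name: CONFIGPROXY_AUTH_TOKEN
    valueFrom:
      secretKeyRef:
        name: hub
        key: hub.config.ConfigurableHTTPProxy.auth_token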
Proposals on how this could be resolved are welcome, but if the complexity increases too much I'll be very inclined to reject such a proposal, to ensure sustainable maintenance of this Helm chart long term.
Related background knowledge
The `lookup` function
The Helm template `lookup` function is used to inspect the state of the currently installed Helm release in a k8s cluster.
This can be trouble if:
- we try to `lookup` the content of some k8s Secret currently installed in a k8s cluster but not managed by the Helm chart (as decided via labels etc., I think)
- we try to `lookup` the content of some k8s Secret currently installed in a k8s cluster, but using `helm template` instead of `helm install --dry-run`: `helm template` does not connect to a k8s api-server and cannot use the `lookup` function against one
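A minimal sketch of the failure mode (my own example, not chart code):

{{- /* Under `helm install`/`helm upgrade`, lookup returns the live
       Secret; under `helm template` (what ArgoCD runs) there is no
       api-server to ask, so it always returns an empty map. */}}
{{- $hubSecret := lookup "v1" "Secret" .Release.Namespace "hub" }}
{{- if not $hubSecret }}
{{- /* With `helm template` we always land here, so any value that is
       only stable when reused from the cluster gets regenerated on
       every render. */}}
{{- end }}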
The way we generate credentials
Here we generate credentials and stash them into k8s Secret key/value pairs:
https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/dfeb8ea4a610a5c7d91d21b6d3c2d39e20f2d290/jupyterhub/templates/hub/secret.yaml#L28-L41
Here is the Helm logic used to generate the key, relying on the `lookup` function:
https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/dfeb8ea4a610a5c7d91d21b6d3c2d39e20f2d290/jupyterhub/templates/hub/_helpers-passwords.tpl#L34-L47
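In spirit, the helper implements a reuse-or-generate pattern like this simplified sketch (not the chart's exact code; the key and Secret name mirror the ones discussed in this thread):

{{- /* Reuse the value from the installed Secret if lookup finds one,
       otherwise mint a fresh random token. With `helm template` the
       lookup is always empty, so the else-branch always runs. */}}
{{- $key := "hub.config.ConfigurableHTTPProxy.auth_token" }}
{{- $data := (lookup "v1" "Secret" .Release.Namespace "hub").data | default dict }}
{{- if hasKey $data $key }}
{{ $key }}: {{ index $data $key }}
{{- else }}
{{ $key }}: {{ randAlphaNum 64 | b64enc }}
{{- end }}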
I see... and of course ArgoCD uses `helm template`, so it does not support `lookup`. Thanks for the explanation @consideRatio
I was looking a bit into the JupyterHub documentation because I ran into the same issue. So far I have just disabled Auto Sync, so I can sync whenever I want and not as soon as ArgoCD picks up any change in the repository.
I was thinking that I could generate the needed tokens myself (mainly the proxy auth token, but also the cookie secret) and spin up a k8s Secret that would then be used in env vars for both the hub and the proxy.
Thoughts on that idea? I'm not sure whether it would work to just pass "blank" configuration values to avoid them being generated, and to instantiate mine separately, passed as env vars.
Thanks folks!
@IceS2 according to the Kubernetes documentation of `envFrom`:
List of sources to populate environment variables in the container. The keys defined within a source must be a C_IDENTIFIER. All invalid keys will be reported as an event when the container is starting. When a key exists in multiple sources, the value associated with the last source will take precedence. Values defined by an Env with a duplicate key will take precedence. Cannot be updated.
So, if you pass the variable as an `env` then yours will take precedence... but I guess that in that case you are in the same situation as putting it in plain text in `proxy.secretToken`. I'm not sure what will happen with two `envFrom`. In the chart the `extraEnv` is indeed after the one that contains the token, but I am not sure this will really prevent the restarts, as the secret changes will keep triggering them.
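To make the precedence concrete, a hypothetical container spec (names made up, not the chart's):

containers:
  - name: demo
    envFrom:
      - secretRef:
          name: chart-managed-secret    # may define CONFIGPROXY_AUTH_TOKEN
    env:
      - name: CONFIGPROXY_AUTH_TOKEN    # an env entry wins over envFrom
        valueFrom:
          secretKeyRef:
            name: my-own-secret
            key: proxy_token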
I have tried a different approach, using `ignoreDifferences` combined with the sync option `RespectIgnoreDifferences=true` in ArgoCD, and succeeded in preventing the restarts. But it does not really work, because now the proxy and the hub only get the new manifest applied if they have changes of their own (i.e. if you modify a value of hub, the hub restarts but not the proxy, and vice versa), which leaves them with different secrets and thus causes authentication problems.
I think this could possibly be overcome by using some label/annotation that you update every time you change a value in any part of the chart, but I haven't tried to implement it, as I have settled on a simpler mitigation approach.
In the end I opted for SyncWindows, so the Jupyter chart does not get synchronized during office hours.
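For reference, such a deny window lives on the ArgoCD AppProject and looks roughly like this (schedule and application name are made up):

apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: default
spec:
  syncWindows:
    # Block syncs of the JupyterHub app on weekdays from 08:00 for 10h
    - kind: deny
      schedule: "0 8 * * 1-5"
      duration: 10h
      applications:
        - jupyterhub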
P.S. This is our Jupyter app config in Argo in case you want to give it a go:
project: default
<other_stuff>
syncPolicy:
  <other_stuff>
  syncOptions:
    - RespectIgnoreDifferences=true
ignoreDifferences:
  - group: apps
    kind: Deployment
    jsonPointers:
      - /spec/template/metadata/annotations/checksum~1secret
      - /spec/template/metadata/annotations/checksum~1auth-token
  - kind: Secret
    name: hub
    jsonPointers:
      - /data/hub~0config~0ConfigurableHTTPProxy~0auth_token
      - /data/hub~0config~0CryptKeeper~0keys
      - /data/hub~0config~0JupyterHub~0cookie_secret
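(A note on the pointer syntax: `jsonPointers` uses RFC 6901 escaping, where `~1` decodes to `/` and `~0` decodes to `~`, so `checksum~1secret` targets the annotation named `checksum/secret`.)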
Why would the secret keep changing if I pass a dummy value for it, so it is taken from there instead of being generated every time?
I think it's actually different from placing it in plain text in `proxy.secretToken`, since I'll create my own k8s Secret to store it and mount it via `envFrom` 🤔 no?
(I'm not a k8s expert so I might be saying stupid things xD)
@IceS2 if the `envFrom` approach works, it is definitely way better than setting `proxy.secretToken`. I meant that it was the same if you used `env`.
About the restarts: if you set `proxy.secretToken` to a dummy value, the secret will not be autogenerated, so maybe it will work. Just give the combination of `envFrom` and a dummy value for `proxy.secretToken` a try!
Great to see the activity in this repo!
Thanks @consideRatio for the in-depth reply! Makes sense. And @kanor1306, I'll try your proposal.
Hi @kanor1306
We also implemented the `ignoreDifferences` option (without `RespectIgnoreDifferences=true`). But we are not sure about your statement:
But it does not really work, because now the proxy and the hub only get the new manifest applied if they have changes of their own (i.e. if you modify a value of hub, the hub restarts but not the proxy, and vice versa), which leaves them with different secrets and thus causes authentication problems.
We made a change to a definition of the hub. This triggered a redeploy of the hub (not the proxy). But we are still able to log in, spawn notebooks, etc. Is there something we are missing about your specific situation? We are still on the `1.2.0` version of the Helm chart, by the way...
Our `ignoreDifferences` setting in our ArgoCD application definition:
ignoreDifferences:
  - name: hub
    kind: Secret
    jsonPointers:
      - /data/hub.config.ConfigurableHTTPProxy.auth_token
      - /data/hub.config.CryptKeeper.keys
      - /data/hub.config.JupyterHub.cookie_secret
  - name: hub
    kind: Deployment
    group: apps
    jsonPointers:
      - /spec/template/metadata/annotations/checksum~1secret
  - name: proxy
    kind: Deployment
    group: apps
    jsonPointers:
      - /spec/template/metadata/annotations/checksum~1auth-token
@BlueCog, also `1.2.0` here. Using `RespectIgnoreDifferences=true` is really a different use case, as it changes how Argo decides whether to synchronize or not. This is what happens in my case when something is changed in the `values.yaml` of the Jupyter chart:
- Modify something in the top-level `values.yaml`, within the Jupyter chart section (if you modify the values of any other app, this is not a problem).
- This change triggers an ArgoCD sync.
- The `hub` Secret contains the `values.yaml` as one of its key/value pairs. As it has changed, the Secret is regenerated and re-applied. Since we have the `ignoreDifferences` checksums, this does not trigger a restart of the `hub` or the `proxy`.
- What does trigger a restart of the hub is whatever modification we made in the `values.yaml` (assuming it changes the hub deployment template in any way).
- Conclusion: `hub` is running with a regenerated proxy token while `proxy` is running with the old one. Then, issues with authentication.
But again, this is the case when using `RespectIgnoreDifferences=true`.
As far as my understanding of Argo goes, if you are not using `RespectIgnoreDifferences=true`, then `ignoreDifferences` does not do much to protect you from synchronizations due to changes in the repository... This is what I think happens (though I may be missing something here):
- Modify something in the top-level `values.yaml`; it does not really matter whether it is within the Jupyter section or any other.
- This change triggers an ArgoCD sync.
- Argo executes `helm template`, compares the result with the running version (and does not ignore anything, as `RespectIgnoreDifferences=false`), and applies all the resources that differ.
I am guessing that we are in a different use case, maybe in the structure of our Argo Application or something similar, and that is why we have different (although similar) problems.
Hey folks, I've just tested successfully what I mentioned here:
We were already using this Helm chart as part of our own chart, since we needed to add a few other resources. I ended up adding a new secret (fetching it from AWS SSM in our case):
{{ if .Values.innerTokens }}
apiVersion: "kubernetes-client.io/v1"
kind: ExternalSecret
metadata:
  name: {{ .Values.appName }}-inner-tokens
  namespace: {{ .Values.namespace }}
spec:
  backendType: secretsManager
  region: {{ .Values.region }}
  data:
    - name: proxy_token
      key: {{ (.Values.innerTokens).SSM }}
      property: proxy_token
    - name: cookie_secret
      key: {{ (.Values.innerTokens).SSM }}
      property: cookie_secret
{{ end }}
After that I'm using the following configuration:
hub:
  extraEnv:
    CONFIGPROXY_AUTH_TOKEN:
      valueFrom:
        secretKeyRef:
          name: jupyterhub-inner-tokens
          key: proxy_token
    JPY_COOKIE_SECRET:
      valueFrom:
        secretKeyRef:
          name: jupyterhub-inner-tokens
          key: cookie_secret
  config:
    # Default tokens to avoid the proxy service being restarted when the hub is updated
    # https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/2887
    ConfigurableHTTPProxy:
      auth_token: affb56f3f015f1da189aebeb1f9e5731285cbd0443a87426c5368794275ef3c2
    JupyterHub:
      cookie_secret: 8663f67233468eb70667473c5a065c9640bab0ce5ea5bc9292e2f29977cbd0e2
  extraConfig:
    # Small piece of code needed to overwrite the 'cookie_secret'.
    01-set-cookie-secret.py: |
      import os
      c.JupyterHub.cookie_secret = os.getenv('JPY_COOKIE_SECRET', None)
proxy:
  extraEnv:
    CONFIGPROXY_AUTH_TOKEN:
      valueFrom:
        secretKeyRef:
          name: jupyterhub-inner-tokens
          key: proxy_token
The plain-text tokens in the configuration act as placeholders.
The `extraEnv` entries actually overwrite `CONFIGPROXY_AUTH_TOKEN`, because they are defined after the one hardcoded in the Helm chart.
Since the Helm chart doesn't use `JPY_COOKIE_SECRET`, I'm overwriting the configuration using `extraConfig`.
Nice @IceS2!
Then I think we have three scenarios for my initial post:
- create an extra secret and set it via `extraEnv`
- set `ignoreDifferences` in the ArgoCD application
- rework the Helm chart itself
We/my team think scenario 2 is the best for us at the moment. If I understand the documentation correctly, ArgoCD will ignore the secret rotation when comparing the `live` and `git` state. It will only apply the new secrets and annotations when a change is made on the `git` side. This is fine by us: it follows the best practice of secret rotation, without us needing to know (and manage) the secret(s) at all.
A side note: the `ignoreDifferences` setting seems to behave differently in other scenarios; see the posts from me and @kanor1306 on the subject.
Case closed? ;)
Are there any adjustments needed for the latest tag (3.0.0)? Scenario 2 doesn't work for me.
I tried without `RespectIgnoreDifferences=true`:
spec:
  ignoreDifferences:
    - name: hub
      kind: Secret
      group: apps
      jsonPointers:
        - /data/hub~0config~0ConfigurableHTTPProxy~0auth_token
        - /data/hub~0config~0CryptKeeper~0keys
        - /data/hub~0config~0JupyterHub~0cookie_secret
    - name: hub
      kind: Deployment
      group: apps
      jsonPointers:
        - /spec/template/metadata/annotations/checksum~1secret
    - name: proxy
      kind: Deployment
      group: apps
      jsonPointers:
        - /spec/template/metadata/annotations/checksum~1auth-token
@gulldan the group is wrong on the Secret. Here is the correct ignore block:
ignoreDifferences:
  - kind: Secret
    name: hub
    jsonPointers:
      - /data/hub.config.CryptKeeper.keys
      - /data/hub.config.JupyterHub.cookie_secret
      - /data/hub.config.ConfigurableHTTPProxy.auth_token
  - group: apps
    kind: Deployment
    name: hub
    jsonPointers:
      - /spec/template/metadata/annotations/checksum~1secret
  - group: apps
    kind: Deployment
    name: proxy
    jsonPointers:
      - /spec/template/metadata/annotations/checksum~1auth-token