[BUG] - Forwardauth pod not restarted when custom cert is updated
Describe the bug
Create a Nebari deployment with a custom cert. Update the cert, then try to access a service behind forward auth, such as the Dask cluster dashboard or MLflow. Access fails because we are currently using a workaround, and forward auth must be restarted before it picks up the new secret.
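Until this is fixed, restarting forward auth by hand after a cert update works around the problem. A minimal sketch of that restart, assuming `kubectl` is on the PATH and pointed at the right cluster; the deployment name `forwardauth-deployment` and namespace `dev` are assumptions and should be replaced with the values from your own deployment:

```python
import subprocess


def rollout_restart_cmd(deployment: str, namespace: str) -> list:
    """Build the kubectl command that restarts the pods of a deployment."""
    return ["kubectl", "rollout", "restart", f"deployment/{deployment}", "-n", namespace]


def restart_forwardauth(
    deployment: str = "forwardauth-deployment",  # assumed name, check your cluster
    namespace: str = "dev",                      # assumed namespace
    dry_run: bool = True,
) -> list:
    """Restart forward auth so it picks up the updated TLS secret.

    With dry_run=True the command is only returned, not executed.
    """
    cmd = rollout_restart_cmd(deployment, namespace)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    print(" ".join(restart_forwardauth()))
```

Run it with `dry_run=False` only after confirming the printed command targets the right deployment.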
Expected behavior
The Dask dashboard should remain accessible after the TLS secret has been updated.
OS and architecture in which you are running Nebari
Linux x86_64
How to Reproduce the problem?
See above
Command output
After reproducing the error, the forward auth logs show output similar to the following:
time="2024-05-21T19:46:31Z" level=debug msg="Handling callback" cookies="[_forward_auth_csrf_130697=1306972b74fa9922d3ff1c6c62323255 _forward_auth_csrf_8f2e7b=8f2e7bb55fd59dd9dd952dbb97664bdb]" handler=AuthCallback host=mydomain.com method=GET proto=https rule=default source_ip=10.0.0.11 uri="/_oauth?state=8f2e7bb55fd59dd9dd952dbb97664bdb%3Ageneric-oauth%3Ahttps%3A%2F%2Fmydomain.com%2Fmlflow%2F&session_state=23b057fc-c67b-43bd-b96b-425fb403936b&code=a5de370b-3e06-4942-af62-17c663cf587b.23b057fc-c67b-43bd-b96b-425fb403936b.a0bb1c61-a247-47b6-8b6a-6fb10b67ec46"
time="2024-05-21T19:46:32Z" level=error msg="Code exchange failed with provider" error="Post https://mydomain.com/auth/realms/nebari/protocol/openid-connect/token: x509: certificate signed by unknown authority" handler=AuthCallback host=mydomain.com method=GET proto=https rule=default source_ip=10.0.0.11 uri="/_oauth?state=8f2e7bb55fd59dd9dd952dbb97664bdb%3Ageneric-oauth%3Ahttps%3A%2F%2Fmydomain.com%2Fmlflow%2F&session_state=23b057fc-c67b-43bd-b96b-425fb403936b&code=a5de370b-3e06-4942-af62-17c663cf587b.23b057fc-c67b-43bd-b96b-425fb403936b.a0bb1c61-a247-47b6-8b6a-6fb10b67ec46"
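To spot this failure quickly in a longer log dump, something like the following could help. It is only a sketch: it assumes the `key=value` / `key="quoted value"` line format shown above, and the function names are made up for illustration.

```python
import shlex


def parse_logfmt(line: str) -> dict:
    """Split a key=value log line into a dict, honoring double-quoted values."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that carry no '='
            fields[key] = value
    return fields


def is_cert_trust_error(line: str) -> bool:
    """True if the line is an error-level entry reporting an untrusted certificate."""
    fields = parse_logfmt(line)
    return (
        fields.get("level") == "error"
        and "x509: certificate signed by unknown authority" in fields.get("error", "")
    )


if __name__ == "__main__":
    import sys

    # Usage: pipe the forward auth pod logs into this script.
    for raw in sys.stdin:
        if is_cert_trust_error(raw):
            print(raw.rstrip())
```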
Versions and dependencies used.
2024.10.1
Compute environment
Azure
Integrations
No response
Anything else?
No response
In some circumstances, this can also cause ForwardAuth to fail to deploy properly in an initial Nebari deployment. In particular, this happens when self-signed custom certificates are set in nebari-config.yaml but not yet installed in the EKS cluster.
After upgrading to release 2025.10.1, I hit this issue, and it took some digging through the logs to identify. My setup:
Cloud provider: AWS
Nebari Config settings: certificate.type = existing
DNS provider: non-CloudFlare (Route53)
To get my deployment to run from start to finish, I had to manually install the self-signed cert secret in the EKS cluster at the DNS update step in https://github.com/nebari-dev/nebari/blob/main/src/_nebari/stages/kubernetes_ingress/__init__.py#L27, using this setting in nebari-config.yaml:
config.dns.auto-provision = False
which causes the DNS update prompt to appear during deployment.
Otherwise, ForwardAuth will fail to deploy due to not being able to pull the secret from k8s. The relevant error message from EKS:
MountVolume.SetUp failed for volume "cert-volume" : secret <secretname> not found.
It took me a minute to track down that error, since it didn't show up in my k9s UI, only in the pod logs in the EKS console. This should probably be a separate issue, but since the fix will likely be the same for both, I'm posting it here.
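In the meantime, a pre-flight check along these lines could surface the missing secret before ForwardAuth tries to mount it. This is a rough sketch, assuming `kubectl` is on the PATH and configured for the cluster; the secret name and namespace are placeholders you would fill in from your own nebari-config.yaml, and `secret_exists` is a hypothetical helper, not part of Nebari.

```python
import subprocess


def secret_exists(name: str, namespace: str, run=subprocess.run) -> bool:
    """Return True if `kubectl get secret` finds the named secret.

    kubectl exits non-zero when the secret is not found. `run` is injectable
    so the check can be exercised without a cluster.
    """
    result = run(
        ["kubectl", "get", "secret", name, "-n", namespace],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    import sys

    # Usage: python check_secret.py <secretname> <namespace>
    secret, ns = sys.argv[1], sys.argv[2]
    if not secret_exists(secret, ns):
        sys.exit(f"secret {secret!r} not found in namespace {ns!r}; install it before deploying")
```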