
[BUG] - Forwardauth pod not restarted when custom cert is updated

Open Adam-D-Lewis opened this issue 1 year ago • 1 comments

Describe the bug

Create a Nebari deployment with a custom cert. Update the cert, then try to access a service behind forward auth, such as the Dask cluster dashboard or MLflow. You won't be able to, because of the workaround we are using at the moment: the forward auth pod must be restarted before it uses the new secret.
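Until this is fixed, the restart can be triggered by hand. A minimal sketch, assuming the forward auth deployment is named `forwardauth-deployment` and lives in the `dev` namespace (both names are assumptions, so check them against your cluster), and that `kubectl` is already pointed at the right cluster:

```python
import subprocess


def rollout_restart_command(deployment: str, namespace: str) -> list[str]:
    """Build the kubectl command that restarts all pods of a deployment."""
    return [
        "kubectl", "rollout", "restart",
        f"deployment/{deployment}",
        "--namespace", namespace,
    ]


def restart_forward_auth(
    deployment: str = "forwardauth-deployment",  # assumed name, verify in your cluster
    namespace: str = "dev",                      # assumed namespace
    run=subprocess.run,
):
    """Restart forward auth so its new pods start with the updated TLS secret."""
    # check=True raises CalledProcessError if kubectl exits with a non-zero status.
    return run(rollout_restart_command(deployment, namespace), check=True)
```

Calling `restart_forward_auth()` after updating the cert should be enough to bring up new pods; the `run` parameter is only there so the function can be exercised without a cluster.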

Expected behavior

Should be able to access the Dask dashboard after updating the TLS secret, without restarting forward auth.

OS and architecture in which you are running Nebari

Linux x86_64

How to Reproduce the problem?

See above

Command output

After reproducing the error, the forward auth logs show output similar to the following.

time="2024-05-21T19:46:31Z" level=debug msg="Handling callback" cookies="[_forward_auth_csrf_130697=1306972b74fa9922d3ff1c6c62323255 _forward_auth_csrf_8f2e7b=8f2e7bb55fd59dd9dd952dbb97664bdb]" handler=AuthCallback host=mydomain.com method=GET proto=https rule=default source_ip=10.0.0.11 uri="/_oauth?state=8f2e7bb55fd59dd9dd952dbb97664bdb%3Ageneric-oauth%3Ahttps%3A%2F%2Fmydomain.com%2Fmlflow%2F&session_state=23b057fc-c67b-43bd-b96b-425fb403936b&code=a5de370b-3e06-4942-af62-17c663cf587b.23b057fc-c67b-43bd-b96b-425fb403936b.a0bb1c61-a247-47b6-8b6a-6fb10b67ec46"

time="2024-05-21T19:46:32Z" level=error msg="Code exchange failed with provider" error="Post https://mydomain.com/auth/realms/nebari/protocol/openid-connect/token: x509: certificate signed by unknown authority" handler=AuthCallback host=mydomain.com method=GET proto=https rule=default source_ip=10.0.0.11 uri="/_oauth?state=8f2e7bb55fd59dd9dd952dbb97664bdb%3Ageneric-oauth%3Ahttps%3A%2F%2Fmydomain.com%2Fmlflow%2F&session_state=23b057fc-c67b-43bd-b96b-425fb403936b&code=a5de370b-3e06-4942-af62-17c663cf587b.23b057fc-c67b-43bd-b96b-425fb403936b.a0bb1c61-a247-47b6-8b6a-6fb10b67ec46"
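The telltale line is the `level=error` entry whose `error` field ends in `x509: certificate signed by unknown authority`. A small sketch for spotting it in saved logs, assuming the lines follow the `key=value` / `key="quoted value"` layout shown above; the helper names are made up for illustration:

```python
import shlex

CERT_ERROR = "x509: certificate signed by unknown authority"


def parse_logfmt(line: str) -> dict[str, str]:
    """Split a key=value log line into a dict, honoring double-quoted values."""
    fields = {}
    # shlex.split keeps quoted values together and strips the quotes.
    # It raises ValueError on a line with an unbalanced quote.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that contain no "="
            fields[key] = value
    return fields


def is_cert_trust_error(line: str) -> bool:
    """True if the line is an error entry caused by an untrusted certificate."""
    fields = parse_logfmt(line)
    return fields.get("level") == "error" and CERT_ERROR in fields.get("error", "")
```

Running `is_cert_trust_error` over each line of the forward auth log separates this failure from the surrounding debug output.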

Versions and dependencies used.

2024.10.1

Compute environment

Azure

Integrations

No response

Anything else?

No response

Adam-D-Lewis, Oct 14 '24 18:10

In some circumstances, this can also cause ForwardAuth to fail to deploy properly in an initial Nebari deployment, in particular when self-signed custom certificates are set in nebari-config.yaml but not yet installed in the EKS cluster.

After upgrading to release 2025.10.1, I ran into this issue, and it took some digging through the logs to identify. My setup:

Cloud provider: AWS
Nebari Config settings: certificate.type = existing
DNS provider: non-CloudFlare (Route53)

To get my deployment to run from start to finish, I had to install the self-signed cert secret in the EKS cluster manually when the deployment reached the DNS update step in https://github.com/nebari-dev/nebari/blob/main/src/_nebari/stages/kubernetes_ingress/init.py#L27. I used this setting in nebari-config.yaml:

config.dns.auto-provision = False

This causes the DNS update prompt to appear during deployment.
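For the manual install itself, one option is `kubectl create secret tls`. A minimal sketch that only builds the command, assuming the cert and key are PEM files on disk; the secret name, file paths, and namespace in the usage line are placeholders, and the secret name has to match the one set in nebari-config.yaml:

```python
def create_tls_secret_command(
    secret_name: str, cert_path: str, key_path: str, namespace: str
) -> list[str]:
    """Build the kubectl command that creates a TLS secret from a cert/key pair."""
    return [
        "kubectl", "create", "secret", "tls", secret_name,
        f"--cert={cert_path}",
        f"--key={key_path}",
        "--namespace", namespace,
    ]


# Placeholder values for illustration only.
command = create_tls_secret_command("my-tls-secret", "tls.crt", "tls.key", "dev")
print(" ".join(command))
```

Run the printed command while the deployment is waiting at the DNS update prompt, then continue.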

Otherwise, ForwardAuth will fail to deploy due to not being able to pull the secret from k8s. The relevant error message from EKS:

MountVolume.SetUp failed for volume "cert-volume" : secret <secretname> not found.

It took me a minute to track down that error since it didn't show up in my k9s UI, only in pod logs in the EKS console. This probably deserves its own issue, but since the fix is likely the same for both, I'm posting it here.
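A quicker way to catch this before ForwardAuth fails to mount the volume is to check whether the secret exists. A rough sketch, assuming `kubectl` is configured for the cluster; the function name is hypothetical, and the secret name and namespace are whatever your deployment uses:

```python
import subprocess


def secret_exists(secret_name: str, namespace: str, run=subprocess.run) -> bool:
    """Return True if `kubectl get secret` succeeds for the given secret."""
    result = run(
        ["kubectl", "get", "secret", secret_name, "--namespace", namespace],
        capture_output=True,
        text=True,
    )
    # A non-zero exit code means kubectl failed. That is usually "not found",
    # but it can also be a connection or permission problem, so inspect
    # result.stderr before concluding the secret is missing.
    return result.returncode == 0
```

If this returns `False` for the secret named in nebari-config.yaml, the `cert-volume` mount will fail with the message above.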

mwengren, Nov 24 '25 19:11