
bug: OAuth Custom Provider fails because next-auth does not follow redirects

Open ahrakos opened this issue 1 year ago • 8 comments

Describe the bug

When you use a custom OAuth provider (specifically Jumpcloud), you get the SIGNIN_OAUTH_ERROR even though all the details are correct. I expected the login to succeed, since I provided all the relevant details, including the issuer URL, client_id, client_secret and other data, as stated in the docs.

To reproduce

Try to connect Jumpcloud as a custom OAuth provider.

I debugged the code a bit and found the issue: langfuse uses next-auth v4 for managing OIDC, which in turn uses openid-client. That library does not follow redirects, and that is the actual bug.

What does this actually mean? Say my issuer is https://oauth.id.jumpcloud.com. next-auth, with the help of openid-client, tries to resolve the .well-known/openid-configuration endpoint via a GET request. If you run curl -I https://oauth.id.jumpcloud.com/.well-known/openid-configuration, you'll see that you get a 302 status code back along with the JSON content for the well-known API. Normally that would be fine, since you can work with that response, but if you look at this code snippet in openid-client, you'll see that it checks for the status code to be ONLY 200 and throws an error otherwise, although 301, 302, 201 or 204 might be valid as well.
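To make the failure mode concrete, here is a minimal sketch of the two behaviours: a strict check that only accepts 200, and a variant that follows redirects. All types and function names below are hypothetical and written for illustration; this is not the openid-client source, and the HTTP call is replaced by a synchronous stand-in so the logic can be read on its own.

```typescript
// Illustrative sketch only: these types and functions are hypothetical and
// are not part of next-auth or openid-client.
interface HttpResponse {
  status: number;
  headers: { [name: string]: string };
  body: string;
}

// A synchronous stand-in for an HTTP GET, so the logic can be read in isolation.
type Fetcher = (url: string) => HttpResponse;

const REDIRECT_STATUSES = [301, 302, 303, 307, 308];

function discoveryUrl(issuer: string): string {
  // Append the well-known path to the issuer, avoiding a double slash.
  return issuer.replace(/\/+$/, "") + "/.well-known/openid-configuration";
}

// Strict variant: any status other than 200 is treated as a failure.
function discoverStrict(issuer: string, fetcher: Fetcher): unknown {
  const res = fetcher(discoveryUrl(issuer));
  if (res.status !== 200) {
    throw new Error("expected 200 OK, got: " + res.status);
  }
  return JSON.parse(res.body);
}

// Tolerant variant: follow absolute Location headers for a bounded number of hops.
function discoverFollowingRedirects(
  issuer: string,
  fetcher: Fetcher,
  maxRedirects: number = 5
): unknown {
  let url = discoveryUrl(issuer);
  for (let hop = 0; hop <= maxRedirects; hop++) {
    const res = fetcher(url);
    if (res.status === 200) {
      return JSON.parse(res.body);
    }
    const location = res.headers["location"];
    if (REDIRECT_STATUSES.indexOf(res.status) !== -1 && location) {
      url = location; // assumes an absolute URL in Location
      continue;
    }
    throw new Error("unexpected status: " + res.status);
  }
  throw new Error("too many redirects");
}
```

With a provider that answers the well-known URL with a 302, the strict variant throws while the tolerant one keeps going until it reaches a 200 or runs out of hops.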

SDK and container versions

I am using the langfuse v3.10 image, which uses next-auth v4.24.11, which in turn uses openid-client.

Additional information

The resolution would be to either:

  • Upgrade to next-auth v5 (maybe it uses the next major version of the openid-client library, which is totally different)
  • Install openid-client v6 as a peer dependency and let next-auth use it instead of the older version

Are you interested to contribute a fix for this bug?

Yes

ahrakos • Jan 20 '25 16:01

Thanks for sharing, I was not aware of this issue.

I think upgrading to openid-client v6 while nextauth v4 does not officially support it is risky here, as teams running Langfuse use all sorts of different authentication providers and we cannot realistically test whether this actually works for everyone.

Upgrading to nextauth v5 while it is not yet stable is risky as well.

Another option could be to try to bump nextauth v4 to openid-client v6, I am not sure how much effort this would be as I haven't contributed to nextauth yet.

I think the most reasonable next step here is to upgrade to nextauth v5 once it is available, and potentially to maintain a fork that first checks whether the nextauth v5 beta fixes this and, if it does not, upgrades to openid-client v6. We can then merge these changes once either of the two options is stable on nextauth.

What do you think?

marcklingen • Jan 20 '25 18:01

@marcklingen

Quick update:

We overcame this issue with a workaround: a pair of matching nginx proxy servers, one running as a sidecar of the langfuse-web pod and the other detached. We then changed the hostAliases of the langfuse-web pod to send all traffic for "oauth.id.jumpcloud.com" to the sidecar, and the proxy rewrites the HTTP status code from 301/302 to 200. It worked.

The only thing missing is a way to override the hostAliases in the langfuse-k8s charts. As a first step, it would be nice to have an option to do so via the values.yaml file.
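For readers trying to picture what the rewriting proxy does, the transformation can be modelled as below. This is only an illustration in TypeScript: the actual setup used nginx, whose configuration is not shown in this thread, and the type and function names here are hypothetical.

```typescript
// Hypothetical model of the status rewrite performed by the proxy.
// The real workaround used nginx; this only illustrates the transformation.
interface ProxiedResponse {
  status: number;
  headers: { [name: string]: string };
  body: string;
}

function rewriteRedirectToOk(upstream: ProxiedResponse): ProxiedResponse {
  if (upstream.status !== 301 && upstream.status !== 302) {
    // Everything that is not a 301/302 passes through untouched.
    return upstream;
  }
  // Same headers and body, but presented to the client as 200 OK.
  return { status: 200, headers: upstream.headers, body: upstream.body };
}
```

Because the provider already sends the JSON document together with the redirect status, changing only the status code is enough for the strict 200 check to pass.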

For the fix itself, I do think it is worth trying to upgrade to next-auth v5 on a side branch to see whether this even solves the issue, or whether another intervention within other OSS libs is needed. If it does not, then it's not a langfuse issue, but it is still worth raising with them from the langfuse side, as you would like to have better coverage of custom providers.

If it does solve the issue, then we can either wait for v5 stable release, or create the forked v4 with openid-client v6.

ahrakos • Jan 21 '25 09:01

> We overcame this issue with a workaround: a pair of matching nginx proxy servers, one running as a sidecar of the langfuse-web pod and the other detached. We then changed the hostAliases of the langfuse-web pod to send all traffic for "oauth.id.jumpcloud.com" to the sidecar, and the proxy rewrites the HTTP status code from 301/302 to 200. It worked.

Thanks for sharing that this worked for you!

> The only thing missing is a way to override the hostAliases in the langfuse-k8s charts. As a first step, it would be nice to have an option to do so via the values.yaml file.

Can you create an issue or PR on the langfuse-k8s repo?

> For the fix itself, I do think it is worth trying to upgrade to next-auth v5 on a side branch to see whether this even solves the issue, or whether another intervention within other OSS libs is needed. If it does not, then it's not a langfuse issue, but it is still worth raising with them from the langfuse side, as you would like to have better coverage of custom providers.
>
> If it does solve the issue, then we can either wait for v5 stable release, or create the forked v4 with openid-client v6.

If you get to this, I'd be interested to know whether this resolved the issue for you. As the upgrade has multiple breaking changes, we probably do not have the bandwidth to maintain a branch that uses next-auth v5 while it is not stable yet. We would love to push this once it is stable, as support for redirects seems to be really important in some setups.

marcklingen • Jan 21 '25 09:01

> If you get to this, I'd be interested to know whether this resolved the issue for you. As the upgrade has multiple breaking changes, we probably do not have the bandwidth to maintain a branch that uses next-auth v5 while it is not stable yet. We would love to push this once it is stable, as support for redirects seems to be really important in some setups.

So what is your suggestion? Should I invest some time in upgrading to next-auth v5 in a side-branch, and see if it works for me?

ahrakos • Jan 22 '25 07:01

> Should I invest some time in upgrading to next-auth v5 in a side-branch, and see if it works for me?

I think this'd be great! We can then use this branch when v5 is stable to make the upgrade on main

marcklingen • Jan 22 '25 10:01

Hey @ahrakos, do I understand correctly that this is non-blocking for you as of now, since you found a workaround using the hostAliases?

We could also consider patching the openid-client library if it's a straightforward and small change (like extending the acceptable status codes). We did that in https://github.com/langfuse/langfuse/pull/5198. Do you think patching the snippet you've shared to accept a broader variety of success codes may work?
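As a rough idea of what "extending the acceptable status codes" could look like, here is a small sketch. It is not the openid-client code and not the patch from any PR; the names, and the particular list of relaxed statuses, are assumptions chosen for illustration. The body is still required to be a JSON object, so a redirect with an empty or HTML body keeps failing.

```typescript
// Hypothetical helper; not the actual openid-client code nor any existing patch.
interface DiscoveryResponse {
  status: number;
  body: string;
}

const STRICT_STATUSES = [200];
// Assumed relaxed list for illustration; which codes to accept is a judgment call.
const RELAXED_STATUSES = [200, 301, 302];

function parseDiscoveryResponse(
  res: DiscoveryResponse,
  accepted: number[] = STRICT_STATUSES
): { [key: string]: unknown } {
  if (accepted.indexOf(res.status) === -1) {
    throw new Error("unexpected status: " + res.status);
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(res.body);
  } catch (err) {
    throw new Error("response body is not valid JSON");
  }
  // Only a JSON object can be a discovery document.
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("response body is not a JSON object");
  }
  return parsed as { [key: string]: unknown };
}
```

Calling it with `RELAXED_STATUSES` accepts a 302 that carries the JSON document, which is the behaviour described for Jumpcloud above, while the default stays as strict as before.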

Steffen911 • Jan 27 '25 09:01

> Hey @ahrakos, do I understand correctly that this is non-blocking for you as of now, since you found a workaround using the hostAliases?
>
> We could also consider patching the openid-client library if it's a straightforward and small change (like extending the acceptable status codes). We did that in https://github.com/langfuse/langfuse/pull/5198. Do you think patching the snippet you've shared to accept a broader variety of success codes may work?

@marcklingen My workaround was creating two different proxy servers in order to rewrite the status code to something that next-auth will accept.

It's not a very convenient one, so yes, I do believe that patching the openid-client lib in the particular snippet that I shared would solve the issue and be very helpful. It would also spare us from over-complicating the langfuse configuration on our k8s cluster and from spending more resources than needed on the proxy workaround.

In any case, it would be good to try upgrading next-auth in a parallel effort, but starting by patching the lib would be very beneficial for us.

ahrakos • Jan 27 '25 21:01

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] • May 02 '25 02:05

This issue was closed because it has been inactive for 14 days since being marked as stale. Please reopen if the issue persists.

github-actions[bot] • May 16 '25 02:05

@ahrakos Can you share exactly how you set up this sidecar? I'm having the same issue. This seems like a very convoluted way of getting around a simple HTTP response code check.

streamnsight • Sep 20 '25 14:09

> @ahrakos Can you share exactly how you set up this sidecar? I'm having the same issue. This seems like a very convoluted way of getting around a simple HTTP response code check.

@streamnsight I agree, I am not in favor of doing that. What I ended up doing is using my PR (which overrides the problematic dependency that checks the status code) to build my own langfuse images and store them in my private ECR registry. I then used Helm values to pick my own image.

https://github.com/langfuse/langfuse/pull/5251 - this is the PR.

ahrakos • Sep 20 '25 14:09