workos-node icon indicating copy to clipboard operation
workos-node copied to clipboard

Token refresh in middleware causing issues

Open kevinmitch14 opened this issue 1 year ago • 18 comments

Hi,

We are using the Node SDK in our nextjs application. We are running into issues in relation to refreshing the tokens in middleware.

Basically, when using something like tanstack query, and refocusing on the window, it will refetch the queries. We have some queries pointing to our API route handlers. This works okay, until the token needs to be refreshed, it then seems that my custom assert session helper doesn't pick up the session. Any ideas as to why?

This is working perfectly, until the refetch on window focus occurs and refresh token needs to be refreshed

export const assertSession = async () => {
  if (typeof window !== "undefined") {
    throw new Error("getSession can not be used on the client");
  }

  const cookieStore = await cookies();
  const workosCookie = cookieStore.get(WORK_OS_COOKIE_NAME)?.value;

  if (!workosCookie) {
    redirect("/sign-in");
  }

  const session = loadSealedSession(workosCookie);

  const authSession = await session.authenticate();

  if (!authSession.authenticated) {
    redirect("/sign-in");
  }

  return authSession;
};

This is what I'm observing in the logs

Screenshot 2024-12-09 at 12 51 34

kevinmitch14 avatar Dec 09 '24 15:12 kevinmitch14

And in the case of using the nextjs library version, with a custom UI, is it possible to use to use authMiddleware? Because we do not want to use hosted authkit, we want a simple magic auth sign in at the /sign-in route. But it seems the middleware wants redirect to a callback URL?

kevinmitch14 avatar Dec 10 '24 01:12 kevinmitch14

For your first question, there's not a lot to go on with what you've provided. I suggest adding more logging to see what the problem might be. authSession in your code will have a reason parameter if authenticated === false. I suggest starting there.

As for the Next.js library, it will currently only work if you use AuthKit rather than your own UI. When using your own UI we recommend using the Node SDK instead.

PaulAsjes avatar Dec 10 '24 09:12 PaulAsjes

Hey Paul, the reason is invalid_jwt. I think it is because, the updated cookie in middleware is not directly available in the route handler. I believe this is why there is getSession() (which only works in nextjs middleware) and withAuth() which works outside of nextjs middleware, in the nextjs library.

withAuth() reads the session from the headers it seems, while the getSession() reads from cookies. And the Authkit middleware, does some header manipulation to help make all this work. Because setting a cookie and directly hitting an API route handler would not work.

To achieve a custom UI + nextjs middleware, with the node sdk, I would need to do all this header manipulation too right?

Would an option be to use the authkit middleware without middlewareAuth enabled, and manage this myself. Maybe a best of both worlds? Something like this - https://github.com/workos/authkit-nextjs?tab=readme-ov-file#retrieve-session-in-middleware. Because with middlewareAuth enabled, middleware just wants to redirect to the hosted authkit domain.

Using the authkit middleware eliminates issues with manual refreshing/header manipulation etc. (Session is available in route handlers etc)

kevinmitch14 avatar Dec 10 '24 12:12 kevinmitch14

Could you confirm if this example actually works? I am getting really strange results here. Debug logs say session is valid, but when I check getSession() like in the docs, there is no session. I am using the exact same code as shown in the readme.

The debug logs from the middleware say there is a session. But logging the session like in the example, states otherwise.

https://github.com/workos/authkit-nextjs?tab=readme-ov-file#retrieve-session-in-middleware

kevinmitch14 avatar Dec 10 '24 18:12 kevinmitch14

I'm running into a similar issue with an invalid_grant when refreshing a session token within middleware 🤔

DarrylBrooks97 avatar Dec 12 '24 00:12 DarrylBrooks97

@kevinmitch14 Are you using the provided middleware or are you using your own custom middleware with custom UI? invalid_jwt indicates that the access token is invalid, likely due to it being expired.

@DarrylBrooks97 You should receive a description alongside that error that will give more clues as to what's going on. i.e. "Refresh token already used". Can you share the description?

PaulAsjes avatar Dec 16 '24 09:12 PaulAsjes

My issue turned out to be that setting a cookie using the cookies() function within middleware did not propagate to my server components. Therefore when trying to authenticate the session within the component it used an expired accessToken. My solution was to just set the cookie on the returned middleware response

DarrylBrooks97 avatar Dec 27 '24 22:12 DarrylBrooks97

@PaulAsjes, I also occasionally encounter the following error when updating the token:

Screenshot 2025-01-11 at 17 37 44

I found the only mention of the invalid_grant error in this PR: https://github.com/workos/workos-node/pull/1078. Why is it being returned and how should it be handled?

monolithed avatar Jan 11 '25 10:01 monolithed

@PaulAsjes, I also occasionally encounter the following error when updating the token:

Screenshot 2025-01-11 at 17 37 44 I found the only mention of the `invalid_grant` error in this PR: [#1078](https://github.com/workos/workos-node/pull/1078). Why is it being returned and how should it be handled?

This happens when you try to refresh the session again with the same previous cookie. The expired session can be refreshed only once. Check if you are refreshing session and not saving the cookie, which leads to refresh session call again with the same expired cookie. See if this helps

vikram-wald avatar Feb 20 '25 17:02 vikram-wald

Hello we are facing similar issues at Aline. For the invalid_grant error, it seems impractical to ensure that the refresh token is used only once. What is your client application sends multiple requests using the same cookie almost simultaneously? One of those requests will consume the refresh token, and the others will fail with invalid_grant. That is what we are seeing on our end.

Shouldn't WorkOS provide some grace period for using the same refresh token in a short duration of time to allow updating the cookie on the client?

BrentFarese avatar Apr 03 '25 01:04 BrentFarese

Going back to the OP's post. What I think is happening is that when using something like Tanstack Query (which we're using in our system) and you return to a tab, Tanstack Query likes to refetch data from the server it considers to be stale. It does this automatically (but it can be turned off).

When user's access token expires, the refresh token needs to be used to get a new one. And once consumed, it can't be used again. So when Tanstack ends up trying to refetch information it thinks is stale, the operation could result in more than 6 concurrent API calls because that's what it would take to reload everything that is on screen. We're going to have a problem when multiple requests come in with the same expired credentials because the middleware is going to try to get a new access token for each request, all using the same refresh token! This makes a race condition. The first request to complete with WorkOS API will pass, and the rest are likely to fail. When it fails the response looks like the user's credentials are invalid and needs to get logged out.

So since we try to apply best practices in our project, when we see a de-authentication type event, we kick the user out and send them to managed AuthKit to get logged back in. AuthKit has it's own cookie and sees the user is logged in so it sends them back with a auth code to redeem to get a new access and refresh token pair. THIS SHOULDN'T HAPPEN. It's bad for our users and bad for our application. We should only kick the user back to AuthKit if they needed to be sent there. Obviously they didn't need it.

Ultimately the approach of using middleware seems to be an anti-pattern when working with WorkOS and we've concluded that we need to use AuthKit on the client to manage our session and only have the middleware verify the authenticity of the session. Nothing more. Otherwise we're going to have this problem which, if you were absolutely committed to sticking with using middlewares to manage your session and the refresh process, will require some kind of persistence, locking and request blocking mechanism, especially when you want to horizontally scale, and that only leave you open to having even more bugs. JWTs by design are supposed to make it so shouldn't have a stateful component on your server for managing your sessions. So... We're going to use AuthKit-JS to make it so our SPA will be able to better manage session token refreshing and even org switching.

Yet, I feel that WorkOS should either fix this problem, respond here, write a blog post, or update their documentation in an official capacity because once any of us poor saps end up reaching this problem, it likely means that our integration with WorkOS has reached a certain level of maturity and implementing a fix would be burdensome.

MatthewAry avatar Apr 10 '25 21:04 MatthewAry

@MatthewAry what you wrote is entirely correct. But the official WorkOS docs say to use middleware in the way we're using it. It seems to me that the refresh token should be able to be consumed more than once, in a specified short period. Then, if there are concurrent requests that use an old access token/refresh token pairing, they will all work for like a 1 minute duration.

I wrote to WorkOS and requested that they make refresh token not single use.

BrentFarese avatar Apr 10 '25 23:04 BrentFarese

Oh. Look. Seems like they ran into this issue with AuthKit already. https://github.com/workos/authkit-js/releases/tag/v0.7.1 And they added a lock system to fix it.

MatthewAry avatar Apr 14 '25 22:04 MatthewAry

WorkOS's API docs say they support idempotency so, couldn't they modify their SDK so it's easier to make idempotent requests regarding refresh tokens? If we did that then I think we could eliminate potential race condition issues. We'd still have to have some mechanism resolve an Idempotency-Key for multiple/same refresh requests, but we wouldn't need to have a locking strategy.

MatthewAry avatar Apr 15 '25 13:04 MatthewAry

Hi all,

Thanks for the detailed discussion on this issue. After investigating this further, we've identified that this is a timing issue that occurs when a lot of requests come in at once with the same refresh token.

What happens is that when multiple concurrent requests attempt to refresh using the same token (like when TanStack Query refetches data after window focus), only the first request succeeds, along with any subsequent requests with the same refresh token within the grace period. After that, refresh requests with the same token fail and that causes the SDK to delete the cookie, logging the user out.

Previously, we had a 2s grace period to allow for the same refresh token to be used multiple times within that window. To address the issues you're experiencing, we've increased that grace period to 30s, which should help alleviate these problems in most scenarios.

This change should reduce the frequency of users being unexpectedly logged out when multiple API calls are made in quick succession.

If you're still experiencing these issues after this update, please reach out to us directly so we can investigate your specific use case more thoroughly.

nicknisi avatar Apr 16 '25 20:04 nicknisi

@nicknisi does this require upgrading workos-node or it's something your team enabled in the back end and we need to do nothing? Thanks!

BrentFarese avatar Apr 17 '25 02:04 BrentFarese

@BrentFarese This was a back end change so no additional changes needed on your part. Thanks!

nicknisi avatar Apr 17 '25 13:04 nicknisi

Refresh token calls should still support idempotency key in the Node sdk imo

jczstudios avatar Apr 24 '25 22:04 jczstudios