Client cannot recover from version skew
Link to the code that reproduces this issue
https://github.com/knpwrs/nextjs-skew-recovery-bug
To Reproduce
- Run npm run build
- Run npm start
- Open http://localhost:3000 and make sure the browser development tools are open.
- Press the Server Action button.
- Observe logs on the server indicating the function was called.
- The network response has a 200 response code indicating no errors and a text/x-component mime type.
- Shutdown the server, leave the app running in the web browser.
- Rename the logServer function in actions.ts and update the import and usage in components.tsx to match (for instance, logServer can be renamed to logServer2).
- Run npm run build
- Run npm start
- Go to the already running app
- Press the Server Action button.
- Observe an error on the server: [Error: Failed to find Server Action "006c3c7b08402d18959b82a9692db1011f32bcc8fd". This request might be from an older or newer deployment. Original error: Cannot read properties of undefined (reading 'workers')]
- There are no errors on the client. Error boundaries do not trigger. There are no uncaught errors in the console. There is no way for the client to know that the function call failed and no way for the client to recover.
- The network response has a 200 response code indicating no errors and a text/html mime type.
- Press the Throw Error button. Observe an uncaught error in the console.
Note that I couldn't get error.tsx or global-error.tsx to work for either the failed function call or the thrown client-side error.
Current vs. Expected behavior
Currently the client is not able to recover from version skew when a server action cannot be called. Everything appears normal to the client.
I would expect the error boundary to catch an error so the client can refresh and recover.
Provide environment information
Operating System:
Platform: darwin
Arch: arm64
Version: Darwin Kernel Version 24.2.0: Fri Dec 6 19:01:59 PST 2024; root:xnu-11215.61.5~2/RELEASE_ARM64_T6000
Available memory (MB): 32768
Available CPU cores: 10
Binaries:
Node: 23.6.0
npm: 10.9.2
Yarn: 1.22.19
pnpm: 9.12.2
Relevant Packages:
next: 15.2.0-canary.33 // Latest available version is detected (15.2.0-canary.33).
eslint-config-next: N/A
react: 19.0.0
react-dom: 19.0.0
typescript: 5.7.3
Next.js Config:
output: N/A
Which area(s) are affected? (Select all that apply)
Server Actions, Error Handling
Which stage(s) are affected? (Select all that apply)
next start (local), Other (Deployed), Vercel (Deployed)
Additional context
This is particularly problematic given the following quote from this blog post:
Secure action IDs: Next.js now creates unguessable, non-deterministic IDs to allow the client to reference and call the Server Action. These IDs are periodically recalculated between builds for enhanced security.
I couldn't find any documentation about this. It appears that action IDs can change at any time and clients which haven't refreshed yet won't have any way to deal with this.
Have you tried this? https://nextjs.org/docs/app/building-your-application/data-fetching/server-actions-and-mutations#overwriting-encryption-keys-advanced
Thank you for the link, @leerob. It's not entirely clear from the documentation that the encryption key affects the non-deterministic action IDs. If it does, that doesn't fully address this issue.
Even if the encryption key is kept the same across builds it is still possible to get a different action ID and the client still has no way to recover when that happens.
I couldn't find any documentation anywhere on how action IDs are generated. Is it some sort of hash involving the file name, function name, and the NEXT_SERVER_ACTIONS_ENCRYPTION_KEY? In this case renaming the file, renaming the function, moving anything, or even building on a new machine can change the action ID (say, if the action ID is generated with a full absolute path to the file).
After some more experimentation the following code also does not catch any client-side errors, even though the server output indicates that the action cannot be found:
"use client";
import { logServer2 } from "./actions";

export function CallServerActionButton() {
  return (
    <button
      onClick={async () => {
        try {
          await logServer2();
        } catch (e) {
          // This never catches anything, even if the action is not found server-side
          console.error("Error in CallServerActionButton:", e);
        }
      }}
    >
      Server Action
    </button>
  );
}
Having the same issue on a Next.js ^15.0.2 app deployed on AWS with SST v2. I agree with @knpwrs. The documentation isn’t clear enough on:
- How this helps resolve different action IDs across builds
- How to generate the custom encryption key
This has been really tough to troubleshoot since there are no client-side errors, no way to identify which action is causing it, and no way to track it down. We also can’t determine the user’s experience. We assume the action isn’t executed, leaving the app broken without any way to handle or provide feedback to the user.
I tried to open an issue to get clarity around this in docs and it was immediately closed by a bot. https://github.com/vercel/next.js/issues/75448
@mbranch I tried the same thing before opening this issue and it was also closed by the bot: #75492
It seems like the bot just closes all documentation issues because there isn't a field in the template for a reproduction.
@mbranch @knpwrs Looks like there's an issue with the GitHub actions closing these Documentation template issues, taking a look 👁
Thanks for looking into this @samcx. Not directly related to this issue, but I similarly tried to open an issue about issues getting closed too quickly (it also got closed :joy:): https://github.com/vercel/next.js/issues/75449
@mbranch That one is working as expected because we need a GitHub repo link (you provided a link to an issue instead). The bot should not run when you use the Documentation issue template.
Even if the encryption key is kept the same across builds it is still possible to get a different action ID and the client still has no way to recover when that happens.
@knpwrs Did you confirm this with your reproduction? I am not seeing the environment variable in your reproduction.
I do agree we could improve our documentation here, so I'm taking a look at that as well.
@samcx the reproduction is to rename a function or do a similar refactoring, such as moving a function. Clients which have not refreshed between deployments will attempt to call non-existent actions, and the client has no way to recover: no errors are thrown, and even if one were to install a service worker to intercept fetch calls, the response code is 200, even though something like 404 would probably be more appropriate (though given that the network call is abstracted away, this doesn't matter as much as just raising some sort of error the client can recover from).
I have exactly the same issue, and it is the most common cause of problems like https://github.com/vercel/next.js/discussions/76149. In my case it is just a normal build, but many users of my app stay on a page longer than usual. Consider a scenario like this:
- A user visits your app.
- For some reason, they stay on the page and do nothing. This is normal: you might open GitHub, leave it idle, and come back hours later to visit some repos. That is where the issue hits.
- You build and deploy your app, and a server action ID changes from 123 to 456.
- Hours later, the user comes back and tries to call server action 123.
- The running service no longer has server action 123, only 456, and that causes the issue.
I tried a few things, but it seems hard to figure out from the client side what happened on the server, although Next.js appears to return the whole page when the server action is unavailable (I'm not sure, but it looks that way).
One idea is to read the response on the client side. If the server action fails this way, I don't get my normal response wrapper back (with fields like code and data), so I could check whether the response has the expected props to tell whether the server action is unavailable. The catch is that I get nothing back at all, because the whole server action is missing rather than part of it failing, and a try/catch on the server side doesn't help either. So I can't tell whether my server has crashed, the service is updating, or something else is going on, and I can't simply tell the user to reload the app if the server really has crashed.
Any better ideas? I think it is more important for the client to know that the version or service has changed than to have version control on the server side. Of course, it would be easier with exposed, built-in support for version control on the server side.
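The response-wrapper idea from this comment could be sketched roughly as follows. This is only a sketch under assumptions: ActionEnvelope, isActionEnvelope, VersionSkewError, and callWithSkewCheck are hypothetical names, and it assumes that a call to a missing action resolves with something other than the wrapper (for example undefined), which matches what this thread observes but is not documented behavior.

```typescript
// Hypothetical wrapper that every server action in the app would return.
type ActionEnvelope<T> = { ok: true; data: T } | { ok: false; error: string };

// Type guard: true only when the value has the wrapper shape.
function isActionEnvelope<T>(value: unknown): value is ActionEnvelope<T> {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  if (v.ok === true) return "data" in v;
  if (v.ok === false) return typeof v.error === "string";
  return false;
}

// Hypothetical error type the UI can catch to trigger a reload.
class VersionSkewError extends Error {
  constructor() {
    super("Server action result did not match the expected wrapper");
    this.name = "VersionSkewError";
  }
}

// Calls an action and throws when the result is not a wrapper.
// Assumption: a missing action resolves with a non-wrapper value.
async function callWithSkewCheck<T>(
  action: () => Promise<unknown>
): Promise<ActionEnvelope<T>> {
  const result = await action();
  if (!isActionEnvelope<T>(result)) throw new VersionSkewError();
  return result;
}
```

A click handler could then catch VersionSkewError and call window.location.reload(). As the comment points out, this check cannot tell version skew apart from other failures that also produce no wrapper.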
After reading https://nextjs.org/docs/app/building-your-application/data-fetching/server-actions-and-mutations#overwriting-encryption-keys-advanced, I also wonder whether, and why, keeping the encryption key in sync would solve this issue. In my opinion it can't. I only have one pod on one machine, so the cause is not machines being out of sync with each other, but the client and server being out of sync.
This issue also normally doesn't happen with route handlers, which are ordinary APIs. If you design them well, you don't rename /passport/is-login to /passport/is-login-v2 directly, which would cause exactly the issue we have now. Normally you add an extra prop (for example, going from uid: 123 to uid: 123, uid_v2: 12345), or add /is-login-v2 while keeping /passport/is-login.
Anyway, I hope the docs can say more about this, and it would be even better if more were logged when this happens. It was very hard to debug on a live Next.js service.
After some more experimentation the following code also does not catch any client-side errors, even though the server output indicates that the action cannot be found:
"use client";
import { logServer2 } from "./actions";
export function CallServerActionButton() {
  return (
    <button
      onClick={async () => {
        try {
          await logServer2();
        } catch (e) {
          // This never catches anything, even if the action is not found server-side
          console.error("Error in CallServerActionButton:", e);
        }
      }}
    >
      Server Action
    </button>
  );
}
From my tests and some reading on Stack Overflow, I think that in this case logServer2 doesn't simply return nothing: the request returns the whole page, which completes the request. You can test this by using curl or fetch to send the server action request directly. If you change the Next-Action header to any other value, it returns the page DOM rather than a 404 or another error.
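The observation above, combined with the mime types reported in the original reproduction, suggests a rough way to classify a raw server action response, for instance when replaying the request with fetch or inspecting it in a service worker. This is a sketch only: classifyActionResponse and ActionOutcome are hypothetical names, and the mapping from mime type to outcome comes from what was observed in this thread, not from a documented contract.

```typescript
// Hypothetical outcome labels for a raw server action response.
type ActionOutcome = "action-ran" | "probable-skew" | "unknown";

// Classifies a response by its Content-Type header value.
// Assumption (from this thread): a successful action call answers with
// text/x-component, while a missing action answers 200 with text/html.
function classifyActionResponse(contentType: string | null): ActionOutcome {
  // Drop parameters such as "; charset=utf-8" before comparing.
  const [first = ""] = (contentType ?? "").split(";");
  const mime = first.trim().toLowerCase();
  if (mime === "text/x-component") return "action-ran";
  if (mime === "text/html") return "probable-skew";
  return "unknown";
}
```

With fetch, the input would be response.headers.get("content-type"). Since the status code is 200 in both cases, the mime type is the only signal used here.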
@LikeDreamwalker I am also wondering if and why keep the encryption key in sync can solve this issue.
We've added it to the build process and the live servers and don't see any difference (still loads of untraceable errors). I'm still not clear if this is a runtime or build-time encryption key.
I think this strategy mainly targets the case where the same app is built on different machines that then serve traffic together. A static, synchronized key does keep everything in sync at build time, but not while lots of users have the app open.
So I think it may solve the issue if there really are conflicts between multiple machines or pods (although I don't understand why you wouldn't deploy the same image in that scenario), but not conflicts between an old client and a new server. And this issue seems to be about the latter, not conflicts between machines. Sadly.
I think I have a rough solution based on @knpwrs's repo: LikeDreamwalker/nextjs-skew-recovery-bug
To get started, we need to be clear about some concepts:
- This issue is about the client and server being out of sync, not about multiple pods or machines on the server side being out of sync with each other. If you are in the second scenario, this solution can't help.
- This solution is not well structured or well tested, and it relies on a hacky approach from the community.
Based on this issue's original reproduction, we know the issue happens because the client sends a request with an out-of-date action ID, which causes the version skew. My idea is to handle this in middleware: if we can tell that an incoming server action request (or any other request) is out of date, we can notify the client and ask it to reload the page.
For the deployment part, I chose deploymentId to keep the client and server in sync. It works much like a normal package version, but it is exposed on every request's header if we set it manually.
For the checking part, I use middleware:
import { NextResponse } from "next/server";
import type { NextRequest } from "next/server";
import deploymentId from "./staticId";

export function middleware(request: NextRequest) {
  if (request.nextUrl.pathname === "/refresh") {
    return NextResponse.next();
  }
  const clientDeploymentId =
    request.nextUrl.searchParams.get("dpl") ||
    request.headers.get("x-deployment-id");
  if (clientDeploymentId && clientDeploymentId !== deploymentId) {
    console.log("Client deployment id:", clientDeploymentId);
    console.log("Server deployment id:", deploymentId);
    const isAction =
      request.method === "POST" && request.headers.has("Next-Action");
    const refreshUrl = new URL("/refresh", request.url);
    if (isAction) {
      // For server actions, set a header to be handled by the action
      const response = NextResponse.next();
      response.headers.set("x-action-redirect", refreshUrl.toString());
      return response;
    } else {
      // For regular requests, redirect directly
      return NextResponse.redirect(refreshUrl);
    }
  }
}

export const config = {
  matcher: "/((?!_next/static|_next/image|favicon.ico).*)",
};
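For unit testing, the branching in the middleware above could be pulled out into a pure function that does not depend on Next.js types. This is a sketch, not part of the linked repo: SkewInput, SkewDecision, and decideSkew are hypothetical names, and the logic simply mirrors the conditions in the middleware as posted.

```typescript
// Hypothetical plain-data view of the request fields the middleware reads.
interface SkewInput {
  pathname: string;
  method: string;
  hasNextActionHeader: boolean;
  clientDeploymentId: string | null;
  serverDeploymentId: string;
  origin: string; // e.g. "http://localhost:3000"
}

// Hypothetical result: what the middleware should do with the request.
type SkewDecision =
  | { kind: "pass" }
  | { kind: "flag-action"; refreshUrl: string }
  | { kind: "redirect"; refreshUrl: string };

function decideSkew(input: SkewInput): SkewDecision {
  // Never intercept the refresh route itself.
  if (input.pathname === "/refresh") return { kind: "pass" };
  const { clientDeploymentId, serverDeploymentId } = input;
  // A missing client id is let through, as in the middleware above.
  if (!clientDeploymentId || clientDeploymentId === serverDeploymentId) {
    return { kind: "pass" };
  }
  const refreshUrl = new URL("/refresh", input.origin).toString();
  const isAction = input.method === "POST" && input.hasNextActionHeader;
  return isAction
    ? { kind: "flag-action", refreshUrl }
    : { kind: "redirect", refreshUrl };
}
```

The middleware would then map "flag-action" to setting the x-action-redirect header and "redirect" to NextResponse.redirect, keeping the framework calls in one thin layer.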
This is a test version. If you want to use it as a workaround, please adapt the relevant parts. There are many ways to notify the client, but if you want the client notified as soon as possible, it is better to handle both server actions and normal requests, and redirecting to a special route is also acceptable.
Here is how you can test my version:
- Run npm run build
- Run npm start
- Open http://localhost:3000 and make sure the browser development tools are open.
- Press the Server Action button.
- Observe logs on the server indicating the function was called.
- The network response has a 200 response code indicating no errors and a text/x-component mime type.
- Shutdown the server, leave the app running in the web browser.
- Rename the action and also change the deploymentId in the staticId.ts file, for example to "version-skew-v20"
- Run npm run build
- Run npm start
- Go to the already running app
- Press the Server Action button.
- The middleware now detects that this request is from the old client and redirects the client to the refresh route, which sets window.location.href after 3 seconds
- Since the page has been reloaded, the user is free of the old client and the version skew.
This solution is rough because:
- I don't know why, but the first request doesn't seem to carry the deploymentId, so we have to check that the deploymentId has a value before checking whether it matches. I don't know whether this has side effects, but since the old client in this scenario has already sent requests, it should be fine.
- You can't handle a server action directly in middleware (https://github.com/vercel/next.js/discussions/64993); my solution comes from that discussion. I have no idea whether it has side effects or will be blocked in the future.
- Because of the previous point, we still seem to get one log entry for Error: Failed to find Server Action, but after that the user should be on the new client. So I can't say this eliminates the error log completely, but it is better than before.
- You must set a deploymentId in every build. To keep it in sync between server and client, generate it statically at build time rather than with a function that produces a random value on each call: the Next.js server calls it again, so the values will never match if you generate it at runtime.
- For the redirect itself, I haven't checked whether router.refresh() works better than location.href. But for version skew I think a forced reload is more understandable than a refresh. You can watch this video to understand:
https://github.com/user-attachments/assets/e014af89-8216-48b4-aead-1c80ccecb851
Solve #76149
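The point above about generating the deploymentId statically at build time could be sketched like this. It is an illustration under assumptions: renderStaticIdModule and resolveDeploymentId are hypothetical helpers, BUILD_ID is a hypothetical environment variable (for example a commit SHA set by CI), and the generated module only mirrors the default export that the middleware imports from ./staticId.

```typescript
// Reads a build identifier that is fixed before the build starts.
// Assumption: CI exports BUILD_ID; the variable name is made up here.
function resolveDeploymentId(env: Record<string, string | undefined>): string {
  const id = env.BUILD_ID?.trim();
  if (!id) throw new Error("BUILD_ID must be set before building");
  return id;
}

// Renders the source text of a staticId.ts module with the id baked in,
// so the server and the built client both see the same constant.
function renderStaticIdModule(deploymentId: string): string {
  return (
    `const deploymentId = ${JSON.stringify(deploymentId)};\n` +
    `export default deploymentId;\n`
  );
}
```

A prebuild script could write the rendered text to staticId.ts (for example with fs.writeFileSync) before running npm run build. Because the value is a constant in a file rather than the result of a function call, it does not change when the server evaluates the module again at runtime.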
Does the action-id only change if you change the server-action code somehow? Or does it get a new Id on every build?
Edit: Got the answer
I guess a workaround would be to just not use server actions when self-hosting and just use route handlers.. :/
What great work you have done investigating, @LikeDreamwalker 🙌 Your workaround seems solid, but it should not be needed imo.
This issue should be quite widespread among people using Next 15 and standalone, right?
It's good to address security in v15, but this is now a serious issue for us in production, affecting several customers and very hard to track, understand and fix. Is there any way to turn it off?
We are considering a rework to remove all server actions in favor of route handlers. But as you can imagine, this would be too cumbersome, and a fix would be much preferred.
@leerob I think this issue should not be overlooked, because it could cause users a lot of problems. Help from you (or someone else) in finding a solution would be appreciated by many, I think 🙏.
@J4v4Scr1pt To reiterate what was mentioned above, you can essentially opt-out of this behavior when self-hosting → https://nextjs.org/docs/app/building-your-application/data-fetching/server-actions-and-mutations#overwriting-encryption-keys-advanced
Thank you so much for your response! Sorry that I missed this information. After reading it, just to make sure I understand correctly: Does it mean if I set the NEXT_SERVER_ACTIONS_ENCRYPTION_KEY that the action Id will essentially always be the same after deployments?
I bet you have 100 other things to do so thx again for your help 🙏.
Does it mean if I set the NEXT_SERVER_ACTIONS_ENCRYPTION_KEY that the action Id will essentially always be the same after deployments?
Yes! So it'll be up to your discretion how you want to rotate the key.
Thank you so much for your reply, I understand it now. But isn't this a bit of a paradox? For safety, server actions are "changed" after every build by default, which can cause version skew after every build. To avoid the version skew, we can keep the encryption key the same for every build, which makes every server action (the endpoint of our server action API routes, so to speak) stay the same across builds.
This is a little complex, but the current conclusion seems to be that we either accept version skew for better security, or solve version skew and give up the better security.
I don't know whether I missed some information (I only understood this after rereading the encryption part), but does Next have a way to detect version skew on the server side, or is there a plan for one? I think a better solution would be to detect version skew and set up an application-specific response to it (like a refresh), which would solve both issues mentioned above.
Thanks again! And just for a reminder for others:
https://nextjs.org/docs/app/building-your-application/data-fetching/server-actions-and-mutations#closures-and-encryption
In my personal (and I suppose professional) opinion, the encryption key for action ids does not provide increased security whatsoever. Especially since there is a mapping of action ids sent down to the client in the server-rendered HTML. I’ve never had a need to obscure endpoints in my last 13 years as a professional web developer. It would be far better to encourage good security practices for developers and avoid all this encryption rigmarole altogether.
I can certainly see the argument for using an encryption key to send sensitive session state down to the client, but not for obscuring the action ids. It certainly seems to create more problems than it solves.
In any case, this issue is about the client not being able to recover from version skew. The client has no way of knowing that an action call has failed.
I can certainly see the argument for using an encryption key to send sensitive session state down to the client, but not for obscuring the action ids.
I really agree here. Actions should be treated like any API endpoint with respect to backwards compatibility for clients in the wild. Obfuscating them randomly isn't really security and only hurts logging and observability.
I think very often features that might make sense for deploying in Vercel are leaking into Next.js as a framework without as much consideration for those who are deploying and supporting Next.js on their own.
I still think there's a lot of missing information about NEXT_SERVER_ACTIONS_ENCRYPTION_KEY:
- Is this a build-time or runtime env var? Or maybe both?
- What is the exact format of the key? https://github.com/vercel/next.js/issues/61020#issuecomment-2095906151
- The statement "This variable must be AES-GCM encrypted." doesn't clarify. I think what's meant here is something along the lines of "This key is a server-side secret and should be treated accordingly. If it is compromised, an adversary could possibly use it to decrypt server-side secrets in payloads to the client."
- More examples of exactly how this symmetric key is used to encrypt/decrypt payloads would be helpful. What exactly are the payloads? This can help Next.js users understand the consequences of a key leaking, etc.
In my personal (and I suppose professional) opinion, the encryption key for action ids does not provide increased security whatsoever.
For me, once I realized that server actions are still APIs and can be called directly under some conditions, I concluded that they are not secure, or only as secure as a normal API. I still think the version skew caused by the encryption key, and client recovery from it, matter more. After all, there are many other ways to secure a request, but not many ways for clients to recover.
However, if the encryption key is good for security, it can be kept, but it should not conflict with recovery and responses. I think recovery and security are two parallel concerns, and there is no need to bind them together, which only adds complexity.
I also wonder whether the encryption key is really designed for security. In my opinion it is more of a tool for making server actions anonymous. Think about it: if we name a server action getUserInfo(), Next can't be sure there is only one of them. So the easiest approach is to abstract each server action into a random, unique ID, register it, and use that ID to call it. As for the ID changing after every build, I think that is just how it works. If I were designing this, I would worry more about IDs that don't change and cause collisions between server actions. Just a thought, with no evidence; I'm sure there are more considerations that I don't know about, and I say this with respect.
Have you all tried the NEXT_SERVER_ACTIONS_ENCRYPTION_KEY? I still see this error from time to time, not as much as before but still. 🤔
Edit: Maybe I'm too eager, gonna let this run in production for a while. It could be due to customers having tabs open from before the fix.
But a question: if I change the server action code, it will also result in a new ID, correct?
I still think there's a lot of missing information about NEXT_SERVER_ACTIONS_ENCRYPTION_KEY:
- Is this a build-time or runtime env var? Or maybe both?
- What is the exact format of the key? Docs: Document expected NEXT_SERVER_ACTIONS_ENCRYPTION_KEY format #61020 (comment)
- The statement "This variable must be AES-GCM encrypted." doesn't clarify. I think what's meant here is something along the lines of "This key is a server-side secret and should be treated accordingly. If it is compromised, an adversary could possibly use it to decrypt server-side secrets in payloads to the client."
- More examples of exactly how this symmetric key is used to encrypt/decrypt payloads would be helpful. What exactly are the payloads? This can help Next.js users understand the consequences of a key leaking, etc.
I agree we need more details on how this variable is used. I've tested locally and it seems we only need this during build time to retain consistent action ids. Can someone confirm this?
I extracted this from the source code; this is how keys are currently generated:
function arrayBufferToString(
  buffer: ArrayBuffer | Uint8Array<ArrayBufferLike>
) {
  const bytes = new Uint8Array(buffer)
  const len = bytes.byteLength
  // @anonrig: V8 has a limit of 65535 arguments in a function.
  // For len < 65535, this is faster.
  // https://github.com/vercel/next.js/pull/56377#pullrequestreview-1656181623
  if (len < 65535) {
    return String.fromCharCode.apply(null, bytes as unknown as number[])
  }
  let binary = ''
  for (let i = 0; i < len; i++) {
    binary += String.fromCharCode(bytes[i])
  }
  return binary
}

async function generateKey() {
  const key = await crypto.subtle.generateKey(
    {
      name: 'AES-GCM',
      length: 256,
    },
    true,
    ['encrypt', 'decrypt']
  )
  const exported = await crypto.subtle.exportKey('raw', key)
  const result = btoa(arrayBufferToString(exported))
  return result
}
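If the extracted code above reflects the expected format, a custom key would be the base64 encoding of 32 raw bytes (a 256-bit AES-GCM key). Under that assumption, which this thread does not confirm against the docs, a key for NEXT_SERVER_ACTIONS_ENCRYPTION_KEY could be produced with a shorter sketch like the one below. generateEncryptionKey and decodedKeyLength are hypothetical names, and whether other key lengths are accepted is unknown.

```typescript
// Produces a base64 string from 32 random bytes, matching the shape of the
// raw 256-bit key exported above. Uses the Web Crypto global.
function generateEncryptionKey(): string {
  const bytes = crypto.getRandomValues(new Uint8Array(32));
  let binary = "";
  bytes.forEach((b) => {
    binary += String.fromCharCode(b);
  });
  return btoa(binary);
}

// Returns how many raw bytes a base64 key decodes to, as a sanity check
// before putting the value into an environment variable.
function decodedKeyLength(key: string): number {
  return atob(key).length;
}
```

The generated string would be stored as a secret and supplied as NEXT_SERVER_ACTIONS_ENCRYPTION_KEY. Whether it is needed at build time, at runtime, or both is one of the open questions raised earlier in this thread.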
Thank you so much for your reply, I understand it now. But isn't this a bit of a paradox? For safety, server actions are "changed" after every build by default, which can cause version skew after every build. To avoid the version skew, we can keep the encryption key the same for every build, which makes every server action (the endpoint of our server action API routes, so to speak) stay the same across builds.
This is a little complex, but the current conclusion seems to be that we either accept version skew for better security, or solve version skew and give up the better security.
I agree with this sentiment. Unless we implement our own skew protection, rotating this key would cause the same issue. It seems we have to make a tough trade-off between user experience and security.
I think they could remove this completely and go back to how it was in v14. Security should be up to us developers: you would not create an API endpoint without a proper protection layer, and a server action is basically an API endpoint. Don't get me wrong, it's super awesome that Vercel is looking into enhancing security by default 🙌, but it seems this one was done with hosting on Vercel in mind.
And I still see this error in our logging service, 5 days after I added the NEXT_SERVER_ACTIONS_ENCRYPTION_KEY...
IDK, did we have the "encryption" part in v14? I thought it had existed ever since server actions were released...