cdk-nextjs icon indicating copy to clipboard operation
cdk-nextjs copied to clipboard

403 error on static files during deploy

Open anthonyroach opened this issue 11 months ago • 6 comments

version: 4.0.0-beta.25

If I try to access my nextjs site while a deploy is happening I'll often get 403 errors on all the static files.

There was a previous issue similar to this, but I'm using a version with the fix for this old issue and am still seeing the issue.

Here’s what I think might be going on…

The altered .js, .css, and .woff2 files are given unique file names for each nextjs build (when they change). So when you deploy a new nextjs build it uploads all the new files, and then deletes all the old ones that don't exist anymore.

But the s3 files are deployed separately from the server lambda, and the server lambda responds with HTML that references the .js, .css and .woff2 files. So the server lambda may return HTML containing the new file names before the S3 files are deployed, or the old file names after the S3 files are deployed. Both cases will result in 403 errors from S3 because the files don’t exist.

The only way I can see to fix this is to do two things:

  • Make sure the server lambda deploys after the static files are deployed using an explicit dependency on the bucket deployment.
  • Don't prune the static files from the S3 bucket during a deploy. This means the static files won't ever get deleted, but that's a small price to pay for not having your site broken on every deploy.

I'm not seeing any way to do the above without forking. The bucket prune setting isn't exposed and is hard coded as true.

anthonyroach avatar Mar 11 '24 15:03 anthonyroach

Actually looks like prune can be set without a fork:

        nextjs: {
          nextjsStaticAssetsProps: {
            prune: false,
          },
        },

And the dependency can also be added like so:

    nextjs.serverFunction.lambdaFunction.node.addDependency(
      nextjs.staticAssets,
    );

I haven't tested the above, yet.

anthonyroach avatar Mar 11 '24 15:03 anthonyroach

I tested the above configuration and it eliminated the 403 errors on deploy.

IMHO, the above settings should be the default because having your site break on deploy is not good.

anthonyroach avatar Mar 11 '24 18:03 anthonyroach

@anthonyroach, look at the below code from here:

// must find old object keys before uploading new objects so we know which objects to prune
      const oldObjectKeys = await listOldObjectKeys({
        bucketName: props.destinationBucketName,
        keyPrefix: props.destinationKeyPrefix,
      });
      if (!props.zip) {
        debug('Uploading objects to: ' + props.destinationBucketName);
        await uploadObjects({
          bucket: props.destinationBucketName,
          keyPrefix: props.destinationKeyPrefix,
          filePaths,
          baseLocalDir: sourceDirPath,
          putConfig: props.putConfig,
        });
        if (props.prune) {
          debug('Emptying/pruning bucket: ' + props.destinationBucketName);
          await pruneBucket({
            bucketName: props.destinationBucketName,
            filePaths,
            baseLocalDir: sourceDirPath,
            keyPrefix: props.destinationKeyPrefix,
            oldObjectKeys,
          });
        }
      }

see how we're being careful to get old objects keys, uploading new objects, then pruning old objects? This order shouldn't result in any 404s. What am I missing?

Pruning is desired because you don't want a bunch of stale static files sitting in your S3 bucket, right?

bestickley avatar Mar 11 '24 18:03 bestickley

It's a race condition. If the static files get deployed before the server lambda, then the old files will have been deleted while the lambda is still serving HTML referencing the old static files that have been deleted. If the lambda deploys first then it will start serving HTML that references new static files that don't exist yet. Either way you get 404/403.

If the static files need to get pruned, then they need to be pruned after there are no more old server lambda instances running, but the new static files need to get deployed before any new lambda instance start running the new code. The only way I can see to satisfy both these conditions is to either not prune or prune much later on in some kind janitor or lifecycle outside of the deploy. And in either case the new lambda can't be deployed until the static files have been deployed.

anthonyroach avatar Mar 11 '24 22:03 anthonyroach

I didn't know the server lambda holds static references to files in s3. I thought the server lambda read from s3 for static files. Could there be any other reason for this behavior? Otherwise it sounds like prune: false is best for default.

bestickley avatar Mar 11 '24 23:03 bestickley

Yeah, the lambda just renders the HTML from the server code inside the lambda itself, and the HTML contains URLs pointing at the static files that are then served out of S3. The server code inside the lambda is self contained and doesn't reach out to the bucket directly as far as I know.

anthonyroach avatar Mar 12 '24 02:03 anthonyroach

We're also facing this issue quite often... even after days of the last deployment.

Is it maybe possible that the index.html is cached for too long on the client? I guess that's something one could configure in the server code / distribution behavior?

ansgarS avatar Apr 29 '24 10:04 ansgarS

@anthonyroach and @ansgarS, I understand the issue now. If user fetches website before deployment, and doesn't refresh after deployment, the HTML file in browser tab will reference old JS assets and result in 404s. I agree default behavior should be prune: false.

Ideally, the NextjsBucketDeployment lambda function would prune files that are X days old b/c I think it's safe to assume a user won't have browser tab open for more than X days. Maybe X = 30? Don't want static assets piling up, although cost would likely be negligible - more for cleanup purposes.

bestickley avatar Apr 29 '24 12:04 bestickley

Interestingly, ...

  • we have set prune to false "false" on April 22nd
  • our last deployment was on April 26th
  • the error just happened an hour ago

Something else, I noticed. These chunk load errors are always connected with some SVG (it is our icon lib)

Screenshot 2024-04-29 at 15 12 15

ansgarS avatar Apr 29 '24 13:04 ansgarS

@ansgarS, when was your last non-prune deployment? Could someone have opened a browser tab before then and resulted in this error?

bestickley avatar Apr 29 '24 13:04 bestickley

The last deployment with prune:true was on April 19th and yes, our application is not used very frequently. Could be possible. We will wait for another week and see if it still happens :)

ansgarS avatar Apr 29 '24 13:04 ansgarS

Browser caching certainly makes this error more likely to happen, but it can even happen with caching disabled if you hit the page right in the middle of the deploy. That happened to us quite a few times when our app was under heavy development and was being deployed very frequently throughout the day. Turning off pruning and adding the dependency between the server lambda and the bucket deployment has totally resolved the problem for us. Haven't had a single error since we rolled that out back in March.

anthonyroach avatar May 10 '24 01:05 anthonyroach

Okay, I think we could solve it by another fix: we were loading our icon lazily like this:

import { lazy as _lazy } from 'react';

function lazy(importFn: { (): Promise<typeof import('./assets/*/*.svg')> }) {
    return _lazy(async () => {
        const m = await importFn();
        return { default: m.ReactComponent };
    });
}

export const getIcon = (
    iconName: IconNameType
): React.LazyExoticComponent<React.FunctionComponent<React.SVGProps<SVGSVGElement>>> => {
    const [iconDir, iconFileName] = ICON_PATHS[iconName];

    return lazy(async () => import(`./assets/${iconDir}/${iconFileName}.svg`));
};

Removing the lazy-loading also squashed the ChunkLoadError for us

ansgarS avatar May 14 '24 09:05 ansgarS

Which means, we probably had another issue than you had 🥶

ansgarS avatar May 14 '24 09:05 ansgarS

While a better fix is described in my previous comment, for now I'll resolve this by making prune default to false.

bestickley avatar Jul 13 '24 10:07 bestickley