
NotFoundPage incorrectly gives 200 status error instead of 404

Open • thedavidprice opened this issue 4 years ago • 22 comments

Because Redwood is a single-page app, the built-in 404 page web/src/pages/NotFoundPage/NotFoundPage.js is handled by webpack as a "normal" page, i.e. it's optimized and becomes part of the dist/static/js/…chunk.js files during build. This is problematic because the 404 page is served with a 200 HTTP status code, and it also requires JavaScript to render.
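To make the status-code problem concrete, here is a minimal sketch of the fallback rule most static hosts apply to single-page apps. It is a model, not Redwood's actual deploy setup; `spaFallback` and the asset paths are hypothetical.

```typescript
// Hypothetical model of a static host configured for a single-page app.
// The asset paths stand in for build output and are made up for illustration.
type HostResponse = { status: number; file: string }

const assets = new Set<string>(['/index.html', '/static/js/app.js'])

function spaFallback(path: string): HostResponse {
  // Files that exist in the build output are served directly.
  if (assets.has(path)) {
    return { status: 200, file: path }
  }
  // Everything else is rewritten to index.html so the client-side router can
  // decide what to render. The host never sees the route table, so the status
  // is 200 even when the router ends up rendering NotFoundPage.
  return { status: 200, file: '/index.html' }
}

console.log(spaFallback('/about')) // { status: 200, file: '/index.html' }
console.log(spaFallback('/no/such/page')) // { status: 200, file: '/index.html' }
```

The key point is that the host picks the status before any JavaScript runs, and at that moment it has no way to know the path is bogus.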

This seems like a good opportunity to take a step forward with Redwood pre-rendering! The goal would be to build NotFoundPage as a separate, static asset. There are likely other solutions that would work (it's possible webpack already has a config option). But if there's a way to learn from this and step toward Redwood having pre-render capability, I suggest that be the preferred option.

thedavidprice • Aug 12 '20 18:08

@thedavidprice Would we redirect the user to the NotFoundPage when we can't find a matching route in the router?

peterp • Sep 30 '20 21:09

The intention is to keep the same behavior, e.g. if a page isn't found in the Router, redirect to NotFoundPage.

I wonder if the simplest way to handle this is just to have a template static file for 404 and have the router either redirect to the static file in Routes.js or handle it automatically.

thedavidprice • Oct 05 '20 05:10

But if it redirects to a page that exists (the NotFoundPage) then by definition that's a 200! Either the page has to really not exist (so the web server returns a 404), or, if you have access to the config of the web server itself, you can tell it to return a 404 even though that page exists on disk and can be served.

I don't think we can get around this in Jamstack land, because everything is served from a CDN and the provider probably isn't going to let you override response codes at that granular a level. But I could be wrong!

cannikin • Sep 02 '21 17:09

Same for GraphQL; I think the web doesn't care about HTTP codes anymore 😅

simoncrypta • Sep 02 '21 18:09

Helpful article explaining the problem with potential solutions. tl;dr: SEO can be hard https://thegray.company/blog/single-page-application-spas-404s-seo

thedavidprice • Sep 20 '21 20:09

Related: say you have two users:

```js
{ id: 1, name: 'Bob' },
{ id: 2, name: 'Alice' },
```

And a visitor goes to https://page.com/users/3. Should you show them your users page with a message like "Sorry, we couldn't find the user you asked for" (and if so, with what status code?), or should you just give them your general 404 page? Or should it be handled at the web server level (as @cannikin suggested), returning a 404 from there? The article you linked, @thedavidprice, also seems to favor the last option.
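For comparison, this is roughly what the "handle it on the server" option looks like when the server can look up the data before it responds. It's a minimal sketch assuming a server that owns both routing and data access, which a CDN-hosted SPA does not have; `renderUserPage` is hypothetical, and the in-memory `users` array mirrors the two users above.

```typescript
// Hypothetical server-side handler for /users/:id. Assumes the server can
// query the data itself before sending any response.
type User = { id: number; name: string }
type PageResponse = { status: number; body: string }

const users: User[] = [
  { id: 1, name: 'Bob' },
  { id: 2, name: 'Alice' },
]

function renderUserPage(idParam: string): PageResponse {
  const id = Number(idParam)
  const user = Number.isInteger(id) ? users.find((u) => u.id === id) : undefined
  if (!user) {
    // The status is chosen before any bytes are sent, so crawlers get a real
    // 404 while humans still see a friendly, page-specific message.
    return {
      status: 404,
      body: "<h1>Sorry, we couldn't find the user you asked for</h1>",
    }
  }
  // A real app would HTML-escape user.name before interpolating it.
  return { status: 200, body: `<h1>${user.name}</h1>` }
}

console.log(renderUserPage('2').status) // 200
console.log(renderUserPage('3').status) // 404
```

This answers the "with what status code?" question one way: the custom message and the 404 status aren't mutually exclusive, as long as the code that decides runs before the response is sent.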

Tobbe • Sep 21 '21 06:09

Note: Remix touts proper 404s as a feature: https://remix.run/docs/en/v1/guides/not-found#how-to-send-a-404

dthyresson • Jan 20 '22 19:01

I still don't get how this works... index.html already returned a 200 from the server, React spun up, and THEN found that the resource you wanted wasn't there... it's too late to return a 404. They can throw new Response("Not Found", { status: 404 }) all they want, but the JavaScript executing that is already running in the browser via a FOUND page that returned a 200!! HOW DOES THIS WORK

Unless this is all actually server magic and isn't happening in the browser?

cannikin • Jan 20 '22 19:01

HOW DOES THIS WORK

Exactly. I have only had some success with Netlify redirects and even then not 100% on the magic.

But we really do need it for SEO and such.

dthyresson • Jan 20 '22 19:01

Exactly. I have only had some success with Netlify redirects and even then not 100% on the magic.

I think the only solution right now is static routing configured at the provider. Could we match the Redwood router for each deploy target to generate that config?

simoncrypta • Jan 20 '22 19:01

Agreed. I have questions for clarification as well. But seems like we have all the pieces in place to figure it out. We can (and should) solve this.

thedavidprice • Jan 20 '22 22:01

Not an optimal solution, but I proved this behavior would work:

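  1. Create a serverless function "notFound" that returns HTML for the page along with a 404 status:
```ts
import type { APIGatewayEvent, Context } from 'aws-lambda'

import { logger } from 'src/lib/logger'

export const handler = async (_event: APIGatewayEvent, _context: Context) => {
  logger.info('Invoked notFound function')

  // Send a real 404 status, with a minimal HTML body for human visitors.
  return {
    statusCode: 404,
    headers: {
      'Content-Type': 'text/html',
    },
    body: '<html><body><h1>Not Found</h1></body></html>',
  }
}
```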
  2. In a custom NotFoundPage, add a meta refresh that redirects to the function:
```jsx
<MetaTags title="CustomNotFound" description="CustomNotFound page">
  <meta
    httpEquiv="refresh"
    content="0; url = http://localhost:8911/notFound"
  />
</MetaTags>
```

  3. Better (but still not great): update the framework's NotFoundPage to do the redirect.

Of course, this doesn't help when you customize the page, since you need to add the redirect logic yourself, and many other DX considerations make this less than optimal.

But, the idea of a redirect (from Router) to a serverless function can set an actual 404 status

dthyresson • Jan 21 '22 18:01

@dthyresson I have some questions about behavior, but overall I think "works but not optimal" is a great first step and this should be P1 for v1.

Thoughts on the next step here, if not a group discussion?

thedavidprice • Jan 21 '22 18:01

Hmm, what happens when you hit the "Back" button on the 404 page? Does it go back to React, which loads the CustomNotFound page, and then redirects you to 404 again?

cannikin • Jan 21 '22 19:01

I understand that for a human using a browser this probably appears to be the correct behavior: they try to go somewhere and it doesn't exist so they see a 404. But the HTTP status codes are for computers to figure out what's going on, and this isn't ideal for them...

In this case the page returns a 200, and then immediately redirects to a page that's a 404. How does that work with SEO? Will bots execute <meta> redirects and treat the result as if it were the response of the page they originally requested? If not, then this doesn't really help, and it may make things worse: the bot catalogs the original URL as valid, since it returns a 200, but when a human goes there they see a "Not Found" page.

I guess we need to decide if we're trying to get this functionality in place for humans or for computers. Humans don't care about status codes, we could return a 200 all day and if the page just says "Not found" that's good enough for them. But if we're doing this for computers I think we need to make sure we have a "correct" solution for a computer, which is that the server itself returns a 404 when you try to go to a URL that doesn't exist—not that you eventually redirect to a URL that returns a 404. I don't think that's technically correct, but maybe I'm wrong!

But I definitely appreciate the ingenuity in getting this to work, well done @dthyresson!

cannikin • Jan 21 '22 19:01

But if we're doing this for computers I think we need to make sure we have a "correct" solution for a computer, which is that the server itself returns a 404 when you try to go to a URL that doesn't exist

💯

thedavidprice • Jan 21 '22 23:01

In this case the page returns a 200, and then immediately redirects to a page that's a 404.

Yup - hence "suboptimal".

But I do think that:

But, the idea of a redirect (from Router) to a serverless function can set an actual 404 status

If the router, instead of loading the NotFoundPage, redirected to the function, that might be a way to get a proper 404.

There are still many other issues, like making it easy to add a custom NotFoundPage with custom markup.

dthyresson • Jan 22 '22 02:01

@thedavidprice linked a page about 404s earlier in this thread. That page links to another page where we can read this:

What Are the Most Common Improper Setups For 404 Pages? What Are The Most Common 404 Setup Mistakes/Errors?

  • Redirecting to a 404 page. This hurts everyone. Users are lost, search engines think everything is honky-dory when it's not, and since you don't know when it's happening - you can't fix it.
  • Automatically redirecting to the page you assume search engines and users want. A risky solution, that can easily go wrong (typically because they are sent to irrelevant content - like your homepage.) It's best not to assume. Find the issue and fix it.
  • Serving a 404 message on the page visually, but not delivering the corresponding 404 http response code. This hurts everyone (for the same reason as the "redirect to a 404 page" item above.)

We're currently doing number 3 on that list. Redirecting to a function that returns the 404 would be the item at the top of their "don't do this" list.

https://thegray.company/blog/how-to-make-a-404-page-for-seo-usability#what-are-common-404-setup-mistakes

AFAICT the only proper way to handle this is to mirror the router config on the server side and use that to determine if the page exists or not.
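A sketch of what mirroring the router config could look like: the web server gets the list of route paths from Routes.js and decides the status before serving index.html. This assumes Redwood-style path params (`{id}` or `{id:Int}`); `toRegExp`, `statusFor`, and the route list are hypothetical and cover only a small subset of what Redwood's router supports.

```typescript
// Hypothetical route paths copied from Routes.js at build time.
const routePaths = ['/', '/about', '/users/{id:Int}']

// Turn a route path into an anchored regular expression. Only untyped params
// and :Int params are understood here; that's a simplifying assumption.
function toRegExp(routePath: string): RegExp {
  const pattern = routePath
    .split('/')
    .map((segment) => {
      const param = segment.match(/^\{(\w+)(?::(\w+))?\}$/)
      if (!param) {
        // Literal segment: escape any regex metacharacters.
        return segment.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
      }
      return param[2] === 'Int' ? '\\d+' : '[^/]+'
    })
    .join('/')
  return new RegExp(`^${pattern}$`)
}

const matchers = routePaths.map(toRegExp)

// Status the web server should send for a request path, decided before
// index.html goes out.
function statusFor(requestPath: string): 200 | 404 {
  return matchers.some((re) => re.test(requestPath)) ? 200 : 404
}

console.log(statusFor('/users/3')) // 200
console.log(statusFor('/users/bob')) // 404
console.log(statusFor('/missing')) // 404
```

Note that `/users/3` still gets a 200: a route match says nothing about whether user 3 exists, so the data-level case from earlier in the thread still needs something that can query the data.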

One interesting way to do that is to remove this config (shown for Netlify; other providers have something similar):

```toml
[[redirects]]
from = "/*"
status = 200
to = "/index.html"
```

And then, when building the RW app, you look at the router and generate a bunch of index.html files in directories matching the routes and put that on the web server.

One downside with that approach is that you wouldn't get a nice customized 404 page.
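Here is a minimal sketch of that build step, assuming the web build output lives in one directory with index.html at its root, and that the host serves `<dir>/index.html` for a request to `/<dir>`. `emitRouteFiles` is a hypothetical helper; routes with params are skipped because they can't be expanded into directories without knowing every valid value.

```typescript
import { copyFileSync, existsSync, mkdirSync, mkdtempSync, writeFileSync } from 'node:fs'
import { tmpdir } from 'node:os'
import { join } from 'node:path'

// Copy the root index.html into a directory for every static route, so a
// host without a catch-all rewrite can find a real file for each valid URL.
function emitRouteFiles(distDir: string, routePaths: string[]): string[] {
  const source = join(distDir, 'index.html')
  const written: string[] = []
  for (const routePath of routePaths) {
    // The root already has index.html; param routes can't be enumerated here.
    if (routePath === '/' || routePath.includes('{')) continue
    const dir = join(distDir, ...routePath.split('/').filter(Boolean))
    mkdirSync(dir, { recursive: true })
    const target = join(dir, 'index.html')
    copyFileSync(source, target)
    written.push(target)
  }
  return written
}

// Demo against a throwaway directory standing in for the web build output.
const demoDist = mkdtempSync(join(tmpdir(), 'rw-dist-'))
writeFileSync(join(demoDist, 'index.html'), '<html></html>')
const written = emitRouteFiles(demoDist, ['/', '/about', '/blog/archive', '/users/{id:Int}'])
console.log(written.length) // 2
console.log(written.every((file) => existsSync(file))) // true
```

With the catch-all rewrite removed, any path that doesn't correspond to one of these files falls through to the host's own 404 handling, which is also why param routes like `/users/{id:Int}` are a problem for this approach.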

Tobbe • Jan 22 '22 16:01

Per Core Team Discussion:

  • bump this as a priority to post-v1; will be incorporated with SSR solution

thedavidprice • Feb 01 '22 19:02

@cannikin

But if it redirects to a page that exists (the NotFoundPage) then by definition that's a 200!

I can't quote the correct RFC off the top of my head, but I'm sure the definition is different 🧐.
For example, this is what github does:

[Screenshot: GitHub serving its 404 page for a URL that doesn't exist]

…and I would want any app I build to behave in the same RESTful manner.

Philzen • Jul 10 '22 19:07

Right, and that’s a web app acting in the correct manner: that URL doesn’t exist and so the web server returns 404 with a friendly UI. Our issue arises because of React, where ALL URLs are assumed to exist as far as the web server is concerned. It’s only React that knows, via the router, if that URL is correct. But the 200 has already come down from the server. It’s too late to return a 404.

So we were talking about having React redirect you to a URL that React doesn’t own, and that can always return a 404, but I don’t think that’s “correct” from an HTTP standpoint. But we haven’t had any updates in a while, so I’m not sure where this stands…

cannikin • Jul 10 '22 21:07

So we were talking about having React redirect you to a URL that React doesn’t own, and that can always return a 404

I don't like that solution. It's a documented bad way of doing it. See below.

What Are The Most Common 404 Mistakes? Mistakes and improper setups with 404 pages are not uncommon. Here are some of the most prevalent mishaps involving 404 setups.

  • Redirecting to a 404 page. This hurts everyone. Users are lost, search engines think everything is honky-dory when it's not, and since you don't know when it's happening - you can't fix it.

- https://thegray.company/blog/how-to-make-a-404-page-for-seo-usability#what-are-common-404-setup-mistakes

I'm hoping we can come up with a better solution when we move to Vite+SSR. @dac09 What do you think?

Tobbe • Aug 08 '22 06:08