Redirects: simplify redirect logic / stop using `resolve_path`

Open stsewd opened this issue 1 year ago • 2 comments

Currently, we try to match all redirects in all requests, but this is not always necessary, since some redirects are valid only in some contexts.

We also use the resolver to build the final redirect, but this is not always necessary: in most cases we can just add or replace components of the original URL. Calling the resolver makes several queries to the DB, and requires having each component of the URL parsed.

Prefix redirects

Used when migrating from another site to RTD: we redirect all the URLs under a given prefix to the default version of the project.

This redirect is valid only when we fail to find a version, since if the user is already in /en/latest/ it doesn't make sense to redirect.

To generate the final URL, we need to use the resolver, using the default version/language.

We currently use this redirect even if we find a version. For example, a /foo/ prefix redirect will redirect:

  • /foo/index.html -> /en/latest/index.html
  • /en/latest/foo/index.html -> /en/latest/index.html (this one is wrong!)
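A minimal Python sketch of the rule described in this section: the prefix redirect is considered only when no version was matched. The function name, its parameters, and the hard-coded `/en/latest/` root are assumptions for illustration; in the real code the target would come from the resolver, using the project's default version and language.

```python
from typing import Optional


def apply_prefix_redirect(
    path: str,
    prefix: str,
    version_found: bool,
    default_root: str = "/en/latest/",  # assumption: stands in for the resolver's output
) -> Optional[str]:
    """Return the redirect target, or None if the prefix redirect does not apply."""
    # If the request already resolved to a version, a prefix redirect makes no sense.
    if version_found:
        return None
    if not path.startswith(prefix):
        return None
    # Keep whatever followed the prefix and attach it to the default version root.
    rest = path[len(prefix):]
    return default_root + rest.lstrip("/")
```

With this gate in place, `/foo/index.html` (no version found) would map to `/en/latest/index.html`, while `/en/latest/foo/index.html` (version found) would not be redirected.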

Page redirects

Used when a page is moved or deleted.

This redirect is valid only when we are able to find a version, since if we don't find a version we will redirect to another 404.

To generate the final URL, we don't need to use the resolver; we can just replace the path of the original URL.

We currently use this redirect even if we fail to find a version. For example, a /foo.html -> /bar.html page redirect will redirect:

  • /en/latest/foo.html -> /en/latest/bar.html
  • /en/not-found/foo.html -> /en/not-found/bar.html (this one will 404!)

That second case may look like expected behavior, but it isn't (or at least isn't useful), since the final URL will 404. If a user deleted a whole version, they should use an exact redirect instead, for example:

  • /foo.html -> /bar.html (page redirect)
  • /en/not-found/$rest -> /en/latest/ (exact redirect)

This way, /en/not-found/foo.html will redirect to /en/latest/foo.html, and that will redirect to /en/latest/bar.html.
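The page redirect rule could be sketched roughly like this in Python, without touching the resolver. The function and parameter names are hypothetical, and `version_root` (for example `/en/latest`) is assumed to be `None` when the request did not match any version.

```python
from typing import Optional


def apply_page_redirect(
    path: str,
    version_root: Optional[str],  # hypothetical: e.g. "/en/latest", None if no version matched
    from_page: str,
    to_page: str,
) -> Optional[str]:
    """Return the redirect target, or None if the page redirect does not apply."""
    # Without a version, the target would just be another 404, so skip the redirect.
    if version_root is None:
        return None
    if not path.startswith(version_root):
        return None
    # The page is the part of the path after the version root, e.g. "/foo.html".
    page = path[len(version_root):]
    if page != from_page:
        return None
    # Replace only the page part of the original URL.
    return version_root + to_page
```

Under these assumptions, `/en/latest/foo.html` maps to `/en/latest/bar.html`, and a request under an unknown version yields no redirect.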

Exact redirects

Used to redirect a whole path to another path. This redirect is valid in all cases.

To generate the final URL, we don't need to use the resolver; we just use the "to" URL and replace the $rest part if it exists.
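As a rough illustration, an exact redirect can be computed with plain string operations. This is a sketch under assumptions, not the actual implementation: the function name is made up, and appending the remainder when the "to" URL has no `$rest` placeholder is an assumption chosen to match the `/en/not-found/$rest -> /en/latest/` example above.

```python
from typing import Optional

REST = "$rest"


def apply_exact_redirect(path: str, from_url: str, to_url: str) -> Optional[str]:
    """Return the redirect target, or None if the exact redirect does not match."""
    if not from_url.endswith(REST):
        # No wildcard: the whole path must match exactly.
        return to_url if path == from_url else None

    prefix = from_url[: -len(REST)]
    if not path.startswith(prefix):
        return None
    rest = path[len(prefix):]
    if REST in to_url:
        # Substitute the placeholder with the remainder of the original path.
        return to_url.replace(REST, rest)
    # Assumption: with no placeholder in the target, the remainder is appended.
    return to_url + rest
```

For example, `apply_exact_redirect("/en/not-found/foo.html", "/en/not-found/$rest", "/en/latest/")` gives `/en/latest/foo.html`, which a page redirect could then send on to `/en/latest/bar.html`.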

HTML and HTML Dir redirects

Used when a project has changed from using .html URLs to dir (/) URLs or vice versa.

We could restrict this redirect to only apply when we are able to find a version, since if we don't find a version we will redirect to another 404, but it shouldn't be a problem to apply it in all cases.

To generate the final URL, we don't need to use the resolver; we just replace the extension of the original URL, that is, foo.html to foo/, foo/ to foo.html, and foo/index.html to foo.html.

Note: these redirects are named "Sphinx redirects", but they apply to all tools, not just Sphinx.
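The three rewrites listed above (foo.html to foo/, foo/ to foo.html, foo/index.html to foo.html) need only string manipulation. A minimal Python sketch follows; the function names are hypothetical, and skipping `index.html` and the bare root `/` is an assumption about edge cases the issue does not discuss.

```python
from typing import Optional


def html_to_dir(path: str) -> Optional[str]:
    """foo.html -> foo/ (for projects that moved to directory-style URLs)."""
    # Assumption: index.html is left alone, since it already maps to its directory.
    if path.endswith(".html") and not path.endswith("/index.html"):
        return path[: -len(".html")] + "/"
    return None


def dir_to_html(path: str) -> Optional[str]:
    """foo/ -> foo.html and foo/index.html -> foo.html (for projects that moved to .html URLs)."""
    if path.endswith("/index.html"):
        # Reduce foo/index.html to foo/ so both cases share one code path.
        path = path[: -len("index.html")]
    # Assumption: the bare root "/" has no page name to convert.
    if path.endswith("/") and path != "/":
        return path.rstrip("/") + ".html"
    return None
```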

Forced redirects

All redirects need to be checked when using forced redirects. An exception can be made for prefix redirects, since they will only make sense for single version projects (versioned projects will 404 if an unknown path is given).

Changes

The easiest change will be to stop using the resolver for page and html/html dir redirects. The other change requires changing the queryset to filter by the redirects that are valid for the given request. This may reduce the complexity of the final query, resulting in a faster query (I haven't tested this, so it is just an assumption). If a project has lots of redirects, we may see a small improvement.
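To make the second change more concrete, here is a small Python sketch of how the set of redirect types worth querying could be derived from the request context, following the rules in the sections above. The function name and the type strings are illustrative assumptions; the resulting set could then feed a filter on the redirects queryset (for instance an `__in` lookup on an assumed type field).

```python
# Illustrative type names; assumed, not taken from the codebase.
ALL_TYPES = {"prefix", "page", "exact", "sphinx_html", "sphinx_htmldir"}


def valid_redirect_types(version_found: bool, forced: bool = False) -> set:
    """Return the redirect types that make sense for the current request."""
    if forced:
        # Forced redirects: all types need to be checked.
        # (The issue notes prefix redirects could possibly be excluded here.)
        return set(ALL_TYPES)

    # Exact redirects are valid in all cases; html/html dir are harmless in all cases.
    types = {"exact", "sphinx_html", "sphinx_htmldir"}
    if version_found:
        # Page redirects only help when the version exists.
        types.add("page")
    else:
        # Prefix redirects only make sense when no version was matched.
        types.add("prefix")
    return types
```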

stsewd avatar Mar 06 '23 19:03 stsewd

@stsewd was this achieved in the latest redirects refactor?

humitos avatar Jan 16 '24 11:01 humitos

Nope

stsewd avatar Jan 16 '24 17:01 stsewd