owid-grapher icon indicating copy to clipboard operation
owid-grapher copied to clipboard

Salvage good ideas from redirects PR and improve redirects handling in Grapher

Open danyx23 opened this issue 2 years ago • 2 comments

Note

The issue below is from late 2022 - as part of the Wordpress migration of early 2024 we pulled in the most important part of the redirects which makes this issue less urgent. There are still some good ideas in the linked PR but the issue below has to be reworded.

Problem

Redirects are a central piece for making our content graph correct. A lot of our content interlinks not by using internal ids but by referencing via links/slugs - e.g. in posts charts are embedded as iframes with links to the chart at the time of post authoring. If the slug of a grapher chart is later changed and a redirect created then most current regex based lookups designed to show back-references from charts to posts fail to surface posts using the original slug.

For this reason it is important to have a full understanding of all our redirects at the database level.

How our redirects work currently

Redirects for the OWID website are assembled during baking from 5 sources to created a _redirects file that Netlify uses to perform redirects at the edge.

The 5 sources of redirects are:

  • Manual bulk redirects that redirect some legacy links with wildcards to new paths - e.g. if something still points to /wp-content/uploads/* it is rewritten to point to /uploads/*.
  • Country detection redirects. This is used for the Global Entity Control. We set up redirects that make use of a proprietary Netlify IP lookup feature that redirects with a ISO Alpha 2 country code in the query param for the inferred country. These are assembled from a hardcoded list of countries with ISO Alpha 3 and 2 codes.
  • Redirects from wordpress - the Wordpress admin UI allows setting up arbitrary slug redirects. This is a list manually curated in Wordpress of currently 373 redirects. Most of these are used to redirect old names of posts to new names. Some of them are wrong (the target does not resolve to anything anymore)
  • Chart redirects - these already live in the Grapher Database in the chart_slug_redirects table and, uniquely, redirect slugs not to slugs but to chart ids.
  • Explorer redirects - these redirect to explorers and rewrite to target explorers with sometimes certain query string params set

One additional complication is that currently redirects are allowed to form chains and, by extension, cycles.

Proposed new approach

Redirects should be completely stored in the database. The bake function for redirects should then simply take the content from the database and write the netlify file doing at most minor formatting changes.

Chains should be forbidden. The target of a redirect must not be the source of another one. This eliminates also the need to detect cycles - this should be enforced as a database constraint. Chains will need to be resolved but this is simply a matter of pointing every segment in the current chain to the eventual end of the chain:

flowchart TB
    subgraph after
    A2["A"] --> C2["C"]
    B2["B"] --> C2["C"]
    end
    subgraph before
    A1["A"] --> B1["B"]
    B1["B"] --> C1["C"]
    end
    

We should have one table per source (explorers are still a bit unclear - it looks like these sometimes modify grapher charts and maybe they have to stay a function for the query string manipulation - but the main target resolution should be done declaratively in the database)

In some scenarios with our links we care only about the path and not about the query params - but in same cases we do care about the query params. E.g. if you want to enumerate the entry points into a chart it makes a difference if the query string changes the country selection. Therefore target paths should be split always into domain, path and query string. Computed columns could be used as a convenience feature to re-assemble full slugs/urls.

Both in source and target parts of the redirect tables urls should be streamlined to never contain trailing slashes - this is how Netlify treats urls in redirects.

ER Diagram of proposed DB additions

erDiagram
  WordpressRedirects ||--o| CompleteRedirects : "merged into"
  ChartSlugRedirects ||--o| CompleteRedirects : "merged into"
  ManualBulkRedirects ||--o| CompleteRedirects : "merged into"
  ManualBulkRedirects {
        int Id
        string Slug
        string TargetDomain
        string TargetPath
        string TargetQueryString
        int StatusCode
    }
  WordpressRedirects {
        int Id
        int WordpressId
        string Slug
        string TargetDomain
        string TargetPath
        string TargetQueryString
        int StatusCode
    }
  ChartSlugRedirects {
        int Id
        string Slug
        int ChartId
    }
  CompleteRedirects{
        int Id
        enum SourceKind
        string From
        string TargetDomain
        string TargetPath
        string TargetQueryString
        int StatusCode
}

danyx23 avatar Oct 03 '22 19:10 danyx23

This issue has had no activity within 10 months. It is considered stale and will be closed in 7 days unless it is worked on or tagged as pinned.

github-actions[bot] avatar Aug 21 '23 07:08 github-actions[bot]

We'll need to finish the work started in #1718 so that we can resolve all content correctly in our DB

danyx23 avatar Aug 21 '23 11:08 danyx23