
Add Sanity content migration script

devinhalladay opened this issue

As part of the effort to move our blog to Sanity, we need to migrate a large amount of old content. We could do it manually by copying and pasting text into Sanity and uploading images by hand, but automating the migration will save time and money.

This PR adds a migration script as a new Node/TypeScript app in migration/. The script converts each Markdown post to HTML, deserializes that HTML into the Portable Text structure Sanity requires, and writes the result as newline-delimited JSON (.ndjson) for import. Each line of the resulting file is one document for Sanity to import.
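The last step of that pipeline, writing the .ndjson file, can be sketched as follows. This is a minimal illustration, not the script's actual code: the post document type and its title and body fields are assumptions, while the block and span shapes follow the standard Portable Text layout (_type, _key, style, markDefs, children, marks). The key property is that JSON.stringify without indentation never emits a raw newline, so each document lands on exactly one line.

```typescript
// Hypothetical document shape: the PR doesn't specify the schema, so the
// "post" type and its title/body fields are assumptions for illustration.
interface PortableTextSpan {
  _type: "span";
  _key: string;
  text: string;
  marks: string[];
}

interface PortableTextBlock {
  _type: "block";
  _key: string;
  style: string;
  markDefs: unknown[];
  children: PortableTextSpan[];
}

interface PostDocument {
  _id: string;
  _type: "post";
  title: string;
  body: PortableTextBlock[];
}

// Serialize documents as newline-delimited JSON. JSON.stringify (no indent)
// escapes newlines inside strings as \n, so each document is one line.
function toNdjson(docs: PostDocument[]): string {
  return docs.map((doc) => JSON.stringify(doc) + "\n").join("");
}

const post: PostDocument = {
  _id: "post-hello-world",
  _type: "post",
  title: "Hello, world",
  body: [
    {
      _type: "block",
      _key: "b0",
      style: "normal",
      markDefs: [],
      children: [{ _type: "span", _key: "s0", text: "First line\nsecond line", marks: [] }],
    },
  ],
};

const ndjson = toNdjson([post, { ...post, _id: "post-second", title: "Second" }]);
console.log(ndjson.trimEnd().split("\n").length); // 2
```

Even though the first post's text contains a newline, the output still has one line per document, which is what the importer relies on.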

The script supports assets (images and videos) and rich content blocks (embeds and code blocks). It does this by hooking into the HTML deserialization step to generate the appropriate Sanity block for a given HTML tag. Images and videos are uploaded by the Sanity CLI during import, with two caveats:
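A tag-based hook of this kind can be sketched as a list of rules, each of which either claims an element and returns a custom block or returns undefined to fall through to default handling. The PR doesn't name its HTML-to-Portable-Text library; if it is @sanity/block-tools, its htmlToBlocks function accepts rules with a deserialize(el, next, block) hook in this spirit. To keep the sketch runnable without jsdom, it uses a hypothetical minimal El type instead of real DOM nodes, and the code and embed block shapes are assumptions.

```typescript
// Hypothetical stand-in for a DOM element so the sketch runs without jsdom.
interface El {
  tagName: string;
  attrs: Record<string, string>;
  textContent: string;
  children: El[];
}

// Assumed custom block shapes; the real schema may name fields differently.
type CustomBlock =
  | { _type: "code"; language: string; code: string }
  | { _type: "embed"; url: string };

// A rule returns a block if it handles the element, or undefined otherwise.
type Rule = (el: El) => CustomBlock | undefined;

// <pre><code class="language-ts">...</code></pre> -> code block
const codeRule: Rule = (el) => {
  if (el.tagName.toLowerCase() !== "pre") return undefined;
  const code = el.children.find((c) => c.tagName.toLowerCase() === "code");
  const match = /language-([\w-]+)/.exec(code?.attrs["class"] ?? "");
  return { _type: "code", language: match?.[1] ?? "text", code: (code ?? el).textContent };
};

// <iframe src="..."> -> embed block
const embedRule: Rule = (el) => {
  const src = el.attrs["src"];
  if (el.tagName.toLowerCase() !== "iframe" || !src) return undefined;
  return { _type: "embed", url: src };
};

// The first rule that returns a block wins; undefined means default handling.
function deserialize(el: El, rules: Rule[]): CustomBlock | undefined {
  for (const rule of rules) {
    const block = rule(el);
    if (block) return block;
  }
  return undefined;
}

const pre: El = {
  tagName: "PRE",
  attrs: {},
  textContent: "const x = 1;",
  children: [{ tagName: "CODE", attrs: { class: "language-ts" }, textContent: "const x = 1;", children: [] }],
};

console.log(deserialize(pre, [codeRule, embedRule]));
```

Ordering the rules matters only when two rules could match the same tag; here each rule checks a distinct tag, so order is irrelevant.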

  1. Broken asset URLs fail to import. If a post already contains bad asset links, its import may fail, and the post will need to be fixed manually in Sanity.
  2. Asset URLs in the .ndjson file must be prefixed so Sanity knows how to upload them: images get an image@<url> prefix, while videos and all other file types get a file@<url> prefix.
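The prefix rule in caveat 2 can be sketched as a small helper. Sanity's import reads asset references from a _sanityAsset field on an image or file object. The extension list and the helper names here are assumptions; the PR only says that images get image@ and everything else gets file@. Query strings and fragments are stripped before the extension is checked, so CDN URLs with resize parameters still classify correctly.

```typescript
// Extensions treated as images; anything else (videos included) is a file.
// This list is an assumption, not taken from the PR.
const IMAGE_EXTENSIONS = new Set(["jpg", "jpeg", "png", "gif", "webp", "svg"]);

// Lowercased extension of the URL path, ignoring ?query and #fragment.
function assetExtension(url: string): string {
  const path = url.split(/[?#]/)[0] ?? "";
  const match = /\.([a-z0-9]+)$/i.exec(path);
  return match?.[1]?.toLowerCase() ?? "";
}

// Build an asset object for the .ndjson file with the image@/file@ prefix
// on _sanityAsset, which tells the importer how to upload the URL.
function toAssetRef(url: string): { _type: "image" | "file"; _sanityAsset: string } {
  const kind = IMAGE_EXTENSIONS.has(assetExtension(url)) ? "image" : "file";
  return { _type: kind, _sanityAsset: `${kind}@${url}` };
}

console.log(toAssetRef("https://cdn.example.com/hero.PNG?w=800"));
console.log(toAssetRef("https://cdn.example.com/demo.mp4"));
```

Note that this only decides the prefix; it cannot detect the broken URLs from caveat 1, which only surface when the CLI actually tries to fetch them.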

The process for running the migration is simple:

  1. Install the Sanity CLI and run sanity login. You need to be a Sanity admin to authenticate.
  2. Copy every post you want to migrate into migration/posts.
  3. Run npm run migrate, which parses all the Markdown files, writes out HTML for each, and then generates the final .ndjson import file.
  4. Run npm run import, which triggers Sanity to import the posts and upload all linked assets. Depending on how many assets need to be uploaded, this step can take a while.
  5. If no blocking error causes the script to fail, you may still see a list of assets that failed to upload. Take note of them, as these posts will need to be fixed manually in the Sanity CMS.
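For step 5, it helps to know which post references each failed asset. The PR doesn't describe the failure output, so this is a hypothetical helper (indexAssets and collectAssets are illustrative names): it scans the generated .ndjson, finds every _sanityAsset value at any depth, strips the image@/file@ prefix, and maps each URL to the _id of every document that uses it.

```typescript
// Walk any JSON value and collect every _sanityAsset string it contains.
function collectAssets(value: unknown, out: string[] = []): string[] {
  if (Array.isArray(value)) {
    for (const item of value) collectAssets(item, out);
  } else if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      if (key === "_sanityAsset" && typeof child === "string") out.push(child);
      else collectAssets(child, out);
    }
  }
  return out;
}

// Map each asset URL (prefix stripped) to the _ids of the documents using it,
// so a failed upload can be traced back to the post that needs fixing.
function indexAssets(ndjson: string): Map<string, string[]> {
  const index = new Map<string, string[]>();
  for (const line of ndjson.split("\n")) {
    if (line.trim() === "") continue; // tolerate a trailing newline
    const doc = JSON.parse(line) as { _id?: string };
    const id = doc._id ?? "(no _id)";
    for (const ref of collectAssets(doc)) {
      const url = ref.replace(/^(image|file)@/, "");
      const ids = index.get(url) ?? [];
      if (!ids.includes(id)) ids.push(id);
      index.set(url, ids);
    }
  }
  return index;
}

const sample = [
  JSON.stringify({ _id: "post-a", _type: "post", cover: { _type: "image", _sanityAsset: "image@https://cdn.example.com/a.png" } }),
  JSON.stringify({
    _id: "post-b",
    _type: "post",
    body: [
      { _type: "file", _sanityAsset: "file@https://cdn.example.com/clip.mp4" },
      { _type: "image", _sanityAsset: "image@https://cdn.example.com/a.png" },
    ],
  }),
].join("\n");

console.log(indexAssets(sample).get("https://cdn.example.com/a.png")); // [ 'post-a', 'post-b' ]
```

Looking up each failed URL in this map gives the list of posts to open in the Sanity CMS.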

Currently, all migrated posts go into a migration dataset in Sanity's Content Lake. When we're ready, we can move this content to the prod dataset with the Sanity CLI.

This Node app also includes a convenience script, npm run down, which deletes the migration dataset in case we need to restart the migration from scratch.
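The PR doesn't show what the npm scripts run, so the following is only a sketch assuming they shell out to the Sanity CLI's dataset subcommands (import, export, delete), all of which take the positional arguments shown. For promoting content into a dataset that already exists, one option is exporting the migration dataset to an archive and importing that archive into the target. The file names and the production dataset name are placeholders, and sanityCommands is a hypothetical helper.

```typescript
// Hypothetical task descriptions for the dataset lifecycle in this PR.
type DatasetTask =
  | { kind: "import"; file: string; dataset: string }
  | { kind: "promote"; from: string; to: string; archive: string }
  | { kind: "down"; dataset: string };

// Argument lists for the sanity binary; a task may need several commands.
function sanityCommands(task: DatasetTask): string[][] {
  switch (task.kind) {
    case "import":
      // sanity dataset import <file> <targetDataset>
      return [["dataset", "import", task.file, task.dataset]];
    case "promote":
      // Export the source dataset to an archive, then import it into the target.
      return [
        ["dataset", "export", task.from, task.archive],
        ["dataset", "import", task.archive, task.to],
      ];
    case "down":
      // sanity dataset delete <datasetName>
      return [["dataset", "delete", task.dataset]];
  }
}

const promote = sanityCommands({ kind: "promote", from: "migration", to: "production", archive: "migration.tar.gz" });
for (const argv of promote) console.log(["sanity", ...argv].join(" "));
```

Keeping the commands as argument arrays rather than strings means they can be passed straight to a process-spawning API without shell quoting issues.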

devinhalladay, Oct 03 '23 20:10