
[RFC] Add caching to @astrojs/image

Open altano opened this issue 3 years ago • 1 comments

THIS IS ONLY AN RFC meant to spark discussion. It is a work in progress and is not ready to be merged (ergo "draft" status of the PR). The accompanying RFC discussion can be found here.

Problem being solved

My personal site has a bunch of animated gifs that sharp chokes on. As a result, the @astrojs/image phase of the build takes ~40 seconds. It takes that long on EVERY build and loading the page in dev/ssr takes extremely long as well. If sharp (and other transforms) can be slow, we can benefit greatly from caching.

Changes

  1. This introduces a new package, @astrojs/fs, that will hold file system abstractions.
  2. The first abstraction added to @astrojs/fs is transformBuffer which grants free caching to any code that uses it.
  3. @astrojs/image has been modified to use transformBuffer.

More details on transformBuffer

API

The API is pretty simple:

const { output, metadata } = await transformBuffer<TCachedImageMetadata>({
	input: myBuffer,
	transformMetadata: {...},
	transformFn: async () => {
		return mySuperExpensiveTransform(myBuffer, ...);
	},
	enableCache: true,
});

Implementation

If the cache is disabled, this is currently mostly a pass-through: we call transformFn and return the result. If the cache is enabled, things get more interesting. We first generate a unique cacheKey and then check whether the file system cache already holds an entry for it. The file system cache uses the cacache package, the same caching store that npm itself uses, so we get a lot for free. If the transformed version of the buffer is cached, we skip calling transformFn entirely and just return the cached transformed buffer.
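To make the hit/miss flow above concrete, here is a minimal sketch. It is not the code in this PR: an in-memory Map stands in for the cacache store, SHA-256 from node:crypto stands in for xxhash-wasm, and the names transformBufferSketch and makeCacheKey, as well as the type shapes, are assumptions made for illustration.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes; the RFC does not spell these types out.
interface TransformResult<TMeta> {
	output: Buffer;
	metadata: TMeta;
}

interface TransformOptions<TMeta> {
	input: Buffer;
	transformMetadata: Record<string, unknown>;
	transformFn: () => Promise<TransformResult<TMeta>>;
	enableCache: boolean;
}

// In-memory stand-in for the on-disk cacache store (assumption).
const store = new Map<string, TransformResult<unknown>>();

// The key covers both the input bytes and the transform parameters.
// SHA-256 is a stand-in for xxhash-wasm. JSON.stringify is sensitive to
// property order, so a real implementation would want a stable serializer.
function makeCacheKey(input: Buffer, transformMetadata: Record<string, unknown>): string {
	const inputHash = createHash("sha256").update(input).digest("hex");
	const paramsHash = createHash("sha256").update(JSON.stringify(transformMetadata)).digest("hex");
	return `${inputHash}:${paramsHash}`;
}

async function transformBufferSketch<TMeta>(
	opts: TransformOptions<TMeta>,
): Promise<TransformResult<TMeta>> {
	// Cache disabled: plain pass-through.
	if (!opts.enableCache) {
		return opts.transformFn();
	}

	const key = makeCacheKey(opts.input, opts.transformMetadata);

	// Cache hit: skip the expensive transform entirely.
	const hit = store.get(key) as TransformResult<TMeta> | undefined;
	if (hit !== undefined) {
		return hit;
	}

	// Cache miss: run the transform once and remember the result.
	const result = await opts.transformFn();
	store.set(key, result);
	return result;
}
```

Calling transformBufferSketch twice with the same input and transformMetadata runs transformFn only once; changing either the buffer contents or the metadata produces a different key and triggers a fresh transform.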

More notes

  • transformBuffer can be used for any transformation in any integration: not just image transforms. It's made to be general purpose for any transform operations.
  • The cache key includes a hash of the input buffer itself, so if either the contents of the buffer OR the transform parameters change, we perform the transform again. If both are identical, we use the cached value. Hashing is lightning fast because it uses xxhash-wasm, the fastest non-cryptographic hash I could find.
  • Whether or not the transform result was already cached, we always write the output file to disk by copying it from the cache. The only thing skipped in the cache-hit path is the transform. We could explore hard-linking the cached copy instead in the future to squeeze out even more performance.
  • SSG/Dev/SSR all share the same cache. In fact, all users of @astrojs/fs share the same cache. This is perfectly safe: if two separate integrations are transforming the same input buffer with the exact same transformation, why shouldn't they use the same cache entry? If the transformations are different, they should provide different transformMetadata objects so that they get different cache entries. We can partition the cache by integration if we decide to.
  • The cache currently grows unbounded. It should be limited by total size and be an LRU cache.
  • There are no user-facing APIs for clearing the cache in this WIP. They should be added.
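
One possible shape for the size limit mentioned above is sketched here. This is purely illustrative and not part of the PR: the class SizeBoundedLru and its methods are hypothetical, and it only tracks keys and sizes, leaving the actual removal from the on-disk store to the caller.

```typescript
// Hypothetical size-bounded LRU index; tracks keys and byte sizes only.
class SizeBoundedLru {
	// A Map preserves insertion order, so the first key is the least recently used.
	private entries = new Map<string, number>();
	private totalBytes = 0;
	private maxBytes: number;

	constructor(maxBytes: number) {
		this.maxBytes = maxBytes;
	}

	// Record a cache read: move the key to the most-recently-used position.
	touch(key: string): boolean {
		const size = this.entries.get(key);
		if (size === undefined) return false;
		this.entries.delete(key);
		this.entries.set(key, size);
		return true;
	}

	// Record a cache write and return the keys the caller should evict.
	add(key: string, sizeBytes: number): string[] {
		const previous = this.entries.get(key);
		if (previous !== undefined) {
			this.totalBytes -= previous;
			this.entries.delete(key);
		}
		this.entries.set(key, sizeBytes);
		this.totalBytes += sizeBytes;

		// Evict from the oldest end until we fit, but never evict the entry
		// that was just written (it is always last in insertion order).
		const evicted: string[] = [];
		while (this.totalBytes > this.maxBytes && this.entries.size > 1) {
			const oldest = this.entries.keys().next().value as string;
			this.totalBytes -= this.entries.get(oldest) ?? 0;
			this.entries.delete(oldest);
			evicted.push(oldest);
		}
		return evicted;
	}

	get totalSize(): number {
		return this.totalBytes;
	}
}
```

In this sketch the caller would invoke touch on every cache hit and add on every cache write, then delete whatever keys add returns from the underlying store. A user-facing "clear cache" command could sit on top of the same index.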

Testing

  1. In astro (this repo): pnpm test
  2. In my personal project:
# Grab the updated code
cd ~/src/astro && pnpm build
cd ~/src/astro/packages/integrations/image/
pnpm pack
cd ~/src/astro/packages/fs/
pnpm pack
cd ~/src/<my-personal-project>/
npm i ~/src/astro/packages/fs/astrojs-fs-0.0.1.tgz
npm i ~/src/astro/packages/integrations/image/astrojs-image-0.7.0.tgz
# Test that the cache works
npm run build # observe slow build time (~40 seconds spent in `@astrojs/image`)
npm run build # observe nearly instant build (~50 **ms** spent in `@astrojs/image`)
npm run dev # observe that page loads instantly, images are pulled from cache in ssr/dev
[Screenshots: build timing output for the FIRST build and the SECOND build]

🔥 NOTE: On my personal site, this brings all subsequent builds down from ~40s to <1s. The animated gifs that sharp chokes on get cached after one slow build and, from that point forward, the cached versions are used.

Docs

This would require documentation for integration authors, for sure, but we're not there yet.

altano avatar Sep 15 '22 03:09 altano