
Cache keeps growing indefinitely

Open 01walid opened this issue 2 years ago • 10 comments

What version of Turborepo are you using?

1.1.6

What package manager are you using / does the bug impact?

pnpm

What operating system are you using?

Linux

Describe the Bug

We use Turborepo in a relatively large and active monorepo. Saving and restoring Turborepo's cache in CI is quickly becoming time-consuming: the cache grows to several hundred MB in a matter of days or weeks.

Expected Behavior

Turborepo should clean up old cache objects, either automatically or via a flag (e.g. --refresh-cache), so the cache doesn't grow without bound.

To Reproduce

Keep using Turborepo, saving and restoring the same cache over a few days in an active repo, and watch the cache size grow.

01walid avatar Mar 11 '22 03:03 01walid

We've started noticing this too, restoring and saving an updated cache of around 1.1 GB easily takes over a minute on GitHub Actions. Alternatively, a CLI command to evict items older than x would be desirable, to keep cache sizes manageable.

attila avatar May 12 '22 17:05 attila

Any news here? We are unfortunately experiencing the same issue, and we can't use Vercel's caching options due to internal guidelines...

florianmatz avatar Dec 10 '22 12:12 florianmatz

If you are using GitHub Actions, I’m building this action to solve this problem.

https://github.com/dtinth/setup-github-actions-caching-for-turbo

Instead of reading/writing cache from the filesystem and using a separate step (e.g. actions/cache) to save/restore this filesystem state, this action configures Turborepo to read from and write to the GitHub Actions Cache Service API. This allows for fine-grained caching and avoids the problem where the cache grows indefinitely.

If you are not using GitHub Actions, you can deploy an open-source solution to your own infrastructure:

  • https://github.com/ducktors/turborepo-remote-cache (uses local filesystem / S3 / GCS / Azure Blob)
  • https://github.com/cometkim/turbocache (serverless solution that uses Cloudflare Workers and KV)
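Either self-hosted option above acts as a drop-in remote cache: Turborepo can be pointed at a custom cache server through its --api, --token, and --team flags or their environment-variable equivalents. A minimal sketch, with a hypothetical server URL, token, and team slug:

```shell
# Hypothetical values: replace with your own server URL, token, and team slug.
export TURBO_API="https://turbo-cache.example.com"
export TURBO_TOKEN="my-secret-token"
export TURBO_TEAM="team_example"

# Subsequent turbo invocations would then read from and write to the
# remote cache instead of accumulating artifacts only on local disk:
# npx turbo run build
```

Eviction then becomes the cache server's responsibility (e.g. an S3 lifecycle rule) instead of an ever-growing directory that CI saves and restores wholesale.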

dtinth avatar Feb 05 '23 17:02 dtinth

What about local development? We have people here with 300 GB in their local Turborepo cache.

crubier avatar May 27 '23 15:05 crubier

My current local folder (I checked out the code fresh only two weeks ago): this is madness!

15G	./node_modules/.cache/turbo

davecarlson avatar Aug 12 '23 07:08 davecarlson

We have the same problem on CI: saving and restoring the cache kept taking longer as it grew, and CI runs got noticeably slower. As a temporary fix, I run a script that removes files older than one week:

find ./node_modules/.cache/turbo -mtime +7 -exec rm {} +
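That cleanup can be rehearsed on a throwaway directory before pointing it at the real cache. A sketch (hypothetical file names; GNU coreutils assumed for touch -d), with a dry run first:

```shell
# Build a fake cache directory with one fresh and one stale entry.
cache_dir=$(mktemp -d)
touch "$cache_dir/fresh.tar.zst"
touch -d '10 days ago' "$cache_dir/stale.tar.zst"

# Dry run: list entries older than 7 days without deleting anything.
find "$cache_dir" -type f -mtime +7 -print

# Same cleanup as above, scoped to regular files.
find "$cache_dir" -type f -mtime +7 -exec rm {} +
```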

Kanary159357 avatar Sep 11 '23 07:09 Kanary159357

I've also had this concern while using Turbo's caching. My temporary solution is a JavaScript script, adapted from a Stack Overflow answer, that deletes files based on creation time.

Command

node delete-old.mjs node_modules/.cache/turbo 604800000

604800000 = 7 days in ms

Script delete-old.mjs

// Modified from https://stackoverflow.com/a/23022459
import fs from 'fs';
import { fileURLToPath } from 'url';
import path from 'path';
import { rimraf } from 'rimraf';

const __filename = fileURLToPath(import.meta.url); // get the resolved path to the file
const __dirname = path.dirname(__filename); // get the name of the directory

// e.g. node delete-old.mjs directory
const directory = path.join(__dirname, '..', process.argv[2]);

// e.g. node delete-old.mjs directory 604800000
/**
 * Expiry time in milliseconds.
 */
const expiryTime = Number(process.argv[3]) || 604800000;

const dateFormatOptions = {
  weekday: 'long',
  year: 'numeric',
  month: 'long',
  day: 'numeric',
  hour: 'numeric',
  minute: 'numeric',
};

fs.readdir(directory, (err, files) => {
  console.log(`Checking for files older than ${getColouredText(expiryTime + ' ms', 31)}...\n`);

  if (err) {
    return console.error(err);
  }

  if (!files.length) {
    console.log(getColouredText(`Directory empty.`));
  }

  files.forEach(file => {
    fs.stat(path.join(directory, file), (err, stat) => {
      if (err) {
        return console.error(err);
      }

      const now = Date.now();
      const endTime = stat.ctime.getTime() + expiryTime;

      if (now > endTime) {
        return rimraf(path.join(directory, file))
          .then(() => {
            console.log(`Successfully deleted expired file ${getColouredText(file, 32)}`);
            console.log(
              `Created Date: ${getColouredText(stat.ctime.toLocaleString('en-CA', dateFormatOptions), 33)}\n`
            );
          })
          .catch(err => {
            console.error(err);
          });
      }
    });
  });
});

/**
 * Available colours: https://en.wikipedia.org/wiki/ANSI_escape_code#Colors
 *
 * @param {*} text
 * @param {*} colourCode
 * @returns
 */
function getColouredText(text, colourCode = 33) {
  return `\x1b[${colourCode}m${text}\x1b[0m`;
}

RazeiXello avatar Feb 12 '24 17:02 RazeiXello

If you want to keep an exact number of caches, you can use my example:

export CACHES_TO_KEEP=2
ls -At -1 -d "$PWD/node_modules/.cache/turbo/"* | tail -n "+$(($CACHES_TO_KEEP*2+1))" | xargs -r rm

Explanation:

  • CACHES_TO_KEEP is the number of cache entries that should remain
  • ls -At -d "$PWD/node_modules/.cache/turbo/"* lists all cache files (absolute paths) sorted by modification time, newest first
  • tail -n "+$(($CACHES_TO_KEEP*2+1))" skips the lines belonging to the most recent caches and prints the rest
    • *2: each cache entry consists of 2 files (*.tar.zst and *-meta.json)
    • +1: tail -n +N starts output at line N, so an offset of 1 is needed
  • xargs -r rm removes the remaining files (-r skips rm when the list is empty)
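The one-liner can be tried out safely against a throwaway directory populated with fake cache pairs (hypothetical hash names; GNU touch -d assumed):

```shell
# Build four fake cache entries, each a *.tar.zst / *-meta.json pair,
# with hash1 the newest and hash4 the oldest.
fake_cache=$(mktemp -d)
for i in 1 2 3 4; do
  touch -d "$i minutes ago" "$fake_cache/hash$i.tar.zst" "$fake_cache/hash$i-meta.json"
done

# Keep the 2 most recent entries (4 files), remove the rest.
CACHES_TO_KEEP=2
ls -At -1 -d "$fake_cache/"* | tail -n "+$(($CACHES_TO_KEEP*2+1))" | xargs -r rm

ls "$fake_cache"  # the hash1 and hash2 pairs remain
```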

I hope that this will be implemented soon, to avoid these workarounds.

GitHub action
# .github/workflows/actions/remove-outdated-turbo-cache/action.yml

name: Remove outdated turbo cache
description: "Removes outdated caches. This workaround can be removed, when this issue is resolved: https://github.com/vercel/turbo/issues/863"

inputs:
  caches-to-keep:
    description: "Keeps the defined number of the most recent caches (default: 10). All other caches will be removed"
    default: "10"


runs:
  using: "composite"
  steps:
    - name: Remove old turbo cache
      shell: bash
      run: |
        outdated_files=$(ls -At -1 -d "$PWD/node_modules/.cache/turbo/"* | tail -n "+$((${{ inputs.caches-to-keep }}*2+1))")
        if [ -n "$outdated_files" ]; then
          outdated_files_amount=$(echo "$outdated_files" | wc -l | awk '{$1=$1};1')
          outdated_files_size=$(du -ch $outdated_files | tail -1 | cut -f 1)
          echo "Removing $outdated_files_amount outdated cache files ($outdated_files_size):"
          echo "$outdated_files"
          echo "$outdated_files" | xargs -r rm
        fi
# Usage as a workflow step:
# ...
      - name: Remove outdated turbo cache
        uses: ./.github/workflows/actions/remove-outdated-turbo-cache
        with:
          caches-to-keep: 25
# ...

mstuercke avatar May 31 '24 09:05 mstuercke