
[rush] ci caching

Open ksjogo opened this issue 4 years ago • 15 comments

Is this a feature or a bug?

  • [x] Feature
  • [ ] Bug

Please describe the actual behavior.

Is there a documented way to cache the rush install on CI runners? In my concrete case I am running a Rush repo on GitHub and deploying via GitHub Actions, but I would like to speed up the builds. Dependencies stay stable most of the time and only static content gets updated, so caching all dependencies would be a big win. There is some documentation for Lerna (https://github.com/actions/cache/blob/master/examples.md#node---lerna) which shows that multiple folders can be cached, but which folders would those be for Rush, and would it work with symlinks and pnpm? I will investigate, but maybe someone can already share that knowledge.

What is the expected behavior?

A faster CI build process.

ksjogo avatar Apr 28 '20 00:04 ksjogo

That's a great idea. For the Rush + pnpm combo, the store path defaults to common/temp/pnpm-store. That's the folder to cache, if you want to try it.
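A minimal sketch of what that could look like with actions/cache (the lockfile path and key format here are assumptions for a default Rush + pnpm setup, not a verified recipe):

      - name: Cache pnpm store
        uses: actions/cache@v2
        with:
          path: common/temp/pnpm-store
          key: ${{ runner.os }}-pnpm-${{ hashFiles('common/config/rush/pnpm-lock.yaml') }}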

sachinjoseph avatar Apr 28 '20 05:04 sachinjoseph

So we're actually doing this in our project by caching the contents of common/temp/pnpm-store. It somewhat works, in that pnpm will reuse most of the cached dependencies, but it still somehow downloads 80-some of the roughly 3,000 packages we use every time we install on CI (even when no changes to pnpm-lock.yaml are made). Not sure what that's about.

gregjacobs avatar Sep 17 '20 19:09 gregjacobs

Btw, it might be nice for Rush to have a CLI/API to collect, and later restore, the cached dependencies (which should work regardless of package manager).

And as a second point of interest, I implemented something for our Rush repo to collect all of the build artifacts that result from rush build so that I can restore them on the next CI execution, effectively implementing a "build cache" of sorts. Subsequent CI executions of rush build will then only rebuild packages that have changed since the last run and skip the ones that haven't. I would like to see this as a first-class citizen in Rush as well. (It currently collects build artifacts by assuming that anything that is .gitignore'd in a package is an artifact.)
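A rough sketch of that approach on GitHub Actions might look like the following; the tarball name, the node_modules exclusion, and the cache keys are illustrative assumptions, not the actual implementation described above:

      # Restore the previous run's artifact tarball at this step; actions/cache saves the
      # new tarball in its post-job phase, so collecting it later in the job still gets cached.
      - name: Cache build artifacts
        uses: actions/cache@v2
        with:
          path: build-artifacts.tgz
          key: build-artifacts-${{ github.sha }}
          restore-keys: build-artifacts-
      - name: Restore previous build output
        run: test -f build-artifacts.tgz && tar -xzf build-artifacts.tgz || true
      - name: Install dependencies
        run: node common/scripts/install-run-rush.js install
      - name: Build
        run: node common/scripts/install-run-rush.js build
      # Treat anything untracked and gitignored (minus node_modules) as a build artifact
      - name: Collect build artifacts
        run: |
          git ls-files --others --ignored --exclude-standard \
            | grep -v node_modules \
            | tar -czf build-artifacts.tgz -T -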

gregjacobs avatar Sep 17 '20 19:09 gregjacobs

Something else to keep in mind is that as a monorepo gets larger, it might not be desirable to cache all the dependencies for all the packages, and instead only cache the dependencies of the packages that changed. This is a bit trickier to do with just the CI tool's caching mechanism and would probably need Rush to have some sort of cache implementation like Bazel's or what's suggested in #1156, but it's something to keep in mind. In our large monorepo, a lot of time is spent just uploading the huge node_modules cache (almost 20 minutes).

migueloller avatar Sep 18 '20 00:09 migueloller

but still somehow downloads 80-some packages out of the 3k or so that we use each time we install on the CI (even when no changes to pnpm-lock.yaml are made). Not sure what that's about

@gregjacobs When PNPM installs Rush's common/temp/*.tgz tarballs, it reports that as "downloading" even though no actual network activity is involved. And they are not cached, because PNPM needs to decompress the tarball contents in order to determine if the package.json file has changed or not. So those "downloads" will be reported every time. Maybe this is what you are seeing?

octogonz avatar Sep 18 '20 04:09 octogonz

This is a bit trickier to do with just the CI tool's caching mechanism and would probably need Rush to have some sort of cache implementation like Bazel or what's suggested in #1156,

Rush is already integrated with BuildXL, which has features similar to Bazel's. BuildXL is open source, but I don't know of any groups outside Microsoft setting it up. It would be possible to integrate Rush with Bazel in the same way, and it might be interesting to support both options. This would allow Rush to provide 3 different stories:

  1. For everyday users, rush build works in a simple, familiar way. And the only required prerequisite is an OS that can run Node.js -- super easy!
  2. For people whose monorepo grows to a very large scale like at Microsoft, there is an "enterprise" option where you set up a specialized lab with BuildXL or Bazel. It is complex and not for a casual user. It requires special native prerequisites and specific OS images. But if you have a massive monorepo, your lab staff isn't intimidated by that sort of thing.
  3. BUT even if you move to the enterprise model, an everyday engineer can still clone the monorepo on his laptop and run rush build in the old way. It might take an hour to build everything, but fundamentally the monorepo layout is still familiar and understandable. The enterprise integration didn't sacrifice that part.

From what I hear this is already pretty much the Rush+BuildXL story at Microsoft -- someone just needs to go make it easier for an external group to set up. With my company's current trajectory, I might be tackling that myself by next year heheh.

BTW we also considered the idea of making a lightweight 100% Node.js distributed build engine with similar capabilities as BuildXL, but a much simpler model. It wouldn't be quite as efficient or scalable, but it would allow a broader audience to benefit from these optimizations. I'd be excited about that as well. A couple people had some promising prototypes, although I'm slightly concerned that making it complete and professional could be a very expensive undertaking. (For example I remember thinking that Rush's job was pretty simple -- we can make it in like a couple months! 5 years and 2000 PRs later, Rush still has tons of work to do haha.)

but it's something to keep in mind. In our large monorepo a large amount of time is spent uploading the huge cache of node_modules, like 20 minutes almost.

Is this for deployments? If so you might look at the new rush deploy feature.

octogonz avatar Sep 18 '20 04:09 octogonz

@octogonz, thanks for the detailed response. The main bottleneck is actually the installation of NPM dependencies for doing things like running tests. So it's really the caching of rush install that I'm talking about, as opposed to rush build.

Rush could provide something like Yarn's focused installs, so that you could run rush install --package app1 and, instead of installing all NPM dependencies for the entire monorepo, it would install only the transitive dependencies of app1. Then one could split up CI builds by package, use the package name as a cache key, and the size of each individual cache would become much more manageable, since it wouldn't include the entire monorepo's node_modules -- only the subset that app1 needs.
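As a rough illustration of that split (using the hypothetical --package flag proposed above, plus an assumed second package name), a GitHub Actions matrix could give each package its own job and cache key:

      strategy:
        matrix:
          package: [app1, app2]   # app2 is just a placeholder for a second project
      steps:
        - uses: actions/checkout@v2
        - uses: actions/cache@v2
          with:
            path: common/temp/pnpm-store
            key: ${{ runner.os }}-${{ matrix.package }}-${{ hashFiles('common/config/rush/pnpm-lock.yaml') }}
        # Hypothetical focused install: only the matrix package's transitive dependencies
        - run: node common/scripts/install-run-rush.js install --package ${{ matrix.package }}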

There are other issues with this, though. Since there is a single shrinkwrap file at the top level, any change to any dependency in the entire monorepo would invalidate the cache for app1 (even if the change didn't affect app1). There's an ongoing conversation about the ability to create a lockfile per package to solve this: https://github.com/yarnpkg/yarn/issues/5428

rush deploy gets pretty close to solving this, because you can isolate a single deployment's dependencies and potentially cache that. The issue is that it's targeted at deployment, so that cache won't be enough for the next CI run's call to rush install. An alternative, of course, is just to avoid caching node_modules. In that case, though, rush install gets slower and slower as the monorepo's dependencies grow, and if rush install is called in multiple CI jobs it adds up. That could be mitigated by checking packages into version control, similar to Yarn's offline support, but I don't know if that's something every team will want to do.

So yeah, the main problem I'm trying to solve here is: as the monorepo grows, how do you efficiently run tasks for specific packages without having to bring in all of the monorepo's third-party dependencies?

migueloller avatar Sep 18 '20 15:09 migueloller

When PNPM installs Rush's common/temp/*.tgz tarballs, it reports that as "downloading" even though no actual network activity is involved. And they are not cached, because PNPM needs to decompress the tarball contents in order to determine if the package.json file has changed or not. So those "downloads" will be reported every time. Maybe this is what you are seeing?

@octogonz Hey Pete, you're right! The "downloads" are the exact number of packages we have in rush.json. Thanks for clearing that up!

And also, thanks for your thoughtful and complete responses, as always :)

Will come back with more on this thread about the dependency caching. As an aside, though, I would definitely like to have distributed build support (and the idea of a Node-only solution sounds much better than trying to get images running, especially in my locked-down corporate environment). And I would also like smarter rebuilds, in the sense that if a repo dependency was updated but package A wasn't affected by that dependency, then package A shouldn't be rebuilt. Where might be the best place to comment on something like that? Is there an existing issue? (I'll comment on https://github.com/microsoft/rushstack/issues/1156 for distributed build support)

Best, Greg

gregjacobs avatar Sep 19 '20 17:09 gregjacobs

Thanks for this thread -- we reduced our rush install time from ~56s to ~34s. However, is there any further caching that could reduce the 34s? What work is being done here if the dependencies are already downloaded? (e.g. would caching each package's node_modules symlink folders help?)

JamesBurnside avatar Jun 24 '21 00:06 JamesBurnside

For my setup, I first tried to use GitLab's S3 cache to cache rush install. But it turned out to take more time than just repeating rush install in each job. The reason is the CPU time spent zipping all the dependencies, plus the time spent uploading and downloading the archive.
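For reference, a GitLab CI equivalent of the caching discussed above would look roughly like this (a sketch of the general shape, not the exact configuration that was measured):

      cache:
        key:
          files:
            - common/config/rush/pnpm-lock.yaml
        paths:
          - common/temp/pnpm-store
          - common/temp/install-run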

TheBit avatar Nov 03 '21 11:11 TheBit

Here are our settings for caching dependencies in a Rush monorepo.

      - name: Cache Rush
        uses: actions/cache@v2
        with:
          path: |
            common/temp/install-run
            ~/.rush
          key: ${{ runner.os }}-${{ hashFiles('rush.json') }}
      - name: Cache pnpm
        uses: actions/cache@v2
        with:
          path: |
            common/temp/pnpm-store
            ~/.cache/Cypress
          key: ${{ runner.os }}-${{ hashFiles('**/pnpm-lock.yaml') }}
  • When running node common/scripts/install-run-rush.js, the script stores the Rush installation at common/temp/install-run, so we cache that.
  • When running $RUSH install:
    • pnpm is installed at ~/.rush
    • Project dependencies are downloaded to common/temp/pnpm-store

We observed a time saving of 22 seconds in one small project:

  • No cache: 37s
    • Install: 37s
  • Cold cache: 53s
    • Install: 37s
    • Persist: 16s
  • Warm cache: 15s
    • Restore: 10s
    • Install: 5s

dtinth avatar Dec 14 '21 17:12 dtinth

@dtinth do you have a sample project that has the entire file? I still can't understand how to implement this properly.

darklight9811 avatar Dec 31 '21 22:12 darklight9811

Thanks for sharing @dtinth! Seems to work well for me.

I removed ~/.cache/Cypress and specified the lock file as common/config/rush/pnpm-lock.yaml.

Also, I hadn't used actions/cache before -- make sure actions/checkout runs first, so hashFiles can be calculated correctly.

      - uses: actions/checkout@v2
      - name: Cache Rush
        uses: actions/cache@v2
        with:
          path: |
            common/temp/install-run
            ~/.rush
          key: ${{ runner.os }}-${{ hashFiles('rush.json') }}
      - name: Cache pnpm
        uses: actions/cache@v2
        with:
          path: |
            common/temp/pnpm-store
          key: ${{ runner.os }}-${{ hashFiles('common/config/rush/pnpm-lock.yaml') }}

jonasb avatar Feb 06 '22 18:02 jonasb

@jonasb Hey, thanks for sharing. Why are you hashing rush.json for the cache key? Does it ever get invalidated? Shouldn't we hash something like repo-state.json? Sorry, I'm a bit new at this 😅

renan-britz-mgm avatar Sep 02 '22 18:09 renan-britz-mgm

@renan-britz-mgm If two different Rush versions can produce the same repo-state.json -- which may or may not be possible (@octogonz would know) -- then a Rush version bump wouldn't change the key, the stale cache would keep hitting, and you might end up re-installing Rush on every CI run. AFAIK actions/cache won't update a cache entry once it gets a hit.
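One way to hedge against both concerns would be to fold both files into the key and add a restore-keys fallback, so that when the key changes the most recent cache is still restored and then re-saved under the new key (a sketch; the exact key layout and cached paths are assumptions based on the examples earlier in this thread):

      - name: Cache Rush and pnpm
        uses: actions/cache@v2
        with:
          path: |
            common/temp/install-run
            common/temp/pnpm-store
            ~/.rush
          key: ${{ runner.os }}-rush-${{ hashFiles('rush.json', 'common/config/rush/pnpm-lock.yaml') }}
          restore-keys: |
            ${{ runner.os }}-rush-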

jessekrubin avatar Sep 08 '22 19:09 jessekrubin

I've created a GitHub action to simplify the process. https://github.com/marketplace/actions/rush-cache

- name: Restore cache
  uses: gigara/rush-cache@v1

Also, you can use the GitHub cache plugin in your Rush project to save the build cache on GitHub: https://www.npmjs.com/package/@gigara/rush-github-action-build-cache-plugin

gigara avatar Mar 03 '23 16:03 gigara