Persist cache to disk?

Open Rich-Harris opened this issue 7 years ago • 32 comments

This Twitter thread got me thinking — perhaps it would make sense for Rollup to (optionally?) write its cache to disk for faster cold builds. At the moment the cache is only used (at least by the CLI) when running in --watch mode, by keeping the results in memory.

After all, if Parcel are going to use our ideas it's only fair we borrow some of theirs as well 😀

Rich-Harris avatar May 11 '18 15:05 Rich-Harris

For a project I'm working on we use rollup, and we memoize all calls to rollup - as well as every call to our transform plugins (mostly babel). The memoization is serialisable, so we serialise it to disk. While we found caching the output of rollup (keyed by the input and file checksums) to be very useful, we also found a huge benefit to caching the output of transforms (keyed by input and file checksums too). If you'd like more details I'm happy to share.

keithamus avatar May 11 '18 15:05 keithamus

Interesting — the cache works on the assumption that identical input plus identical set of plugins results in identical output, so Rollup doesn't bother checking the output from transform A before piping it into transform B (unless that changed in my absence, ha!). Are you saying there's a benefit to doing so?

(Side-note: the assumption that the configuration hasn't changed is important here; there would need to be some way of checking that it was in fact unchanged, if work was cached between runs.)
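
One way to picture that check is a fingerprint stored next to the cache and compared on the next run. This is only a sketch under assumptions: `fingerprint` and `reuseIfUnchanged` are hypothetical helpers, not Rollup APIs, and what goes into the fingerprint (config file contents, Rollup version, plugin versions) is left to the caller.

```javascript
const crypto = require('crypto');

// Hypothetical helper: hash anything that should invalidate the cache when it
// changes, e.g. the config file contents and the versions of rollup and plugins.
function fingerprint(parts) {
  const hash = crypto.createHash('sha256');
  for (const part of parts) {
    // The NUL separator keeps ['ab', 'c'] and ['a', 'bc'] from colliding.
    hash.update(String(part)).update('\0');
  }
  return hash.digest('hex');
}

// Hypothetical helper: `stored` is whatever was persisted by the previous run,
// assumed here to have the shape { fingerprint, cache }.
function reuseIfUnchanged(stored, currentFingerprint) {
  if (stored && stored.fingerprint === currentFingerprint) {
    return stored.cache;
  }
  return undefined; // configuration changed or nothing stored: start cold
}
```

Returning `undefined` on a mismatch means the build simply runs without a cache, which is the safe fallback.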

Rich-Harris avatar May 11 '18 15:05 Rich-Harris

We were unable to get Rollup's existing cache to persist to disk, so we rolled our own which works over the top of rollup. For our own cache system - where we have a build of say a hundred files - if one file changes out of that 100 we don't want to have to run babel for 100 files again, so we simply cache the babel transform plugin, passing the cache-wrapped transform to rollup (e.g. plugins: [cache(babel()) ]). This way just the babel transform gets cached, which means even if rollup caches nothing, we still save time (our benchmarks show significant time) by caching these transforms.
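
A rough sketch of what such a wrapper could look like follows. It is not our actual implementation: the `cache` helper, the default cache directory, and the key scheme (SHA-256 of module id plus incoming code) are assumptions for illustration, and it assumes the plugin's `transform` hook is a plain function. It also ignores plugin options, so changing the babel config would need a fresh cache directory.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

// Hypothetical wrapper: stores the result of a plugin's `transform` hook on
// disk, keyed by a checksum of the module id and the code passed in.
function cache(plugin, cacheDir = path.join('node_modules', '.cache', 'transform-cache')) {
  // Nothing to cache if the plugin has no function-style transform hook.
  if (typeof plugin.transform !== 'function') return plugin;
  fs.mkdirSync(cacheDir, { recursive: true });

  return {
    ...plugin,
    async transform(code, id) {
      const key = crypto
        .createHash('sha256')
        .update(id)
        .update('\0')
        .update(code)
        .digest('hex');
      const file = path.join(cacheDir, `${key}.json`);

      // Cache hit: return the stored result without running the transform.
      if (fs.existsSync(file)) {
        return JSON.parse(fs.readFileSync(file, 'utf8'));
      }

      // Cache miss: run the real transform with the original plugin context.
      const result = await plugin.transform.call(this, code, id);
      // Only store real results; null/undefined means "no change".
      if (result != null) {
        fs.writeFileSync(file, JSON.stringify(result));
      }
      return result;
    },
  };
}
```

Because the key includes the incoming code, a file whose content changes gets a new key, and the other 99 files are served from disk.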

keithamus avatar May 11 '18 16:05 keithamus

I think transforms are definitely the best target for caching. But maybe we should consider adding some way of transforms to signal that they are "pure". As the most important plugins are part of the rollup organization, it would be easy to add such a flag to them.

Then ideally if all transforms are pure and we know the original file has not changed, we could cache the result of applying all transforms + parsing as a stringified object. Not sure if file size is a problem; it might make sense to check whether some form of compression could even improve the speed here.
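
To make the compression idea concrete, here is a minimal sketch using Node's built-in zlib. `packCache` and `unpackCache` are hypothetical names, and whether gzip actually speeds things up (less disk I/O versus extra CPU) is exactly the open question, so this would need measuring.

```javascript
const zlib = require('zlib');

// Hypothetical helper: serialise a JSON-compatible cache object and gzip it.
function packCache(cache) {
  return zlib.gzipSync(JSON.stringify(cache));
}

// Hypothetical helper: reverse of packCache, returns the parsed object.
function unpackCache(buffer) {
  return JSON.parse(zlib.gunzipSync(buffer).toString('utf8'));
}
```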

The actual "rolling-up" is hard to cache but it's easy to hydrate the process with an existing AST.

I guess @guybedford has an opinion on this as well, definitely sounds like something we should do. As for watch mode, there is also still a bit of potential for speed improvement left. Currently, we re-create our AST from acorn's result on each run while the architecture would already support resetting and reusing the existing internal data structures.

lukastaegert avatar May 11 '18 17:05 lukastaegert

Agreed we should definitely move towards doing this. First under an option, then hopefully we can provide it by default for 1.0.

What do people think about the default cache folder? .cache/rollup? What are the current trends here?

guybedford avatar Jul 31 '18 14:07 guybedford

node_modules/.cache/rollup was introduced by sindresorhus. It looks good and does not require adding anything to .gitignore.

TrySound avatar Jul 31 '18 14:07 TrySound

I would generally give a 👎 for node_modules/.cache/rollup as it's likely that most CIs will throw away node_modules for each build. Some libraries use npm's cache folder (typically ~/.npm), which can be retrieved by looking at process.env['npm_config_cache'] - I'd be happy if we checked that and perhaps defaulted it to ./node_modules/.cache (i.e. cacheDir = path.resolve(`${process.env['npm_config_cache'] || './node_modules/.cache'}/rollup`))
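
Spelled out as a function, that lookup might look like the sketch below. `resolveCacheDir` is a hypothetical name; the only thing taken as given is that npm exposes its cache folder to scripts through the `npm_config_cache` environment variable.

```javascript
const path = require('path');

// Hypothetical helper: prefer npm's cache folder when npm provides it,
// otherwise fall back to node_modules/.cache in the current working directory.
function resolveCacheDir(env = process.env) {
  const base = env.npm_config_cache || path.join('node_modules', '.cache');
  return path.resolve(base, 'rollup');
}
```

Taking `env` as a parameter is only there to make the fallback easy to exercise without touching the real environment.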

keithamus avatar Aug 01 '18 11:08 keithamus

I am currently using rollup via its API, and all my inputs are held in the node process in memory (piped in from .NET over http). I use https://github.com/Permutatrix/rollup-plugin-hypothetical for the in-memory file store.

From reading this, if rollup is going to be creating its own persistent cache, I'd like to be in control of the persistence myself rather than rollup writing directly to the file system somewhere. In my case I'd likely shuffle the cache back over to the .NET process, for persistence somewhere within the asp.net core website itself. Implied in that is the ability to prime the cache myself, in the case of a cold start, rather than rollup looking at the filesystem directly. Essentially I'd like the read and write cache methods to be pluggable.
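
Something like the following sketch is what I have in mind. The "store" contract (an object with async `read` and `write`) and both function names are hypothetical; the only Rollup features assumed are the `cache` input option and the `cache` property on the returned bundle. The `rollup` function is passed in so the sketch does not depend on how rollup is loaded.

```javascript
// Hypothetical store: keeps the serialised cache in memory. A real one could
// send the string over http, write it to a database, and so on.
function createMemoryStore() {
  let serialized = null;
  return {
    async read() {
      return serialized === null ? undefined : JSON.parse(serialized);
    },
    async write(cache) {
      serialized = JSON.stringify(cache);
    },
  };
}

// `rollup` is assumed to be the `rollup` function exported by the rollup package.
async function buildWithStore(rollup, inputOptions, store) {
  // Prime the build with whatever the store has (undefined on a cold start).
  const cache = await store.read();
  const bundle = await rollup({ ...inputOptions, cache });
  // Hand the fresh cache back to the store for the next run.
  await store.write(bundle.cache);
  return bundle;
}
```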

dazinator avatar Nov 02 '18 12:11 dazinator

Hey folks. This is a saved-form message, but rest assured we mean every word. The Rollup team is attempting to clean up the Issues backlog in the hopes that the active and still-needed, still-relevant issues bubble up to the surface. With that, we're closing issues that have been open for an eon or two, and have gone stale like pirate hard-tack without activity.

We really appreciate that folks have taken the time to open and comment on this issue. Please don't confuse this closure with us not caring or dismissing your issue, feature request, discussion, or report. The issue will still be here, just in a closed state. If the issue pertains to a bug, please re-test for the bug on the latest version of Rollup and if present, please tag @shellscape and request a re-open, and we'll be happy to oblige.

shellscape avatar Aug 09 '19 18:08 shellscape

I still think a CLI default experience here would be useful, and I even started work along these lines in https://github.com/rollup/rollup/pull/2397 before getting pulled in other directions.

Happy to let new contributions pick this one up though.

guybedford avatar Aug 10 '19 06:08 guybedford

My current ambitions, should time permit, would be to work towards a plugin interface for “cache providers” with the possibility that the Rollup CLI adds a default plugin. This would be a very clean solution that works in any environment and would allow for very powerful implementations without putting the development load on Rollup core alone. But there is already a more recent issue about this so this can remain closed from my side.

lukastaegert avatar Aug 10 '19 08:08 lukastaegert

Much of this is driven by the requirements of StencilJS, a partnership I really appreciate.

lukastaegert avatar Aug 10 '19 08:08 lukastaegert

OK let's reopen and track then.

shellscape avatar Aug 10 '19 12:08 shellscape

I am thinking about building an enterprise rollup product, something like rollup-enterprise, that works like an online build chain: it runs all the time and keeps all state in our fast in-memory data stores with replication. This way we could offer the fastest build and deployment pipelines possible.

frank-dspeed avatar Jan 09 '20 05:01 frank-dspeed

Not a bug, but I'm curious why a ham-fisted implementation of this feature would not work.

For some background, the project I am working on is in the process of migrating a legacy Angular.js/Gulp3 code base. For compatibility reasons, most of the gulp tasks are left as is, including file watchers and module injection. Passing a variable between gulp tasks does not seem doable, so, willing to accept the synchronous reads and writes, I set up a gulp task that is essentially this:

    const rollup = require('rollup');
    const fs = require('fs');

    // A relative path is needed here; a bare specifier is looked up in node_modules.
    const rollupConfig = require('./rollup.build.config.js');
    const rollupCacheFilePath = '.rollupCache';

    // Reuse the cache from the previous run, if one was written.
    if (fs.existsSync(rollupCacheFilePath)) {
      rollupConfig.cache = JSON.parse(fs.readFileSync(rollupCacheFilePath, 'utf8'));
    }

    // Build, persist the new cache, then write the bundle.
    return rollup.rollup(rollupConfig).then((bundle) => {
      fs.writeFileSync(rollupCacheFilePath, JSON.stringify(bundle.cache));
      return bundle.write(rollupConfig.output);
    });

It seems to work: the cache is recognized and only changed files are processed, but then it locks up. No file is written and the promise neither resolves nor rejects with an error. Is there a reason why this would not work, or did I miss a list of known plugins that have issues with the rollup cache?

Thrilleratplay avatar Jan 30 '20 19:01 Thrilleratplay

My suspicion would be on rollup-plugin-commonjs as it relies on the transform hook being executed at least once for each module to determine if it is CommonJS. If a cached file is used instead, the promise will never complete: https://github.com/rollup/plugins/blob/07f325de8978ab0f0ff8a2befc23f898ff33eee3/packages/commonjs/src/index.js#L164

But this is a good point because it means:

  • Yes, there can be interesting plugin incompatibilities, and
  • We need to implement adequate caching for plugins first

lukastaegert avatar Jan 30 '20 20:01 lukastaegert

@lukastaegert Thank you for the quick response and that makes sense. I am using rollup-plugin-commonjs and that explains the behavior I was seeing.

Thrilleratplay avatar Jan 30 '20 20:01 Thrilleratplay

Only my 2 cents: while I love the hook API in general, I think it would be clearer and better to change this.

We need load and resolve, but transform should be applied in an extra step so that we always know all files are emitted and resolved, and then apply transform in a separate step if needed.

That means we should think about async and parallel hooks; we may need an additional serial hook.

To be clearer, we should add an end or final hook, and it should be sync. We could then write plugins for that final hook that are executed in workers if needed.

frank-dspeed avatar Jan 31 '20 06:01 frank-dspeed

Things are not that simple, especially for the transform hook. rollup-plugin-commonjs needs to hook into it to determine if something is CJS because it is entirely possible that there are other transformers before it (e.g. Babel, TypeScript) that are needed to make the code actually parseable JavaScript. There is really no advantage in postponing the transform steps except it will make things slower.

BTW final hooks exist for all phases: buildEnd for the build phase and generateBundle/writeBundle/renderError for the generate phase. Not sure what making any of them sync would accomplish except making it impossible to do async things in those hooks. Also note that many hooks are marked as "sequential" for predictability, such as generateBundle.
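
As an illustration of those final hooks, here is a minimal plugin sketch. The plugin name and the `log` parameter are made up for the example; the hook names and their arguments (`buildEnd(error)`, `generateBundle(outputOptions, bundle)`) are the existing ones, and both are written as async functions to show that async work is allowed there.

```javascript
// Hypothetical plugin: logs when the build phase and the generate phase finish.
function phaseLogger(log = console.log) {
  let startedAt = 0;
  return {
    name: 'phase-logger-sketch',
    buildStart() {
      startedAt = Date.now();
    },
    // Final hook of the build phase; receives an error if the build failed.
    async buildEnd(error) {
      const status = error ? `failed: ${error.message}` : 'ok';
      log(`build phase finished (${status}) after ${Date.now() - startedAt} ms`);
    },
    // Final hook of the generate phase before files are written; `bundle`
    // maps output file names to chunk or asset descriptions.
    async generateBundle(outputOptions, bundle) {
      log(`generate phase produced ${Object.keys(bundle).length} file(s)`);
    },
  };
}
```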

lukastaegert avatar Jan 31 '20 07:01 lukastaegert

Oh, I was not so deep into this. So we already have all the needed infrastructure. I think I should deep dive into that and create a userland implementation of a stream and a cacheable result.

As you pointed out, the hooks needed for this do exist, so it is clear that we need a plugin that creates a complete, cacheable dependency graph.

Then we need something that catches all resolveId and load calls and only emits the ones related to the changed files.

Then all plugins can run as expected because they don't need to be aware of the outer scope.

Conclusion

The more I think about this, the more sure I am that rollup should be refactored to run as a daemon, and when we emit files it should handle them correctly. I will do a PoC.

frank-dspeed avatar Jan 31 '20 07:01 frank-dspeed

New js user here. I had been relying on parcel day and night because of their zero-config feature. I just made the decision to move to rollup. After I was all set up, I noticed there's no persistent caching. I had just gone ahead and assumed it would be a standard thing. 😝 Keep up the good work guys. I am planning to use rollup for an SSR web-app targeted at old and cheap phones and possibly piss poor data connections in suburban areas.

pranaypratyush avatar Feb 03 '20 08:02 pranaypratyush

Hi, is there any workaround to feed rollup with a cache that was synced to disk? Would saving bundle.cache to disk somehow (writeFile + eval? :D) and then reading it back work? Thanks!

vvo avatar Jul 21 '20 11:07 vvo

@vvo if you find a good way to serialize circular references then maybe.

frank-dspeed avatar Jul 21 '20 14:07 frank-dspeed

The cache should not contain any circular references as it is explicitly created to be JSON.stringifyable. If you do not use @rollup/plugin-commonjs I would actually expect it to work in most setups (i.e. write JSON.stringify(cache) to disk and feed it back into the system via JSON.parse(...))

lukastaegert avatar Jul 21 '20 15:07 lukastaegert

Is there any update on this? A cold rebuild takes ~40s whereas with webpack (which I cannot use because of #2933) + file-system caching it only takes ~300ms after changing one file.

simonwep avatar Jan 26 '21 14:01 simonwep

This is my biggest frustration with all the JS build tools. I can't be happy when it takes seconds to build a project that I've already built and haven't changed anything.

burdiyan avatar Jun 18 '21 10:06 burdiyan

I've been looking into disk-based caching options to speed up test runs and dev server startup for the product I work on. I have a functional but immature solution at https://github.com/robertknight/rollup-cache. The aim is to make it easy to drop into an existing project with minimal configuration.

It currently enables caching for the resolveId, load and transform build hooks of several official Rollup plugins where I've found it to work well and provide a significant speed-up: commonjs, node-resolve, babel. There is also a feature that enables easy pre-building of npm dependencies as separate bundles in development. This speeds up rebuilds by reducing the amount of code that Rollup has to parse, analyze and serialize each time the bundle is built, independent of any transforms. Conceptually this is similar to shared libraries/DLLs in native apps or Webpack's DllPlugin plugin.

I did also look at caching the results of acorn's JS parsing, although when using naive JSON serialization of the AST, it didn't offer significant improvement over just re-parsing the input code.

robertknight avatar Nov 01 '21 20:11 robertknight

I am confused, and more than that: I ran the pwabuilder from Microsoft, which uses rollup, and in watch mode it created a ".rollup.cache" on disk, so maybe someone has it working.

The content of the .rollup.cache folder is amazing, it is better than the final bundle! @lukastaegert is that rollup cache from us or is it from Microsoft?

There is nothing in the config that looks like that:

import resolve from "@rollup/plugin-node-resolve";
import html from "@open-wc/rollup-plugin-html";
import copy from "rollup-plugin-copy";
import replace from "@rollup/plugin-replace";
import typescript from "@rollup/plugin-typescript";

export default {
  input: "index.html",
  output: {
    dir: "build",
    format: "es",
    sourcemap: true
  },
  plugins: [
    resolve({
      exportConditions: ['development']
    }),
    html(),
    typescript({
      tsconfig: "tsconfig.dev.json",
    }),
    replace({
      "preventAssignment": true,
      "process.env.NODE_ENV": JSON.stringify(
        process.env.NODE_ENV || "production"
      )
    }),
    copy({
      targets: [
        { src: "assets/**/*", dest: "build/assets/" },
        { src: "styles/global.css", dest: "build/styles/" },
        { src: "manifest.json", dest: "build/" },
      ],
      copyOnce: true
    }),
  ],
};

I used this template: https://github.com/pwa-builder/pwa-starter

When you then run npm run dev, the magic happens.

frank-dspeed avatar Feb 05 '22 07:02 frank-dspeed

@robertknight you should not worry about speed in the first implementation.

I am working on parse5, which you may know: a complete DOM parser written in JS. There are tons of small tricks that we can apply once the final implementation is solid.

Simply write the code that is most readable and understandable for you and others. Later we can revisit it and implement some low-level tricks, like rewriting the loops and working with strings and other parsing algorithms.

frank-dspeed avatar Feb 05 '22 07:02 frank-dspeed

node_modules/.cache/rollup-cache/ looks like it comes from robertknight/rollup-cache

tigt avatar Feb 05 '22 18:02 tigt