heroku-buildpack-nodejs

Caching .npm but not node_modules

Open passcod opened this issue 7 years ago • 9 comments

I tried to read through the code but I'm not familiar enough with it to understand how this works. I'm looking for a way to both

  • Make sure the .npm cache is cached and restored (or just get confirmation that it is), and
  • Not cache node_modules.

npm 5 has very fast installs from a warm cache, but I have observed issues with some package upgrades when installing with a pre-existing node_modules. Thus, I want to disable node_modules caching while still keeping the warm .npm cache, to ensure smoother, more consistent builds.

Is this possible? And if so, how?

passcod avatar May 28 '17 05:05 passcod

Hi, the same goes for Yarn: please cache ~/.cache/yarn and not node_modules, because Yarn will not run postinstall hooks when node_modules is restored :( This causes problems with packages such as mozjpeg-bin.

ondrejbartas avatar May 29 '17 18:05 ondrejbartas

I'd love to move to this as a default since it would avoid a lot of problems, but when @hunterloftis tried this with Yarn's cache directory we saw a significant spike in build times. We have better visibility into build performance now, so I might revisit it and see for myself.

At the very least I should get some docs up for how to do this. You should be able to do this using the cacheDirectories key in your package.json: https://devcenter.heroku.com/articles/nodejs-support#custom-caching

It defaults to caching node_modules and bower_components, but you can overwrite it to cache any directory you'd like. I haven't tried caching the cache directories yet, but there's no reason it shouldn't work 🤔 I'll look into this next week, but if you try it please report back!

jmorrell avatar May 30 '17 17:05 jmorrell

I read through the code (but didn't test it), and it seems that cacheDirectories entries are turned into absolute paths by joining build_directory + cacheDirectory, so you will not be able to add ~/.cache/yarn; you would need to add ../.cache/yarn instead. I think. 😞
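
A minimal sketch of the path joining described above, written in JavaScript purely for illustration. It is not the buildpack's code, and `buildDir` and `resolveCacheDir` are hypothetical names. It only shows how a plain join treats `..` and `~`; whether the buildpack accepts the resulting paths is not something it demonstrates.

```javascript
const path = require('path');

// Hypothetical build directory; the real one is chosen by the build system.
const buildDir = '/tmp/build_example';

// Assumption: each cacheDirectories entry is joined onto the build directory.
function resolveCacheDir(dir, entry) {
  // POSIX semantics are used so the result is the same on any host OS.
  return path.posix.join(dir, entry);
}

// A normal entry stays inside the build directory.
console.log(resolveCacheDir(buildDir, 'node_modules'));
// A ".." entry escapes to the parent of the build directory.
console.log(resolveCacheDir(buildDir, '../.npmcache'));
// "~" is not expanded by a join; it becomes a literal directory name.
console.log(resolveCacheDir(buildDir, '~/.cache/yarn'));
```

Under these assumptions the three calls print `/tmp/build_example/node_modules`, `/tmp/.npmcache`, and `/tmp/build_example/~/.cache/yarn`, which is why a `~`-prefixed entry would not reach the home directory.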

ondrejbartas avatar May 30 '17 19:05 ondrejbartas

Okay, got it working with a combination of Heroku build scripts and a cacheDirectories entry:

  "scripts": {
    "start": "...",
    "heroku-prebuild": "rm -rf $(npm config get cache); mv ../.npmcache $(npm config get cache) || echo No cache",
    "heroku-postbuild": "rm -rf ../.npmcache; mv $(npm config get cache) ../.npmcache || echo No cache"
  },
  "cacheDirectories": [
    "../.npmcache"
  ],

That keeps the cache out of the slug and works regardless of where the npm cache is configured to live. It's quite ugly, though, and it would be nice to have this built in!

Speed-wise, it's true that not caching node_modules means you have to rebuild native modules every time, but for the sake of a clean build that's fine. It doesn't take that long. Without native modules, here's a warm-cache build for a medium-sized app with npm 5.0.0:

       Installing node modules (package.json)
       added 295 packages in 7.835s

Pretty good!

passcod avatar May 31 '17 00:05 passcod

Some more numbers:

  • Just caching node_modules:

    up to date in 2.395s
    

    (but sometimes bad builds)

  • With native modules, warm npm5 cache, cold ccache:

    added 300 packages in 32.622s
    
  • With native modules, warm npm5 cache, warm ccache:

    added 300 packages in 12.317s
    

That 10-second slowdown in exchange for always-good builds is worth it to me. Of course, one can always add node_modules back to the cached directories if wanted.

passcod avatar May 31 '17 01:05 passcod

That 10-second slowdown in exchange for always-good builds is worth it to me. Of course, one can always add node_modules back to the cached directories if wanted.

Sure, in some cases the relative slowness of this approach will only lead to 10s increases, but as build times increase the delta in the approaches increases as well:

  • cold cache: 32.62s
  • node_modules: 2.34s (93% faster than cold)
  • cache dir: 12.32s (about 5.3x as long as node_modules, i.e. roughly 426% slower)

So if you have a build that, for example, completes in 3 minutes with node_modules, it will take about 16 minutes with only a cache dir. The experience for users building intricate apps suffered greatly when we tried this before.
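
As a check on the arithmetic above, here is a small sketch that recomputes the ratio from the quoted timings and projects it onto a 3-minute build. The names are illustrative, and the projection assumes the slowdown scales proportionally with build size, which is the assumption behind the 16-minute figure rather than something measured in this thread.

```javascript
// Timings quoted in the comment above, in seconds.
const timings = { coldCache: 32.62, nodeModules: 2.34, cacheDir: 12.32 };

// How many times longer the warm-cache install takes than a restored node_modules.
const ratio = timings.cacheDir / timings.nodeModules;

// Absolute difference for this particular app, in seconds.
const deltaSeconds = timings.cacheDir - timings.nodeModules;

// Assumption: the ratio stays constant as builds get larger.
function projectedMinutes(nodeModulesMinutes) {
  return nodeModulesMinutes * ratio;
}

console.log(ratio.toFixed(2));               // 5.26
console.log(deltaSeconds.toFixed(2));        // 9.98
console.log(projectedMinutes(3).toFixed(1)); // 15.8
```

If the overhead were instead closer to a fixed cost per install, the projection for larger builds would be far smaller, so the two readings of the same numbers lead to quite different conclusions.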

hunterloftis avatar Jun 05 '17 16:06 hunterloftis

Note I don't particularly mind if node_modules are cached by default, just that a) .npm is also (correctly) cached by default and b) node_modules caching can be disabled without also disabling .npm caches.

The tradeoff I'm looking at is not "3 minutes vs 16 minutes", it's "builds always work consistently vs sometimes builds fail with only manual recourse".

passcod avatar Jun 05 '17 20:06 passcod

The tradeoff I'm looking at is not "3 minutes vs 16 minutes", it's "builds always work consistently vs sometimes builds fail with only manual recourse".

Unfortunately we do have to take these trade-offs into account. I suspect we can improve on what we already have though.

Note I don't particularly mind if node_modules are cached by default, just that a) .npm is also (correctly) cached by default

I'm afraid we can't cache both by default either :( as that would push many large customers over the slug compilation limit.

@hunterloftis Any thoughts about providing an easy way to opt in to caching the cache directories instead? Maybe modifying NODE_MODULES_CACHE to also accept cache-dir or something similar as an option?

@passcod Because $HOME is /app which is also where the application is running, you should just be able to cache .npmcache directly. Have you tried just this?

"cacheDirectories": [
    "../.npmcache"
  ],

jmorrell avatar Jun 05 '17 20:06 jmorrell

AFAICT, caching anything w/ .. doesn't work.

aaronjensen avatar Mar 15 '19 01:03 aaronjensen