rules_js

[Question]: Sharing remote cache between MacOS and Linux

Open gregjacobs opened this issue 3 years ago • 7 comments

What is the current behavior?

Hey guys, I'm hoping you can help me with this. I'm using a remote cache and was hoping to share its outputs between Mac and Linux (CI).

However, currently building everything on my Mac still causes my CI machine (on Linux) to rebuild everything. I'm guessing this has something to do with the platform somehow? (even though JS outputs are platform-independent?)

Do you know of a way to make this work?

gregjacobs avatar Dec 07 '22 17:12 gregjacobs

This isn't JS-specific. Bazel generally treats action inputs as opaque, checksummed files, so if one action has a linux nodejs interpreter as an input, it will have a different cache key than the same action with a mac nodejs interpreter.

My understanding is that https://github.com/bazelbuild/bazel/pull/15542 gets us closer: it fixes the case where an output .js file is the same between linux and mac, so that subsequent actions that use that .js file can be cache hits across platforms. However, the nodejs interpreter would still make the cache keys different.
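To make the cache-key point concrete, here is a toy model in Python. It is only a sketch: `action_cache_key` and `digest` are hypothetical helpers, the file paths and placeholder binary contents are made up, and Bazel's real key covers more than this (for example environment variables and platform properties). It shows that identical sources and an identical command line still hash to different keys once the interpreter input differs.

```python
import hashlib


def digest(content: bytes) -> str:
    """Checksum of a single input file (the file is treated as opaque bytes)."""
    return hashlib.sha256(content).hexdigest()


def action_cache_key(command: list[str], input_digests: dict[str, str]) -> str:
    """Toy stand-in for an action cache key: a hash over the command line
    and the (path, digest) pair of every input. Not Bazel's real algorithm."""
    h = hashlib.sha256()
    for arg in command:
        h.update(arg.encode() + b"\0")
    for path in sorted(input_digests):
        h.update(path.encode() + b"\0" + input_digests[path].encode() + b"\0")
    return h.hexdigest()


# The JS source and the command line are identical on both machines.
source = {"src/index.js": digest(b"console.log('hi')")}
command = ["node", "build.js"]

# Only the interpreter binary differs (placeholder bytes, not real binaries).
linux_inputs = {**source, "toolchain/node": digest(b"<linux node binary>")}
mac_inputs = {**source, "toolchain/node": digest(b"<macos node binary>")}

linux_key = action_cache_key(command, linux_inputs)
mac_key = action_cache_key(command, mac_inputs)

# One differing input is enough to change the key, so the Mac build's
# cached result is never looked up by the Linux CI machine.
keys_differ = linux_key != mac_key
```

In this model the Linux machine looks up `linux_key`, finds nothing stored under it, and rebuilds, which matches the behavior described in the question.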

@fmeum is there an issue on Bazel that expresses the general "cross-platform cache hits" feature request?

alexeagle avatar Dec 07 '22 18:12 alexeagle

#6526 is the most fitting one. @tjgq is also working on solving the "multiple interpreters/SDKs" issue.

fmeum avatar Dec 07 '22 19:12 fmeum

Hey @alexeagle, @fmeum, thanks for the replies. I'll be following https://github.com/bazelbuild/bazel/issues/6526 and hoping for this soon! This would be a huge win for our developers developing our monorepo on Mac but building/testing on Linux CIs. Being able to share the cache artifacts and test outputs back and forth should save a lot of time.

@alexeagle Feel free to close this issue if you like, unless you want to track this here.

Thanks again, Greg

gregjacobs avatar Dec 08 '22 15:12 gregjacobs

I discussed this with @tjgq and there's an approach which is easy for us to try. I think of it as a "multiplex toolchain": interpreters for all platforms are inputs, which is a bit wasteful, and when execution begins you pick the one for the exec platform. That way the cache inputs appear the same on all platforms.
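A rough Python sketch of the "multiplex toolchain" idea, under the same toy cache-key model as before. Everything here is hypothetical (the `INTERPRETERS` table, the `toolchain/<platform>/node` layout, the helper names); it is not how rules_js implements or will implement this. The point is only that the input set no longer depends on the machine, while the choice of interpreter is deferred to execution time.

```python
import hashlib

# Placeholder bytes standing in for the per-platform Node binaries.
INTERPRETERS = {
    "linux": b"<linux node binary>",
    "darwin": b"<macos node binary>",
}


def cache_key(input_files: dict[str, bytes]) -> str:
    """Toy cache key: a hash over every input path and its content digest."""
    h = hashlib.sha256()
    for path in sorted(input_files):
        h.update(path.encode() + b"\0")
        h.update(hashlib.sha256(input_files[path]).digest())
    return h.hexdigest()


def multiplexed_inputs(sources: dict[str, bytes]) -> dict[str, bytes]:
    """Declare the interpreters for ALL platforms as inputs. The result does
    not depend on which machine computes it, at the cost of extra downloads."""
    inputs = dict(sources)
    for platform, binary in INTERPRETERS.items():
        inputs[f"toolchain/{platform}/node"] = binary
    return inputs


def pick_interpreter(exec_platform: str) -> str:
    """At execution time, choose the binary matching the exec platform."""
    if exec_platform not in INTERPRETERS:
        raise ValueError(f"no interpreter bundled for {exec_platform!r}")
    return f"toolchain/{exec_platform}/node"


sources = {"src/index.js": b"console.log('hi')"}

# Both machines declare the same inputs, so they compute the same key.
key_on_linux = cache_key(multiplexed_inputs(sources))
key_on_mac = cache_key(multiplexed_inputs(sources))

# Only the binary that actually runs differs.
linux_binary = pick_interpreter("linux")
mac_binary = pick_interpreter("darwin")
```

The tradeoff is visible in `multiplexed_inputs`: every machine fetches every interpreter, which is the "bit wasteful" part, in exchange for identical keys.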

alexeagle avatar Aug 28 '23 17:08 alexeagle

Well that's definitely an interesting idea!

I'm just realizing though: scripts could in theory do something different based on os.platform(), so maybe the per-platform requirement makes sense 😶 Although on the other hand, for web outputs, that wouldn't matter. This is a tough one.
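The risk can be sketched in Python with a toy in-memory cache. `build_script`, `run_cached`, and the key string are all hypothetical; `build_script` stands in for a Node script that branches on `os.platform()`. Once the key is the same on every platform, whichever machine populates the cache first decides the output everyone else receives.

```python
# Toy shared remote cache: key -> stored output.
cache: dict[str, str] = {}


def build_script(exec_platform: str) -> str:
    """Stand-in for a script whose output depends on the platform it runs on."""
    return f"built on {exec_platform}"


def run_cached(key: str, exec_platform: str) -> str:
    """Run the script only on a cache miss; otherwise return the stored output."""
    if key not in cache:
        cache[key] = build_script(exec_platform)
    return cache[key]


# With a platform-independent key, both machines use the same cache entry.
shared_key = "same-key-on-every-platform"

mac_result = run_cached(shared_key, "darwin")   # cache miss: runs on the Mac
linux_result = run_cached(shared_key, "linux")  # cache hit: reuses the Mac output

# The Linux machine gets the Mac's output, not what it would have produced.
linux_got_mac_output = linux_result == build_script("darwin")
```

For a script that ignores the platform, `build_script` would return the same string everywhere and the shared entry would be harmless, which is the web-outputs case mentioned above.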

gregjacobs avatar Sep 10 '23 01:09 gregjacobs

After working with the rules for a while now, I'm having difficulty imagining a case where the outputs of a JS program would be different based on exec platform. Are you guys able to think of any?

If not, I think the above solution might be a worthy tradeoff. I'd rather have the shared cache and a slightly longer download time for the Node binaries (which happens only once every so often). And if someone ever comes up with a reason not to follow this anymore (i.e. they've found a case where Node outputs differ based on exec platform), we could revert.

What do you guys think?

gregjacobs avatar Apr 19 '24 21:04 gregjacobs

This came up at BazelCon this year in a talk on performance: https://static.sched.com/hosted_files/bazelcon2024/d0/TB-137%20Sharmila%20BazelCon%202024%20-%20Performant%20Bazel%20Builds%20for%20Web%20Monorepos%20at%20Scale.pdf

I think it's time to implement this. Since it's breaking in theory when a program senses os.platform(), we should just have a flagged rollout. In rules_js 2.x it would be off by default, with a TODO to flip the default to true in rules_js 3.0.

alexeagle avatar Oct 18 '24 19:10 alexeagle