Incorrect node binary uploaded for remote execution
🐞 bug report
Affected Rule
Seemingly any rule that executes run_node
In my particular case, ts_project is causing this error, but I get the same error with other run_node-based rules (including ones that I have written).
Is this a regression?
Not sure
Description
I'm not sure if this is specifically a rules_nodejs problem, but it seems like it might be.
When running any run_node-based rule with --remote_executor, Bazel seems to upload the Node binary of the host platform rather than the binary for the execution platform. I'm running Bazel from Mac (host platform), and trying to execute on a remote Linux Buildbarn cluster.
Here is the error that I'm getting:
ERROR: /dev/bazel-poc/package-1/BUILD:5:16: Compiling TypeScript project //package-1:library_tsc [tsc -p package-1/tsconfig.json] failed: (Exit 126): tsc.sh failed: error executing command
(cd /private/var/tmp/_bazel_user/df7a4c614b8242298350c2cf3d5949ba/execroot/bazel_poc && \
exec env - \
BAZEL_NODE_MODULES_ROOTS='' \
COMPILATION_MODE=fastbuild \
bazel-out/darwin-opt-exec-2B5CBBC6/bin/external/npm/typescript/bin/tsc.sh --project package-1/tsconfig.json --outDir bazel-out/darwin-fastbuild/bin/package-1 --rootDir package-1/src --declarationDir bazel-out/darwin-fastbuild/bin/package-1 --tsBuildInfoFile bazel-out/darwin-fastbuild/bin/package-1/library_tsc.tsbuildinfo '--bazel_node_modules_manifest=bazel-out/darwin-fastbuild/bin/package-1/_library_tsc_TsProject.module_mappings.json')
# Configuration: 3202722b467eefd72f81b74e361edbcb387d5f20595fef5e6262727e3ce81321
# Execution platform: @local_config_platform//:host
Action details (uncached result): http://buildbarn.<internal-url>.com/uncached_action_result/rhel7/66ed02cf7dc475eeeb567965de44399c5fa000d64b2d1e691833b0304bdcc8b7/543/
bazel-out/darwin-opt-exec-2B5CBBC6/bin/external/npm/typescript/bin/tsc.sh: line 224: /worker/build/3476511f3cd3bb47/root/bazel-out/darwin-opt-exec-2B5CBBC6/bin/external/npm/typescript/bin/tsc.sh.runfiles/nodejs_darwin_amd64/bin/nodejs/bin/node: cannot execute binary file
Target //package-1:library failed to build
Note in the second-to-last line of output that the remote machine can't execute nodejs_darwin_amd64/bin/nodejs/bin/node (it is trying to run the Mac binary on a Linux machine).
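To confirm which platform a given node binary was built for, you can inspect its first bytes. Linux executables are ELF files, while macOS executables are Mach-O, which the Linux kernel refuses to load. bash then reports "cannot execute binary file" and exits with 126, as in the log above. The file(1) command gives the same answer. The sketch below is a hypothetical standalone helper, not part of rules_nodejs, and it only recognizes a few common magic numbers.

```python
import sys

# First four bytes of common executable formats, as stored on disk.
_MAGIC = {
    b"\x7fELF": "ELF (Linux)",
    b"\xcf\xfa\xed\xfe": "Mach-O 64-bit (macOS)",  # MH_MAGIC_64, little-endian
    b"\xce\xfa\xed\xfe": "Mach-O 32-bit (macOS)",  # MH_MAGIC, little-endian
    # FAT_MAGIC is stored big-endian; Java class files share this magic.
    b"\xca\xfe\xba\xbe": "Mach-O universal binary (macOS) or Java class file",
}


def classify_header(header: bytes) -> str:
    """Return a readable format name for the leading bytes of a file."""
    return _MAGIC.get(header[:4], "unknown")


def classify_file(path: str) -> str:
    """Read the first four bytes of `path` and classify them."""
    with open(path, "rb") as f:
        return classify_header(f.read(4))


if __name__ == "__main__":
    # Usage: python3 check_magic.py <file> [<file> ...]
    for p in sys.argv[1:]:
        print(f"{p}: {classify_file(p)}")
```

Run against the node binary under tsc.sh.runfiles/nodejs_darwin_amd64/bin/nodejs/bin/node, it should report a Mach-O format, which a Linux Buildbarn worker cannot execute. The script name check_magic.py is arbitrary.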
🔬 Minimal Reproduction
Unfortunately I don't have a publicly facing remote build cluster to share a reproduction on, but the project simply runs a ts_project rule with --remote_executor. The ts_project rule works correctly when building locally.
I think the minimal reproduction is running a ts_project rule from a host machine whose platform differs from the remote executor's platform.
🔥 Exception or Error
bazel-out/darwin-opt-exec-2B5CBBC6/bin/external/npm/typescript/bin/tsc.sh: line 224: /worker/build/3476511f3cd3bb47/root/bazel-out/darwin-opt-exec-2B5CBBC6/bin/external/npm/typescript/bin/tsc.sh.runfiles/nodejs_darwin_amd64/bin/nodejs/bin/node: cannot execute binary file
^-- Bazel is attempting to run a macOS binary on Linux
🌍 Your Environment
Operating System:
macOS 10.14.6 (Mojave)
Output of bazel version:
Build label: 5.1.1
Build target: bazel-out/darwin-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Fri Apr 8 15:57:36 2022 (1649433456)
Build timestamp: 1649433456
Build timestamp as int: 1649433456
Rules_nodejs version:
5.4.2
Anything else relevant?
It looks like there are a few related issues. I'm not sure whether the first one has actually been addressed, even though it was closed.
- https://github.com/bazelbuild/rules_nodejs/issues/1305
- https://github.com/bazelbuild/rules_nodejs/issues/1038
Hey guys, quick question: what would cause this? I can try modifying rules_nodejs within my corporate network (where I can test builds on Buildbarn) and then contribute any fixes back, but I'm still learning Bazel and not quite sure where to start. Any pointers on where to look would be helpful.
Thanks, Greg
Hi there! I think I'm hitting this as well. In my case it's with pkg_web and RBE: executing assembler.js inside pkg_web fails with nodejs_linux_amd64/bin/nodejs/bin/node: cannot execute binary file.
This seems odd to me because, looking at the pkg_web code, the execution requirements for the assembler.js nodejs_binary contain no-remote and no-remote-exec.
FWIW, @gregmagolan used this toolchain (from a Mac host) with EngFlow RBE yesterday and got the correct nodejs interpreter on the exec platform, so I don't think this is totally broken. We might need a repro.
Hi @alexeagle @gregmagolan, I've created a repro. The failing target is on the pkg_web branch of this repo: https://github.com/danigar/bazel-swc-lab/tree/pkg_web
Hope this helps =)
Hi! I've bumped into the same stuff with RBE. Are there any known workarounds?
I recently ran into a similar issue at work; we're also using EngFlow for RBE.
I was running a rule that uses the nodejs toolchain. I wanted to run a build on a local Mac first using Node.js and then hand the result off to RBE to run tests against it (the tests also use Node.js).
My build always resolved nodejs_linux_amd64/bin/nodejs/bin/node, both locally and remotely, and failed the same way in reverse: the Linux binary couldn't be executed on my Mac.
I traced the issue to our EngFlow cross-platform configuration in .bazelrc. It turns out that Bazel doesn't fully support this kind of cross-platform handoff, i.e., running some rules on one platform and others on a different platform. We handle the macOS <> Linux interaction by lying about the host platform in our configuration: whenever the RBE config is used, the host platform is set to Linux.
I think this is the common approach for EngFlow. For example, the configs set --host_platform=//some-linux:platform in .bazelrc, pointing at a platform target with Linux constraints (e.g., in bazel-swc-lab). If I understand this correctly, that config makes all toolchains, including the one from rules_nodejs, resolve for a Linux platform.
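Both symptoms in this thread (a macOS node on a Linux worker, and a Linux node on a local Mac) are consistent with how Bazel picks an execution platform before choosing a toolchain. The first log reports "Execution platform: @local_config_platform//:host", i.e. the Mac itself, so the darwin toolchain was selected even though the action then ran remotely. Below is a deliberately simplified Python model of that resolution. It assumes that explicitly registered execution platforms (for example via --extra_execution_platforms) are tried before the host platform, and that a toolchain matches when its exec_compatible_with constraints are a subset of the platform's constraints. Real Bazel also checks target constraints and multiple toolchain types; the Platform and Toolchain classes and the constraint sets are illustrative assumptions, not the actual rules_nodejs definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    name: str
    constraints: frozenset


@dataclass(frozen=True)
class Toolchain:
    repo: str
    exec_compatible_with: frozenset


MACOS_X86 = frozenset({"@platforms//os:macos", "@platforms//cpu:x86_64"})
LINUX_X86 = frozenset({"@platforms//os:linux", "@platforms//cpu:x86_64"})

# Illustrative node toolchains, in registration order.
NODE_TOOLCHAINS = [
    Toolchain("nodejs_darwin_amd64", MACOS_X86),
    Toolchain("nodejs_linux_amd64", LINUX_X86),
]

MAC_HOST = Platform("@local_config_platform//:host", MACOS_X86)
LINUX = Platform("//some-linux:platform", LINUX_X86)


def execution_platforms(extra, host):
    """Candidate execution platforms: registered ones first, host last."""
    return list(extra) + [host]


def resolve(toolchains, platforms):
    """Return (exec platform name, toolchain repo) for the first platform
    that has a compatible toolchain, trying toolchains in order."""
    for plat in platforms:
        for tc in toolchains:
            if tc.exec_compatible_with <= plat.constraints:
                return plat.name, tc.repo
    raise LookupError("no compatible toolchain for any execution platform")


if __name__ == "__main__":
    # Nothing registered, host is the Mac: the darwin node is chosen.
    print(resolve(NODE_TOOLCHAINS, execution_platforms([], MAC_HOST)))
    # --host_platform pointing at a Linux platform: the linux node is chosen.
    print(resolve(NODE_TOOLCHAINS, execution_platforms([], LINUX)))
    # A Linux platform registered ahead of the Mac host: also linux.
    print(resolve(NODE_TOOLCHAINS, execution_platforms([LINUX], MAC_HOST)))
```

Under this model, once the execution platform is Linux, every action gets nodejs_linux_amd64, including actions tagged no-remote or no-remote-exec that end up running on the Mac. That would match the pkg_web and local-build failures reported above: execution requirements decide where an action runs, not which toolchain it was resolved against.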
I'm not sure what the ideal solution is. In many cases we can't drop the platform overrides, because we need to support RBE and we need to be able to use EngFlow for both local and remote builds.
One terrible workaround we're considering for our custom toolchains is lying in the toolchain definition and declaring the macOS binaries compatible with Linux as well 🤦🏾‍♀️. That way, we can expose all the binaries and make an entry point for the toolchain that detects which platform it's running on and uses the correct binary. When we run on macOS with the host platform set to Linux, the entry point can detect that and run the right binary.
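The platform-detecting entry point described above could look roughly like this minimal Python sketch. It assumes that both node binaries are present in runfiles under the <repo>/bin/nodejs/bin/node layout seen in the logs, and that only nodejs_darwin_amd64 and nodejs_linux_amd64 are needed. The mapping table, node_repo_for, node_path, and the command-line convention are all hypothetical.

```python
import os
import platform
import sys

# Hypothetical mapping from (platform.system(), platform.machine()) to the
# repository whose node binary runs on that machine.
_REPOS = {
    ("Darwin", "x86_64"): "nodejs_darwin_amd64",
    ("Linux", "x86_64"): "nodejs_linux_amd64",
}


def node_repo_for(system: str, machine: str) -> str:
    """Pick the node repository for the given OS/CPU pair."""
    try:
        return _REPOS[(system, machine)]
    except KeyError:
        raise RuntimeError(f"no bundled node binary for {system}/{machine}") from None


def node_path(runfiles_root: str, system: str = "", machine: str = "") -> str:
    """Build the node path, defaulting to the machine we are running on."""
    repo = node_repo_for(system or platform.system(), machine or platform.machine())
    return os.path.join(runfiles_root, repo, "bin", "nodejs", "bin", "node")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: wrapper.py <runfiles_root> [node args...]
    node = node_path(sys.argv[1])
    # Replace this process with node, forwarding the remaining arguments.
    os.execv(node, [node] + sys.argv[2:])
```

The check happens at run time rather than during toolchain resolution, which is exactly why it survives a lied-about host platform. The trade-off is that the wrapper itself needs an interpreter available on every machine; a small shell script using uname would avoid the Python dependency.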