bazel icon indicating copy to clipboard operation
bazel copied to clipboard

Stabilize unresolved symbolic links (--experimental_allow_unresolved_symlinks)

Open jmillikin opened this issue 5 years ago • 33 comments

Description of the problem / feature request:

https://github.com/bazelbuild/bazel/issues/9075 added support for unresolved symlinks, via the actions.declare_symlink() and actions.symlink() functions. That functionality is currently gated by --experimental_allow_unresolved_symlinks.

I've discovered that unresolved symlinks are critical for supporting hermetic Node.JS builds. Specifically, Node's use of symlinks for module resolution semantics mean that we need to be able to create symlink artifacts that reference dependencies and can be used as runfiles, but exist outside the runfiles root.

What are the current blockers to stabilizing unresolved symlinks? Is there a list of known remaining work to get them enabled by default?

cc @lberki @laszlocsomor

jmillikin avatar Nov 24 '19 02:11 jmillikin

@laurentlb , @laszlocsomor , what do you think?

The main blockers I see is that RBE doesn't support this yet (the protocol itself does, but we haven't wired it up yet within Bazel) and that it probably fails on Windows in all sorts of interesting ways.

Neither of these are insurmountable problems and if I squinted the right way, I could even argue that they are not necessary for declaring the functionality stable, but I just can't sign up for doing them at the moment.

lberki avatar Nov 25 '19 07:11 lberki

@lberki : Why do you suppose it fails on Windows? Is there a test case I could try on Windows?

laszlocsomor avatar Nov 25 '19 08:11 laszlocsomor

Try //third_party/bazel/src/test/shell/bazel:bazel_unresolved_symlink_test.

I admit I haven't tried it on Windows. There is a chance that it works since all that's needed is ActionMetadataHandler#fileArtifactValueFromArtifact() to work which in turn only needs Path#statIfFound(Symlinks.NOFOLLOW) and Path#readSymbolicLink() .

lberki avatar Nov 25 '19 08:11 lberki

...oh, and Path#createSymbolicLink() for SymlinkAction#execute().

lberki avatar Nov 25 '19 08:11 lberki

Thanks! The test fails here: https://github.com/bazelbuild/bazel/blob/b1e232d6f1d82494639660e8a2772ac7da7c5da8/src/test/shell/bazel/bazel_unresolved_symlink_test.sh#L243

...because the build succeeds though it shouldn't. (Note: I edited the test to use c:/badsymlink instead of /badsymlink.)

laszlocsomor avatar Nov 25 '19 08:11 laszlocsomor

~The culprit is~ I think the culprit is that WindowsFileSystem does not create file symlinks: https://github.com/bazelbuild/bazel/blob/b1e232d6f1d82494639660e8a2772ac7da7c5da8/src/main/java/com/google/devtools/build/lib/windows/WindowsFileSystem.java#L85-L89

laszlocsomor avatar Nov 25 '19 09:11 laszlocsomor

@lberki : test_file_instead_of_symlink breaks on Linux if I add --spawn_strategy=standalone to the bazel calls.

laszlocsomor avatar Nov 27 '19 13:11 laszlocsomor

This seems to be failing on macOS if paired with remote caching;

java.lang.IllegalStateException: Encountered symlink input '<some nice path here>', but all symlinks should have been resolved by SkyFrame. This is a bug.
	at com.google.devtools.build.lib.remote.merkletree.DirectoryTreeBuilder.fromActionInputs(DirectoryTreeBuilder.java:106)
	at com.google.devtools.build.lib.remote.merkletree.DirectoryTreeBuilder.fromActionInputs(DirectoryTreeBuilder.java:47)
	at com.google.devtools.build.lib.remote.merkletree.MerkleTree.build(MerkleTree.java:123)
	at com.google.devtools.build.lib.remote.RemoteSpawnCache.lookup(RemoteSpawnCache.java:121)
	at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:119)
	at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:94)
	at com.google.devtools.build.lib.actions.SpawnStrategy.beginExecution(SpawnStrategy.java:39)
	at com.google.devtools.build.lib.exec.ProxySpawnActionContext.beginExecution(ProxySpawnActionContext.java:60)
	at com.google.devtools.build.lib.rules.cpp.CppCompileAction.beginExecution(CppCompileAction.java:1381)
	at com.google.devtools.build.lib.actions.Action.execute(Action.java:124)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$4.execute(SkyframeActionExecutor.java:966)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.continueAction(SkyframeActionExecutor.java:1114)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.run(SkyframeActionExecutor.java:1085)
	at com.google.devtools.build.lib.skyframe.ActionExecutionState.runStateMachine(ActionExecutionState.java:136)
	at com.google.devtools.build.lib.skyframe.ActionExecutionState.getResultOrDependOnFuture(ActionExecutionState.java:80)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeAction(SkyframeActionExecutor.java:612)
	at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.checkCacheAndExecuteIfNeeded(ActionExecutionFunction.java:907)
	at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.compute(ActionExecutionFunction.java:297)
	at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:438)

AlessandroPatti avatar Mar 16 '20 08:03 AlessandroPatti

AFAIK this has been experimental since Bazel 1.0.

Is progress likely?

pauldraper avatar Nov 01 '21 04:11 pauldraper

Any updates?

bjulian5 avatar Apr 05 '22 22:04 bjulian5

I hit this recently as well. I was using the feature to break circular deps between npm packages (which sadly exist in some 3rd party packages). Confirmed that it does not work with remote execution.

I also hit a bug in bazel itself related to this where the --experimental_allow_unresolved_symlinks configuration setting is not propagated to the host and exec configs so when building a target under either of those configs that calls actions.create_symlink, bazel errors out with https://github.com/bazelbuild/bazel/blob/a32a0fd0d6bf75c2c8c6af6281875e90908b82f6/src/main/java/com/google/devtools/build/lib/analysis/starlark/StarlarkActionFactory.java#L195.

gregmagolan avatar Apr 12 '22 18:04 gregmagolan

Looking at the history of this issue, the main blockers seem to be Windows and RBE support. Once those are fixed (or we decide that they don't matter), we can move this feature out of the experimental domain.

@coeuvre : What do you think about updating our RBE support to support unresolved symbolic links? I realize you are busy so I was thinking about soliciting pull requests from the community.

@meteorcloudy : given that symlinks are not a very common thing on Windows, can we get away without supporting unresolved symlinks? I don't even know if NTFS supports dangling symlinks; if it doesn't, then the decision is easy because then unresolved symlinks don't make all that much sense, do they?

lberki avatar Apr 19 '22 11:04 lberki

@lberki My guess is that it will work correctly [--windows_enable_symlinks](with https://bazel.build/reference/command-line-reference#flag--windows_enable_symlinks), NTFS does support dangling symlinks.

Is there a simple test that I can run to verify?

meteorcloudy avatar Apr 19 '22 14:04 meteorcloudy

test_smoke in bazel_symlink_test?

lberki avatar Apr 19 '22 15:04 lberki

~~This test passes on Windows~~ https://buildkite.com/bazel/bazel-bazel/builds/19324#75eafab1-c5ae-4830-add5-fbe84692c31e

//src/test/shell/bazel:bazel_symlink_test                                PASSED in 29.4s

Oh, it passes because it's skipped on windows..

meteorcloudy avatar Apr 19 '22 15:04 meteorcloudy

@lberki I am happy to review PRs :)

cc @tjgq

coeuvre avatar Apr 20 '22 09:04 coeuvre

@lberki test_smoke passed on Windows with --windows_enable_symlinks after fixing the path style for Windows (/nonexistent => C:/nonexistent)

** test_smoke ******************************************************************
$TEST_TMPDIR defined: output root default is 'c:\b\erp625is\execroot\io_bazel\_tmp\f8978fe5636e0739a65f37d28c69faa9' and max_idle_secs default is '15'.
Loading: 
Loading: 0 packages loaded
Analyzing: target //a:a (1 packages loaded, 0 targets configured)
INFO: Analyzed target //a:a (4 packages loaded, 6 targets configured).
INFO: Found 1 target...
[0 / 2] [Prepa] BazelWorkspaceStatusAction stable-status.txt
Target //a:a up-to-date:
  bazel-bin/a/a
INFO: Elapsed time: 0.597s, Critical Path: 0.01s
INFO: 2 processes: 2 internal.
INFO: Build completed successfully, 2 total actions
INFO: Build completed successfully, 2 total actions
total 0
lrwxrwxrwx 1 b None 14 Apr 20 09:21 a -> /c/nonexistent
INFO[bazel_symlink_test 2022-04-20 09:21:04 (+0000)] Cleaning up workspace

[0;32mPASSED[0m: test_smoke

I believe Windows isn't really a blocker, dangling symlinks and dangling junctions both work.

meteorcloudy avatar Apr 20 '22 09:04 meteorcloudy

@meteorcloudy then that leaves RBE support, I guess.

(plus some glue logic to hard-wire the value of this flag to false within Google, but that, I can throw in since contributions don't work for that one)

lberki avatar Apr 20 '22 11:04 lberki

Is Alessandro's comment about remote caching on MacOS still relevant?

https://github.com/bazelbuild/bazel/issues/10298#issuecomment-599416411

pauldraper avatar Apr 20 '22 18:04 pauldraper

While working on resolving the various remaining issues around unresolved symlinks in Bazel, I came across a subtle decision we will have to make: When an action is executed with a local symlink-based sandboxed strategy (e.g. linux-sandbox or darwin-sandbox), there are two fundamentally different ways of staging an unresolved, relative symlink:

  1. Stage it as a symlink pointing to the real relative symlink in the unsandboxed exec root. This mirrors the behavior for any other kind of artifact.
  2. Stage the symlink directly as a relative symlink pointing to some other file in the sandboxed exec root. This is the current behavior.

This point is certainly controversial because rules_js would like the behavior to be 1 (@gregmagolan), whereas I could see rules_pkg users wanting the behavior to be 2 (@aiuto). Am I correct?

Also CC @alexjski @larsrc-google who I've noticed are interested in this based on comments on other PRs.

fmeum avatar Jul 22 '22 17:07 fmeum

@fmeum I tested your commit that stages it as a symlink pointing to the real symlink (option 1 above), https://github.com/fmeum/bazel/commit/8e9fa6a45c0735f5d433e1118455de61a64bff12. This fixes the sandbox issues with symlinks in rules_js. In my opinion this is the principled approach as makes all artifacts in the sandbox consistent and retains all information. I.e., you can follow the symlink out of the sandbox to the real symlink to find out where that one points to.

gregmagolan avatar Jul 22 '22 19:07 gregmagolan

@fmeum I tested your commit that stages it as a symlink pointing to the real symlink (option 1 above), fmeum@8e9fa6a. This fixes the sandbox issues with symlinks in rules_js. In my opinion this is the principled approach as makes all artifacts in the sandbox consistent and retains all information. I.e., you can follow the symlink out of the sandbox to the real symlink to find out where that one points to.

Wouldn't that mean that such a symlink would allow sandboxed action access undeclared inputs? Imagine the unresolved symlink points to a path of an output artifact we do not depend on. If we use strategy 2, the behavior would always be the same, as in we would get an unresolved symlink. In case of 1, depending on action execution order, could we sometimes get an actual file?

alexjski avatar Jul 22 '22 19:07 alexjski

@alexjski Can you give an example? You'd only get a symlink to a symlink in the sandbox if you have the symlink outside the sandbox as an input.

In either approach you can call realpath and find out the actual file the symlink is pointing to if that is desired. If you want to retain the information of where the symlink outside of the sandbox is pointing to, as rules_js does, I believe you can only do that with approach 1.

gregmagolan avatar Jul 22 '22 19:07 gregmagolan

@alexjski Can you give an example? You'd only get a symlink to a symlink in the sandbox if you have the symlink outside the sandbox as an input.

In either approach you can call realpath and find out the actual file the symlink is pointing to if that is desired. If you want to retain the information of where the symlink outside of the sandbox is pointing to, as rules_js does, I believe you can only do that with approach 1.

I just tested that and it seems to me like we have strategy 1, at least to some extent. That makes me think that I am missing some crucial context here and the best bet would be to see what @larsrc-google thinks about that. Anyway, let me share the example at least to show the problem and the kind of scenario I had in mind:

$ cat def.bzl 
def _r(ctx):
  symlink = ctx.actions.declare_symlink(ctx.label.name + "_s")
  ctx.actions.symlink(output=symlink, target_path=ctx.file.file.path)

  output = ctx.actions.declare_file(ctx.label.name)
  ctx.actions.run_shell(
    command = "{ cat %s || true; } >  %s" % (symlink.path, output.path),
    inputs = [symlink],
    outputs = [output],
  )
  return DefaultInfo(files=depset([output]))

r = rule(implementation=_r, attrs = {'file': attr.label(allow_single_file=True)})
$ cat BUILD 
load("def.bzl", "r")

genrule(name = "a", outs = ["file"], cmd = "echo hello >$@")
r(name="b", file="file")

Now:

$ bazel clean && bazel build :b --experimental_allow_unresolved_symlinks --spawn_strategy=sandboxed && echo "Result: '$(<bazel-bin/b)'"
INFO: Starting clean (this may take a while). Consider using --async if the clean takes more than several minutes.
INFO: Analyzed target //:b (5 packages loaded, 9 targets configured).
INFO: Found 1 target...
INFO: From Action b:
cat: bazel-out/k8-fastbuild/bin/b_s: No such file or directory
Target //:b up-to-date:
  bazel-bin/b
INFO: Elapsed time: 0.215s, Critical Path: 0.02s
INFO: 3 processes: 2 internal, 1 linux-sandbox.
INFO: Build completed successfully, 3 total actions
Result: ''

Let's try the same after build :a:

$ bazel clean && bazel build :a --spawn_strategy=sandboxed && bazel build :b --experimental_allow_unresolved_symlinks --spawn_strategy=sandboxed && echo "Result: '$(<bazel-bin/b)'"
INFO: Starting clean (this may take a while). Consider using --async if the clean takes more than several minutes.
INFO: Analyzed target //:a (5 packages loaded, 7 targets configured).
INFO: Found 1 target...
Target //:a up-to-date:
  bazel-bin/file
INFO: Elapsed time: 0.208s, Critical Path: 0.01s
INFO: 2 processes: 1 internal, 1 linux-sandbox.
INFO: Build completed successfully, 2 total actions
INFO: Build option --experimental_allow_unresolved_symlinks has changed, discarding analysis cache.
INFO: Analyzed target //:b (0 packages loaded, 9 targets configured).
INFO: Found 1 target...
Target //:b up-to-date:
  bazel-bin/b
INFO: Elapsed time: 0.128s, Critical Path: 0.01s
INFO: 3 processes: 2 internal, 1 linux-sandbox.
INFO: Build completed successfully, 3 total actions
Result: 'hello'

This is only my personal opinion, but such interaction seems like an undesirable property in terms of determinism of build results (imagine bazel build :a :b where race between a and b actions decides the result). This is result of strategy 1. If we planted the symlinks relative to sandbox execroot, the symlink would always point to a non-existent file.

alexjski avatar Jul 22 '22 21:07 alexjski

I see what you mean @alexjski . That makes sense and symlinks that pointed relatively within the sandbox would work for rules_js.

When I was testing @fmeum's WIP branch, the issue I ran into was in a runfiles tree in the sandbox which seems like a special case not handled properly yet. There was a declare_symlink input in the runfiles which I believes suffers from the same problem you just described where the runfiles tree is a symlink to a symlink outside of the runfiles tree. The sandbox version of that symlink was pointing outside of the sandbox and outside of the runfiles tree to the realpath.

Outside of the sandbox in the runfiles tree:

bazel-out/darwin-opt-exec-2B5CBBC6/bin/examples/npm_deps/bin1.sh.runfiles/aspect_rules_js/examples/npm_deps/node_modules/@gregmagolan/test-b ->
bazel-out/darwin-opt-exec-2B5CBBC6/bin/examples/npm_deps/node_modules/@gregmagolan/test-b

In the sandbox,

bazel-out/darwin-opt-exec-2B5CBBC6/bin/examples/npm_deps/bin1.sh.runfiles/aspect_rules_js/examples/npm_deps/node_modules/@gregmagolan/test-b ->
../../../../node_modules/.aspect_rules_js/@[email protected]/node_modules/@gregmagolan/test-b

If we could maintain relative symlinks within the runfiles trees and within the sandbox then that would solve the problem I hit as well.

gregmagolan avatar Jul 22 '22 21:07 gregmagolan

I'm glad to learn that the current approach can work for rules_js if we fix the runfiles tree issue - I was also worried about the implications the alternative would have had on hermiticity.

I will look into fixing the runfiles tree issue. @alexjski Would you be available as a reviewer for a fix for that?

fmeum avatar Jul 22 '22 21:07 fmeum

I put up a PR to make using declare_symlink conditional on whether or not --experimental_allow_unresolved_symlinks is set, https://github.com/aspect-build/rules_js/pull/283. That should be all that is needed in rules_js to use unresolved symlinks so it will be ready for use there once all of your work lands @fmeum .

gregmagolan avatar Jul 26 '22 00:07 gregmagolan

We've had some discussions about symlinks internally, and there are many more corner cases than I realized. I've started putting together the information we have, so we can avoid making more changes we later have to undo.

larsrc-google avatar Jul 28 '22 09:07 larsrc-google

We've had some discussions about symlinks internally, and there are many more corner cases than I realized. I've started putting together the information we have, so we can avoid making more changes we later have to undo.

Can you share this information publicly, possibly already while it is being compiled? I have already identified a number of these corner cases and would like to have a central location to collect and discuss them.

fmeum avatar Jul 28 '22 11:07 fmeum

Changing to P1 since this is blocking the cut of Bazel 6.0 release. FYI @lberki @larsrc-google

meteorcloudy avatar Sep 14 '22 16:09 meteorcloudy