bazel `--incompatible_sandbox_hermetic_tmp` breaks outputs materialized as a symlinks to source artifacts

Originally reported by @tyler-french in https://github.com/bazelbuild/bazel/issues/20886 (but seems to be a separate issue).

Minimal repro (Linux):

.bazelversion

7.0.2

.bazelrc

build --noenable_bzlmod

repo.bzl

def _impl(rctx):
  rctx.file("file.txt", "hello")
  rctx.file("BUILD", """exports_files(["file.txt"])""")

repo = repository_rule(_impl)

WORSKPACE

load(":repo.bzl", "repo")
repo(name= "repo")

BUILD

genrule(
  name = "gen",
  outs = ["file.txt"],
  srcs = ["@repo//:file.txt"],
  cmd = "ln -s $< $@",
)

Result:

ERROR: /usr/local/google/home/tjgq/proj/dangling-symlink-repro/BUILD:1:8: declared output 'file.txt' is a dangling symbolic link
ERROR: /usr/local/google/home/tjgq/proj/dangling-symlink-repro/BUILD:1:8: Executing genrule //:gen failed: not all outputs were created or valid

I believe this is because we copy (actually, move) the symlink as-is out of the sandbox, so it ends up pointing to a location like /tmp/bazel-source-roots/1/file.txt, which only makes sense in the context of the sandbox.

Things to keep in mind when fixing this:

Make sure that symlinks to the main repo (as opposed to an external repo) work as well
Make sure that an indirection through multiple symlinks works as well

Feb 06 '24 09:02 tjgq

@bazel-io fork 7.1.0

Feb 06 '24 10:02 fmeum

Make sure that an indirection through multiple symlinks works as well

A tricky edge case: The output may be declared as an unresolved symlink (ctx.actions.declare_symlink), so we would not want to follow all symlinks.

Maybe we need to follow symlinks exactly until after the last one that points under one of the sandbox roots under /tmp?

Feb 06 '24 10:02 fmeum

If it's an unresolved symlink, I'm not even sure we should do any path transformation at all. That has the potential to break cases where the symlink target path is textually fixed.

(If one intentionally wants a declare_symlink to point to some other file in the build, it should be a declare_file/declare_directory instead; we've gone to great lengths to optimize for that use case, so there's no reason to make things unnecessarily non-hermetic.)

Feb 06 '24 10:02 tjgq

Is the attitude that this is WAI viable?

On the practical side, I'd rather have less code than more code and following symlinks and magically replacing /tmp/bazel-* paths on any intermediate paths during symlink resolution sounds quite error-prone.

On the theoretical side, the whole point of sandboxing is that what the action does and what it results in shouldn't be a function of the location of the output base, the location of the source trees or whether the action is sandboxed. And arguably the result of ln -s hard-codes said locations.

I do realize that the stance of "thou shalt always copy your inputs to outputs" may harm performance, though.

(For the record, SandboxHelpers.moveOutputs() does exactly what @tjgq suspects: it just calls FileSystemUtils.moveFile(), which in turn calls rename(). It also warns against the harms of copying, though, which is a counter-argument against what I said above.

Feb 12 '24 13:02 lberki

I started out firmly in camp "WAI", but my opinion has changed to "we should perform this magic replacement".

If it were just for the ln -s case, I would agree that we could reasonably expect users to just not do this.

But the original repro by @tyler-french in https://github.com/bazelbuild/bazel/issues/20886#issuecomment-1919999446 makes a more convincing case for this being a "Bazel bug": When using the sandbox, Bazel hands the user inputs in a form (symlinks pointing under /tmp/bazel-*) that it would not accept as outputs (e.g. produced via the innocuous cp -r applied to the inputs). "Thou shalt always copy your inputs to outputs while resolving all symlinks" is a much harder sell :-)

Feb 12 '24 18:02 fmeum

Meh. I can't put up any good argument against yours, but then we'll have to gear up for implementing symlink resolution yet another time :(

Feb 13 '24 10:02 lberki

just to confirm, are we still planning to make progress on this for 7.1.0? It's looking like we'll cut the first RC without fixing this anyway, but if this is being worked on, we can still get the fix into later RCs.

Feb 21 '24 18:02 Wyverald

Were we? I didn't realize that this was slated for 7.1.0 and there is no indication on this bug to that effect. (cc @oquenchil @meteorcloudy )

Feb 22 '24 07:02 lberki

Were we? I didn't realize that this was slated for 7.1.0 and there is no indication on this bug to that effect. (cc @oquenchil @meteorcloudy )

I forked it for 7.1.0 in https://github.com/bazelbuild/bazel/issues/21215#issuecomment-1929229757, which was after @oquenchil marked it as P1 and "potential release blocker". The bazel-io bot removes this label after it has created the fork, which may have caused confusion. Maybe we could introduce a dedicated label for "marked for cherry-pick" issues that remains after they have been forked?

Happy to help fix this, it looks like the last real blocker to more widespread hermetic tmp adoption.

Feb 22 '24 08:02 fmeum

Having to fix every output symlink seems costly. It will be costly for every build since we have to scan outputs of every action to see if they are symlinks and it will be costly in terms of maintenance.

An alternative is the following order of bind mounting which I think should fix this issue (even for builds with source root and exec root under /tmp):

$SANDBOX/{unique-integer-per-action}/root/bazel-working-dir this will contain the symlink tree and will be the cwd of the action
mkdir $SANDBOX/{unique-integer-per-action}/root this will be where we mount every top level directory of the real system root except /tmp
real system root /{anything-other-than-tmp} gets mounted at $SANDBOX/{unique-integer-per-action}/root/{anything-other-than-tmp}
real system root /tmp gets mounted at $SANDBOX/{unique-integer-per-action}/_tmp
every real system root /tmp/{dir} gets mounted at $SANDBOX/{unique-integer-per-action}/_tmp/{dir} (This is to make sure this solution still works for source roots and execroots under /tmp/)
$SANDBOX/{unique-integer-per-action}/root gets mounted at /
cwd of the action is /bazel-working-dir
mkdir /tmp
$SANDBOX/{unique-integer-per-action}/_tmp/ gets mounted at /tmp

That's not ideal either. Still costly and hard to maintain.

I think the whole implementation of hermetic tmp should be replaced with (handwave) a single line of code:

mount -t overlay  overlay  -olowerdir=/tmp/,upperdir=/tmp/{unique-per-action},workdir=/tmp/_mount_point  /tmp

I tried locally on the terminal and it works as expected. The current process will see any files it writes to /tmp but after the action is finished and /tmp is unmounted, the files won't be there. Only that process can see the files that it itself writes to /tmp.

Files from the system /tmp will still be visible by all actions under /tmp which is a change from the current behavior. From #19915, /tmp being empty doesn't seem like a requirement. From the description of the flag, the new behavior would be equivalent to adding --sandbox_add_mount_pair=/tmp.

Please let me know what you all think and I can send a change adding this.

Feb 24 '24 10:02 oquenchil

I tried locally on the terminal and it works as expected. The current process will see any files it writes to /tmp but after the action is finished and /tmp is unmounted, the files won't be there. Only that process can see the files that it itself writes to /tmp. Files from the system /tmp will still be visible by all actions under /tmp which is a change from the current behavior.

This actually seems much better than the current semantics as it is less of a breaking change (e.g. Docker sockets under /tmp can still be used without having sockets created by the action pollute the host /tmp). If it's also simpler and resolves the current issue, I'm 100% for this solution. Great find!

Feb 24 '24 10:02 fmeum

I like this idea too! But to be clear, the new behavior of --incompatible_sandbox_hermetic_tmp wouldn't be exactly the same as --sandbox_add_mount_pair=/tmp, right? AIUI, the latter not only makes the system /tmp visible to sandboxes, but also makes the files created by a sandbox visible to other sandboxes; the last part is what we're actually trying to protect against.

What about older Linux kernels where mount -t overlay is still a privileged operation? Will we auto-detect or require users to manually override the flag?

Feb 24 '24 11:02 tjgq

What about older Linux kernels where mount -t overlay is still a privileged operation? Will we auto-detect or require users to manually override the flag?

Could we test this as part of the one-time "sandboxing available" check? Requiring a manual override for a default wouldn't be a great user experience. But I also don't know what the market share of affected kernel versions is.

With the overlay approach, there could theoretically still be issues with concurrent Java compilations, but only if there is a concurrent one on the host with a PID of 12, which is presumably too low to occur in practice (unless the concurrent action is also sandboxed in a similar way). I think that's acceptable.

Feb 24 '24 13:02 fmeum

https://docs.kernel.org/filesystems/overlayfs.html#changes-to-underlying-filesystems sounds as if this scheme may not be safe when there are concurrent accesses to /tmp.

Feb 24 '24 22:02 fmeum

https://docs.kernel.org/filesystems/overlayfs.html#changes-to-underlying-filesystems sounds as if this scheme may not be safe when there are concurrent accesses to /tmp.

@oquenchil and I discussed this offline and found the thread at https://lore.kernel.org/all/CA+ZH+jFBAaRi6VPmf3PBdDZQQMOaT6WUByATqaw1QL5M9+-dxg@mail.gmail.com/, which makes it sound as if overlayfs would be fine to use in practice under the weaker assumption that there are no concurrent changes to the subtree of /tmp containing files that the sandboxed process is supposed to read (e.g. the output base under /tmp). Since any "clever" (the bad kind) symlink algorithm or bind mount scheme would also have the potential for breakages, I would actually prefer the overlayfs approach even without a formal correctness guarantee.

If anyone here knows anyone who happens to know more about kernel FS implementations, that would of course be very helpful. :-)

Feb 26 '24 20:02 fmeum

Perfect, thank you Fabian. I'd go with the overlayfs solution when we detect sourceroot and execroot are under /tmp but would take a simpler code path when we don't detect that. In these cases a simple bind mount over /tmp (maybe even mounting tmpfs over /tmp by default) should work.

The behavior between the two would be different since for builds with source-execroot under /tmp, the system /tmpwould be visible but for the other case it would start as empty. What do you think? Even though we go on the assumption that overlayfs works as we expect until proven otherwise, it might make sense to reduce its usage if we can avoid it when we are not building under /tmp.

@lberki The commit 8e32f44a279c5c4e9c24fb84a08276ffe4fe90c0 affected even builds without source-execroot under /tmp. Do you think it's fine to take a different code path in those cases?

Feb 27 '24 09:02 oquenchil

In these cases a simple bind mount over /tmp (maybe even mounting tmpfs over /tmp by default) should work.

A memory-backed tmpfs for /tmp is nice when it works, but I don't think it would be a good default as tests can use GBs of tmp storage, which may overwhelm the host. But a simple bind mount as originally introduced for --incompatible_sandbox_hermetic_tmp.

it might make sense to reduce its usage if we can avoid it when we are not building under /tmp.

I see pros and cons here: If we are relatively certain that overlayfs is safe and it actually is in 99% of all cases but has issues in the remaining 1%, we would have a much easier time identifying those cases if all builds used the feature, not just those building under /tmp.

Feb 27 '24 09:02 fmeum

I don't want to take sides here (y'all have collectively way more brain cells engaged on this problem that I do), however, I would like to point out two things:

I don't think checking every output file is that problematic, because actions usually have very few output files and those are usually not symbolic links. Also, most of the time, doing what's essentially readlink -f with a twist one a file will be negligible compared to the effort of generating that file.
Are we sure we want to add a dependency on overlayfs? Sure, we can do feature detection, but between a very pedestrian "resolve symlinks" and depending on some fancy kernel feature, the former looks much less risky.

Feb 27 '24 09:02 lberki

Instead of resolving symlinks with a twist, could we perform the symlink resolution in the sandbox binary with the mounts still present? This would require passing the output file paths to the sandbox though.

Feb 27 '24 10:02 fmeum

I'd say that fixing symlinks being simple in most cases is not a strong argument. The current implementation of hermetic tmp also worked most of the time.

What if we don't have just one output and what if they are unpredictable because they are inside a tree artifact? We might fix this issue now and get it to work with extra code but I'd strongly argue in favor of simplifying all this to avoid having to fix bugs in this area as often as we are doing now. I think we should try the overlayfs solution first.

Feb 27 '24 10:02 oquenchil

I really don't like the complexity entailed by resolving symlinks; it's always more complicated than it seems.

What if, for example, an output is materialized as a chain of symlinks that jump back and forth between the source and output trees an arbitrary number of times? Should we preserve the full symlink chain, or just replace it with a symlink to the final target?

Similarly, what should we do with unresolved symlinks? Earlier in this thread, I claimed that we should copy the symlink out of the sandbox verbatim. I'm no longer sure that would work, because I've since learned that rules_js uses unresolved symlinks pointing to the source tree; I can't say I like that particular design decision, but we're likely stuck with supporting it. But, on the other hand, I'm not convinced that rewriting always works (even if it tolerates dangling symlinks).

If overlayfs can be made to work, I'd still prefer it (Lukács: we're already fully bought into Linuxisms thanks to filesystem namespaces; IMO adding overlayfs doesn't make things significantly worse). If we can't, I still think using the mount structure described https://github.com/bazelbuild/bazel/issues/21215#issuecomment-1962319349, as complex as it sounds, is better than resolving symlinks.

(Also, I fully agree with Fabian that, whichever solution we pick, we should use it unconditionally instead of special-casing "source root is under /tmp"; a seldom-used code path makes bugs more likely.)

Feb 27 '24 10:02 tjgq

Ok, so let's try overlayfs for both cases then. I will not be working the next 3 days. Can this wait till next week?

Feb 27 '24 14:02 oquenchil

What's the status of this?

We just cut 7.1.0rc2 today, which after a bit of baking should be good to release next Monday. But @fmeum notified me that this issue might be more than just a soft blocker.

@oquenchil are you still working on this? What is the timeframe you think it'll be ready by? If it's before EOW, I think we could fit another RC in, but otherwise I'd rather not delay 7.1.0 forever as Chrome OS has been asking for it.

Alternatively, Fabian mentioned that we could roll back some commits made in 7.0.0 ("@lberki made the flag flip work for builds with output base under /tmp, but that required some bind mount magic that is now haunting us"). What do y'all think about the feasibility of that?

Mar 07 '24 19:03 Wyverald

If this isn't an easy fix. I'd prefer we don't rush it and block the 7.1.0 release. We can make patch releases later for remaining fixes, and users do get affected can use --noincompatible_sandbox_hermetic_tmp to workaround?

Mar 08 '24 09:03 meteorcloudy

gentle ping -- we could still cherry-pick this into 7.1.1 (tentative release date next Wednesday) if it gets a fix soon, otherwise this will need to wait until 7.2.0.

Mar 15 '24 19:03 Wyverald

This can wait till 7.2.0.

Apr 01 '24 10:04 oquenchil

I took over from @oquenchil and have an overlayfs-based prototype that works pretty well - except when you have something mounted under /tmp. I don't yet know whether this can be worked around easily.

Apr 12 '24 17:04 fmeum

Hi all, is this still on track for 7.2? We're aiming to create the first RC on 5/13.

Apr 29 '24 15:04 keertk

Yes, I'm working on this and am positive that I'll have something ready in time.

May 01 '24 18:05 fmeum

bazel bazel copied to clipboard

`--incompatible_sandbox_hermetic_tmp` breaks outputs materialized as a symlinks to source artifacts

bazel
bazel copied to clipboard