remote-apis icon indicating copy to clipboard operation
remote-apis copied to clipboard

Lack of description on what the permissions on files in the input root are

Open EdSchouten opened this issue 6 years ago • 5 comments

(Note: This is a continuation of a remark I made in #40)

It seems to be the case that the protocol doesn't document what the permissions of files and directories in the input root are in relation to the credentials of the build action.

  • Are build actions permitted to overwrite input files? When using Sandboxfs, this would be easy to achieve. Without using Sandboxfs, it's also possible, but has the downside that optimizations built around caching input files and hardlinking them is out of the question.
  • Related to the previous question, is it permitted to hardlink input files to some other location? Recent versions of Linux have fs.protected_hardlinks enabled by default. This implies that if input files are read-only (due to the use of hardlinking caches), we cannot guarantee that the kernel will allow the creation of hardlinks to input files.
  • In the general sense: is there even any guarantee that hardlinks can be created between two distinct directories in the input root? For example, may a worker place the output directories on a separate file system (tmpfs)? If so, this means you can't hardlink input files into an output directory.
  • What are permissions on directories? Where may the build action create temporary files? In any directory in the input root, or only inside of a directory containing one or more outputs?

Buildbarn's workers don't use anything like FUSE (yet!), for the reason that I initially aimed at using Buildbarn on Kubernetes, where you can't simply make mounts inside of containers. After giving it enough tweaks, I eventually concluded that:

  • All directories in the input root need to be writable to appease the build rules out there. Build rules should be allowed to rename and remove any file in the input root, and to create files in any directory in the input root.
  • Input files may be read-only. They may be replaced by removing them first and creating a new one with the old name.
  • Disallowing input files to be hardlinked (fs.protected_hardlinks == 1) causes a very small number of build rules to break, but those may be easy to fix.

Do we want to document this in the .proto file somewhere?

EdSchouten avatar Dec 23 '18 12:12 EdSchouten

Friendly ping? :-)

EdSchouten avatar Jan 19 '19 21:01 EdSchouten

Does https://github.com/bazelbuild/remote-apis/issues/90 fully address this, or is there anything remaining?

ola-rozenfeld avatar Oct 22 '19 14:10 ola-rozenfeld

Does #90 fully address this, or is there anything remaining?

We still need to document the default permissions, i.e., when UnixMode is not specified as node property.

juergbi avatar Oct 22 '19 14:10 juergbi

I don't think we can document the default permissions just yet, for the simple reason that there isn't currently an agreed-upon default. The tradeoffs you outline above are a good example: I agree that directories need to be writeable, but I'm not sure what will/won't break if we make files readonly (today RBE marks all input files as writeable; no actions modify their inputs today, but I don't know if any are opening them readwrite and so would be broken by readonly - wouldn't put it past random tools/scripts tools to do so). At the same time, to document that files must be writeable precludes you from some reasonable optimizations, so I'm not going to push for it.

If we do document a default, my vote is for '777' for the simple reason that that's what we use today, and I don't have a viable way to validate whether changing it would break anyone. If you (and other implementations) are interested in standardizing on the same, then I certainly won't object, but otherwise I think it's probably a case where we should declare it to be "implementation-defined" and leave it to tests of cross-compatibility to flag if the divergence actually matters to anything.

For #90 there's an independent concern that defaults may be context-specific: the default permission for inputs should probably be different from the default permission of outputs?

EricBurnett avatar Nov 05 '19 20:11 EricBurnett

If we do document a default, my vote is for '777' for the simple reason that that's what we use today, and I don't have a viable way to validate whether changing it would break anyone. If you (and other implementations) are interested in standardizing on the same, then I certainly won't object, but otherwise I think it's probably a case where we should declare it to be "implementation-defined" and leave it to tests of cross-compatibility to flag if the divergence actually matters to anything.

If these are the only two options, we have to keep it implementation-defined for now. BuildBox (and as far as I know also Buildbarn) exposes cached files to action commands via hardlinks by default. To prevent corruption of the cached files, these hardlinks have to be read-only to the process/user executing the action commands.

To my knowledge, this is the only cross-platform approach that avoids having to copy each input file for every action. Some platforms provide more flexible options such as FUSE or reflinks but these options are not universally available.

For #90 there's an independent concern that defaults may be context-specific: the default permission for inputs should probably be different from the default permission of outputs?

Do you mean for the special case of output paths that already exist in the input tree? As I normally expect outputs to be created by the action command, in which case the permissions are defined by the action command (+ possibly umask) and not by the RE server/worker. Or am I misunderstanding your question?

juergbi avatar Nov 06 '19 07:11 juergbi