rules_pkg
rules_pkg copied to clipboard
'File exists' error when building pkg_tar from srcs containing directories
Description of the problem / feature request:
When building a pkg_tar from a filegroup defined in an external repository containing multiple files in a directory, PackgeTar throws an io exception because the directory already exists. However, the issue is not present when using --spawn_strategy=local
.
Bugs: what's the simplest, easiest way to reproduce this bug.
See reproduction below.
What's the output of bazel info release
? Happens with Bazel up through 4.x
Have you found anything relevant by searching the web?
- I started at https://github.com/bazelbuild/rules_docker/issues/422 to find a workaround for renaming
- Then filed: https://github.com/bazelbuild/rules_docker/issues/1007
Any other information, logs, or outputs that you want to share?
Is there something that I am not understanding about how the rules should work or is this really a bug? If this is just not how the rules should work what is the solution?
@laurentlb why is this assigned to team-Remote-Exec?
the issue is not present when using --spawn_strategy=local.
It sounds like an error related to execution. I don't know which exact team should look at this. Feel free to reassign.
@laurentlb isn't that for the dev ex team, of who you are a member of, to find out?
Please do not drop issues on the floor - keep them assigned to some team.
Assigning to team-Rules-Server for initial investigation/triage.
This got lost in the bazel repo but we move pkg_tar to this repo last year. I'm going through the issue backlog.
Do you have a standalone repo for this, against a newer bazel than 0.27?
Closing as obsolete
This is still an issue. See https://github.com/ash2k/rules_pkg_bug.
$ bazel --version
bazel 4.0.0
$ bazel build --verbose_failures --sandbox_debug //:broken_package
DEBUG: Rule 'rules_python' indicated that a canonical reproducible form can be obtained by modifying arguments shallow_since = "1564776078 -0400"
DEBUG: Repository rules_python instantiated at:
/Users/mikhail/src/rules_pkg_bug/WORKSPACE:13:23: in <toplevel>
/private/var/tmp/_bazel_mikhail/96dd2b16606e402518a65fd887b915de/external/rules_pkg/deps.bzl:33:10: in rules_pkg_dependencies
/private/var/tmp/_bazel_mikhail/96dd2b16606e402518a65fd887b915de/external/bazel_tools/tools/build_defs/repo/utils.bzl:201:18: in maybe
Repository rule git_repository defined at:
/private/var/tmp/_bazel_mikhail/96dd2b16606e402518a65fd887b915de/external/bazel_tools/tools/build_defs/repo/git.bzl:199:33: in <toplevel>
INFO: Analyzed target //:broken_package (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
ERROR: /Users/mikhail/src/rules_pkg_bug/BUILD.bazel:12:8: Writing: bazel-out/darwin-fastbuild/bin/broken_package.tar failed: I/O exception during sandboxed execution: /Volumes/ramdisk/bazel-sandbox.332ac94c8699b84322a35ae9570fabb10a838e2c8d09c1ac9ed1b057a48dfea1/darwin-sandbox/11/execroot/rules_pkg_bug/dir1 (File exists)
Target //:broken_package failed to build
INFO: Elapsed time: 0.109s, Critical Path: 0.00s
INFO: 2 processes: 2 internal.
FAILED: Build did NOT complete successfully
$ ls -la /Volumes/ramdisk/bazel-sandbox.332ac94c8699b84322a35ae9570fabb10a838e2c8d09c1ac9ed1b057a48dfea1/darwin-sandbox/11/execroot/rules_pkg_bug/
total 0
drwxr-xr-x 5 mikhail staff 160 Feb 15 20:45 .
drwxr-xr-x 3 mikhail staff 96 Feb 15 20:45 ..
drwxr-xr-x 4 mikhail staff 128 Feb 15 20:45 bazel-out
drwxr-xr-x 3 mikhail staff 96 Feb 15 20:45 dir1
drwxr-xr-x 3 mikhail staff 96 Feb 15 20:45 external
Inlining the reproduction
filegroup(
name = "files",
srcs = glob(
["dir1/**"],
exclude_directories = 0,
),
)
pkg_tar(
name = "broken_package",
srcs = [":files"],
)
pkg_tar(
name = "working_package",
srcs = glob(
["dir1/**"],
),
strip_prefix = "dir1",
)
dir1/
file1.txt
dir2/
file2.txt
I don't have an idea on the best fix right now. We're trying to get strip_prefix out of pkg_tar and add it to pkg_filegroup, so the shape of the usage is likely to be different.
Before fixing this, let's agree on the expected output of broken_package.
I am going to assert that any previous behavior which flatten the paths returned by the glob is incorrect. The output, strip_prefix=dir1 or not should include:
- dir1/file1.txt
- dir1/dir2/file2.txt
So that is what I am working towards.
The snag I have hit is that the list of files in the glob appears to be broken. dir1 and dir1/dir2 are showing up as files with is_directory==False, instead of true. This is just broken. I'll have to investigate why.
Any update on this?
I did hit this when packaging py_binary
which uses python interpreter from rules_pyenv
. Wasted 2 days with all sorts of workarounds without any success. --spawn_strategy=local
provided tar that I can use, but then the other part of the project failed to build.
ref: https://github.com/digital-plumbers-union/rules_pyenv/issues/5
I had a good discussion with the core team today.
The fundamental problem is that directories as action inputs is a bad idea in Bazel (see this piece of the documentation). The glob()
is not important, we can reproduce a failure with something simple.
pkg_tar(
name = "broken",
srcs = [
"dir1",
"dir1/dir2/file2.txt",
]
)
This might work for local execution, but it won't work for sandboxed execution. This is not likely to change for a long time.
So, the workaround is to never use exclude_directories=0
or provide raw directories as srcs
to a pkg_* rule.
Given the example tree above, you can do.
Choice 1: Let pkg_tar
create the directories
Either of these work.
pkg_tar(
name = "tree_glob",
srcs = glob(
["dir1/**"],
),
strip_prefix = '.',
)
filegroup(
name = "dir_glob",
srcs = glob(["dir1/**"]),
)
pkg_tar(
name = "tree_files",
srcs = [":dir_glob"],
strip_prefix = '.',
)
Choice 2: If you need to specify modes of a dir, use pkg_mkdirs
In a few weeks, you will be able to do this.
pkg_mkdirs(
name = "dirs",
dirs = [
"dir1/dir2",
],
attributes = pkg_attributes(mode='751', user='sys', group='adm')
)
pkg_tar(
name = "precise_layout",
srcs = [
":dirs",
":dir_glob"
],
strip_prefix = '.',
)
which yields.
$ tar tvf bazel-bin/precise_layout.tar
drwxr-xr-x sys/adm 0 1999-12-31 19:00 ./
drwxr-xr-x sys/adm 0 1999-12-31 19:00 ./dir1/
drwxr-x--x sys/adm 0 1999-12-31 19:00 ./dir1/dir2/
-r-xr-xr-x 0/0 6 1999-12-31 19:00 ./dir1/dir2/file2.txt
-r-xr-xr-x 0/0 6 1999-12-31 19:00 ./dir1/file1.txt
Thank you, I've tested with exclude_directories=1
and it did solve File exists
error. But I got hit by another fundamental issue link or target filename contains space
. It is well described in https://github.com/bazelbuild/bazel/issues/374 and was open since 2015. I think I'll park the idea of using rules_pyenv
with rules_pkg
.
Thank you for in-depth explanation!
In the next release of rules_pkg you might see better results with space in target names. I've changed the way the rule passes the manifest of what goes into the tarball to the helper that builds it. Now everything should be unicode clean and not have problems with escaping on the command line.
That said, I don't know your particular use case and the new features are a week or two away from being available.
That could solve my issues, I'll make sure to test it when its out. Thank you for maintaining and developing rules_pkg
!
My use case is fairly simple, I just want to use rules_pyenv
so that CI would not have to re-download python during each build.
- The down side of using
rule_pyenv
withpkg_tar
, is thatpkg_tar
includes python interperter ifinclude_runfiles
is set toTrue
. I don't need interpreter archived together with the code, so I've createdgenrule
to repackage same tar without the interpreter. - However I still need
include_runfiles = True
, so thatpkg_tar
would find and include non-python files (f.e.: certifi/cacert.pem).
Having that said, I do believe that rules_python
should have a way to expose all files, not just python files...
Yes. This is really a job for rules_python. There should be a few options for py_binary (or other rules like py_hermetic_binary).
- pure python code and data files, with a portable wrapper that invokes the system python
- python code with data and .so files, but now we have to be careful to compile the .so's for a particular target system, so this is not portable
- a fully, self-extracting binary with an embedded interpreter. Again, targeted for a particular system.
Case 1 is easy. Just pull in all the files Case 3 is equally easy, it is always a single file. Case 2 is hard, because people might want a combination of .so and .dll in the same package, so it can be more portable
Yes. This is really a job for rules_python. There should be a few options for py_binary (or other rules like py_hermetic_binary).
Could not agree more and I've discovered that they have rules for packaging (rule_python/docs/packaging.md) which might do it.
- pure python code and data files, with a portable wrapper that invokes the system python
That is actually part of py_binary
. The down side is that it pulls interpreter when rules_pyenv
is used.