rules_pkg icon indicating copy to clipboard operation
rules_pkg copied to clipboard

'File exists' error when building pkg_tar from srcs containing directories

Open cjmakes opened this issue 5 years ago • 17 comments

Description of the problem / feature request:

When building a pkg_tar from a filegroup defined in an external repository containing multiple files in a directory, PackgeTar throws an io exception because the directory already exists. However, the issue is not present when using --spawn_strategy=local.

Bugs: what's the simplest, easiest way to reproduce this bug.

See reproduction below.

What's the output of bazel info release? Happens with Bazel up through 4.x

Have you found anything relevant by searching the web?

  • I started at https://github.com/bazelbuild/rules_docker/issues/422 to find a workaround for renaming
  • Then filed: https://github.com/bazelbuild/rules_docker/issues/1007

Any other information, logs, or outputs that you want to share?

log.txt

Is there something that I am not understanding about how the rules should work or is this really a bug? If this is just not how the rules should work what is the solution?

cjmakes avatar Jul 18 '19 17:07 cjmakes

@laurentlb why is this assigned to team-Remote-Exec?

buchgr avatar Jul 29 '19 12:07 buchgr

the issue is not present when using --spawn_strategy=local.

It sounds like an error related to execution. I don't know which exact team should look at this. Feel free to reassign.

laurentlb avatar Jul 29 '19 12:07 laurentlb

@laurentlb isn't that for the dev ex team, of who you are a member of, to find out?

buchgr avatar Jul 29 '19 12:07 buchgr

Please do not drop issues on the floor - keep them assigned to some team.

dslomov avatar Aug 07 '19 10:08 dslomov

Assigning to team-Rules-Server for initial investigation/triage.

dslomov avatar Aug 07 '19 10:08 dslomov

This got lost in the bazel repo but we move pkg_tar to this repo last year. I'm going through the issue backlog.

Do you have a standalone repo for this, against a newer bazel than 0.27?

aiuto avatar Apr 17 '20 04:04 aiuto

Closing as obsolete

aiuto avatar Aug 07 '20 01:08 aiuto

This is still an issue. See https://github.com/ash2k/rules_pkg_bug.

$ bazel --version
bazel 4.0.0

$ bazel build --verbose_failures --sandbox_debug //:broken_package                                                      
DEBUG: Rule 'rules_python' indicated that a canonical reproducible form can be obtained by modifying arguments shallow_since = "1564776078 -0400"
DEBUG: Repository rules_python instantiated at:
  /Users/mikhail/src/rules_pkg_bug/WORKSPACE:13:23: in <toplevel>
  /private/var/tmp/_bazel_mikhail/96dd2b16606e402518a65fd887b915de/external/rules_pkg/deps.bzl:33:10: in rules_pkg_dependencies
  /private/var/tmp/_bazel_mikhail/96dd2b16606e402518a65fd887b915de/external/bazel_tools/tools/build_defs/repo/utils.bzl:201:18: in maybe
Repository rule git_repository defined at:
  /private/var/tmp/_bazel_mikhail/96dd2b16606e402518a65fd887b915de/external/bazel_tools/tools/build_defs/repo/git.bzl:199:33: in <toplevel>
INFO: Analyzed target //:broken_package (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
ERROR: /Users/mikhail/src/rules_pkg_bug/BUILD.bazel:12:8: Writing: bazel-out/darwin-fastbuild/bin/broken_package.tar failed: I/O exception during sandboxed execution: /Volumes/ramdisk/bazel-sandbox.332ac94c8699b84322a35ae9570fabb10a838e2c8d09c1ac9ed1b057a48dfea1/darwin-sandbox/11/execroot/rules_pkg_bug/dir1 (File exists)
Target //:broken_package failed to build
INFO: Elapsed time: 0.109s, Critical Path: 0.00s
INFO: 2 processes: 2 internal.
FAILED: Build did NOT complete successfully

$ ls -la /Volumes/ramdisk/bazel-sandbox.332ac94c8699b84322a35ae9570fabb10a838e2c8d09c1ac9ed1b057a48dfea1/darwin-sandbox/11/execroot/rules_pkg_bug/    
total 0
drwxr-xr-x  5 mikhail  staff  160 Feb 15 20:45 .
drwxr-xr-x  3 mikhail  staff   96 Feb 15 20:45 ..
drwxr-xr-x  4 mikhail  staff  128 Feb 15 20:45 bazel-out
drwxr-xr-x  3 mikhail  staff   96 Feb 15 20:45 dir1
drwxr-xr-x  3 mikhail  staff   96 Feb 15 20:45 external

ash2k avatar Feb 15 '21 09:02 ash2k

Inlining the reproduction

filegroup(
    name = "files",
    srcs = glob(
        ["dir1/**"],
        exclude_directories = 0,
    ),
)

pkg_tar(
    name = "broken_package",
    srcs = [":files"],
)

pkg_tar(
    name = "working_package",
    srcs = glob(
       ["dir1/**"],
    ),
    strip_prefix = "dir1",
)
dir1/
  file1.txt
  dir2/
    file2.txt

I don't have an idea on the best fix right now. We're trying to get strip_prefix out of pkg_tar and add it to pkg_filegroup, so the shape of the usage is likely to be different.

aiuto avatar Feb 16 '21 14:02 aiuto

Before fixing this, let's agree on the expected output of broken_package.

I am going to assert that any previous behavior which flatten the paths returned by the glob is incorrect. The output, strip_prefix=dir1 or not should include:

  • dir1/file1.txt
  • dir1/dir2/file2.txt

So that is what I am working towards.

The snag I have hit is that the list of files in the glob appears to be broken. dir1 and dir1/dir2 are showing up as files with is_directory==False, instead of true. This is just broken. I'll have to investigate why.

aiuto avatar May 12 '21 18:05 aiuto

Any update on this?

I did hit this when packaging py_binary which uses python interpreter from rules_pyenv. Wasted 2 days with all sorts of workarounds without any success. --spawn_strategy=local provided tar that I can use, but then the other part of the project failed to build.

ref: https://github.com/digital-plumbers-union/rules_pyenv/issues/5

aisbaa avatar Jun 10 '21 08:06 aisbaa

I had a good discussion with the core team today.

The fundamental problem is that directories as action inputs is a bad idea in Bazel (see this piece of the documentation). The glob() is not important, we can reproduce a failure with something simple.

pkg_tar(
    name = "broken",
    srcs = [
        "dir1",
        "dir1/dir2/file2.txt",
    ]
)

This might work for local execution, but it won't work for sandboxed execution. This is not likely to change for a long time. So, the workaround is to never use exclude_directories=0 or provide raw directories as srcs to a pkg_* rule. Given the example tree above, you can do.

Choice 1: Let pkg_tar create the directories

Either of these work.

pkg_tar(
    name = "tree_glob",
    srcs = glob(
       ["dir1/**"],
    ),
    strip_prefix = '.',
)

filegroup(
    name = "dir_glob",
    srcs = glob(["dir1/**"]),
)

pkg_tar(
    name = "tree_files",
    srcs = [":dir_glob"],
    strip_prefix = '.',
)

Choice 2: If you need to specify modes of a dir, use pkg_mkdirs

In a few weeks, you will be able to do this.

pkg_mkdirs(
    name = "dirs",
    dirs = [
        "dir1/dir2",
    ],
    attributes = pkg_attributes(mode='751', user='sys', group='adm')
)

pkg_tar(
    name = "precise_layout",
    srcs = [
        ":dirs",
        ":dir_glob"
    ],
    strip_prefix = '.',
)

which yields.

$ tar tvf bazel-bin/precise_layout.tar
drwxr-xr-x sys/adm           0 1999-12-31 19:00 ./
drwxr-xr-x sys/adm           0 1999-12-31 19:00 ./dir1/
drwxr-x--x sys/adm           0 1999-12-31 19:00 ./dir1/dir2/
-r-xr-xr-x 0/0               6 1999-12-31 19:00 ./dir1/dir2/file2.txt
-r-xr-xr-x 0/0               6 1999-12-31 19:00 ./dir1/file1.txt

aiuto avatar Jun 10 '21 15:06 aiuto

Thank you, I've tested with exclude_directories=1 and it did solve File exists error. But I got hit by another fundamental issue link or target filename contains space. It is well described in https://github.com/bazelbuild/bazel/issues/374 and was open since 2015. I think I'll park the idea of using rules_pyenv with rules_pkg.

Thank you for in-depth explanation!

aisbaa avatar Jun 10 '21 18:06 aisbaa

In the next release of rules_pkg you might see better results with space in target names. I've changed the way the rule passes the manifest of what goes into the tarball to the helper that builds it. Now everything should be unicode clean and not have problems with escaping on the command line.
That said, I don't know your particular use case and the new features are a week or two away from being available.

aiuto avatar Jun 11 '21 01:06 aiuto

That could solve my issues, I'll make sure to test it when its out. Thank you for maintaining and developing rules_pkg!

My use case is fairly simple, I just want to use rules_pyenv so that CI would not have to re-download python during each build.

  • The down side of using rule_pyenv with pkg_tar, is that pkg_tar includes python interperter if include_runfiles is set to True. I don't need interpreter archived together with the code, so I've created genrule to repackage same tar without the interpreter.
  • However I still need include_runfiles = True, so that pkg_tar would find and include non-python files (f.e.: certifi/cacert.pem).

Having that said, I do believe that rules_python should have a way to expose all files, not just python files...

aisbaa avatar Jun 11 '21 08:06 aisbaa

Yes. This is really a job for rules_python. There should be a few options for py_binary (or other rules like py_hermetic_binary).

  1. pure python code and data files, with a portable wrapper that invokes the system python
  2. python code with data and .so files, but now we have to be careful to compile the .so's for a particular target system, so this is not portable
  3. a fully, self-extracting binary with an embedded interpreter. Again, targeted for a particular system.

Case 1 is easy. Just pull in all the files Case 3 is equally easy, it is always a single file. Case 2 is hard, because people might want a combination of .so and .dll in the same package, so it can be more portable

aiuto avatar Jun 11 '21 15:06 aiuto

Yes. This is really a job for rules_python. There should be a few options for py_binary (or other rules like py_hermetic_binary).

Could not agree more and I've discovered that they have rules for packaging (rule_python/docs/packaging.md) which might do it.

  1. pure python code and data files, with a portable wrapper that invokes the system python

That is actually part of py_binary. The down side is that it pulls interpreter when rules_pyenv is used.

aisbaa avatar Jun 15 '21 14:06 aisbaa