rules_pkg
rules_pkg copied to clipboard
No option to preserve symlinks in pkg_tar
The current behavior of pkg_tar
is that if I have a file and a symlink to that file, and both are given as sources, the symlink is "expanded" and two copies of the original file are added to the repo.
For some context, my use case is that I'm trying to package an external toolchain's libc into a tar for use in Docker containers (cross-compiling is... fun). As you can imagine, this is massively ballooning the size of my images.
While I could certainly attempt to do something like filtering all the links out and then generating a symlinks
dict to pass in, I was wondering if there would be interest in adding a flag to allow this behavior in the main rule. I think it could be useful in a variety of contexts, especially in generating library debs for use by others.
The behavior as I imagine it would be along the lines of passing a preserve_symlinks
flag, which would then include symlinks if they only point to files in the archive. This validation should be simple enough, and would remove risks of non-hermeticity. Alternatively, symlinks pointing elsewhere could follow the current behavior, while hermetic symlinks could be preserved.
Please let me know if there are any concerns I'm missing, or if there's an obvious solution to my problem that I missed somewhere in the docs.
Clearly a missing feature. I would like to see this in pkgfilegroup and reused across all rules.
In case someone needs this and is fine with a workaround until this is properly fixed here, we wrote a Starlark wrapper that invokes pkg_tar
with an appropriate flagfile to create the symlinks:
https://github.com/stratum/stratum/blob/3e8f0cdf1abdbdf5458b802058c8413084884420/bazel/rules/package_rule.bzl#L12-L47
This should be doable now with pkg_mklink, but it won't do the auto-magic expand. I think that feature is not that useful. In practice, what you usually want in your package is something like
- a binary that is the result of a rule, and you want to place that somewhere like /usr/share/MyApp/bin/myapp
- a symlink from /usr/bin/myapp to /usr/share/MyApp/bin/myapp
Bazel doesn't really deal with symlinks as a differentiable first-class file type that well, so it would be excessive engineering here to do that unless that was added.
@aiuto I'm using aspect-build/rules_js, a new high performance alternative/better approach to build_bazel_rules_nodejs
and it uses pnpm that heavily depends on symlinks. Unfortunately in Node.js it's not about building just a binary as you mentioned, there is a node_modules folder structure where the symlinks need to be preserved otherwise the modules resolution won't work. I think that feature is pretty useful.
Can you provide a detailed writeup (preferablye in an external doc) about what rules_js does, and then what you need to accomplish. I won't have the time to do this research myself so I have to rely on support from the community to gather requirements for inputs we need to accept and final package layouts we want to achieve. Once we get a good set of use cases, I'm sure I can come up with an implementation plan.
https://pnpm.io/symlinked-node-modules-structure is the detailed writeup about the semantics required. The tar file needs to contain a node_modules
directory with symlinks into a CAS "virtual store" as described there.
So we just need exactly what the OP requests: preserve the symlinks in the tar file so that they survive the trip into a docker image and then into the container runtime.
@aiuto this is now a significant blocker for rules_nodejs users ready to migrate to rules_js. Was that reply sufficient for you to accept a PR to add the preserve_symlinks
option?
Just to chime in, I would also appreciate this working in pkg_tar. The use case is similar, I want to preserve file structures for/from toolchains. In my particular case the toolchain produces many aliases of large libraries as symlinks (think .so
.so.1
.so.1.0.0
etc). And when there are 10 aliases of a 100MB binary it is quite impactful.
I'm ooo until July 7. Will pick up thead then.
On Thu, Jun 23, 2022, 12:20 AM Kurt K @.***> wrote:
Just to chime in, I would also appreciate this working in pkg_tar. The use case is similar, I want to preserve file structures for/from toolchains. In my particular case the toolchain produces many aliases of large libraries as symlinks (think .so .so.1 .so.1.0.0 etc). And when there are 10 aliases of a 100MB binary it is quite impactful.
— Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_pkg/issues/115#issuecomment-1163754137, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAXHHHEOJODAIRYHJW4RQQTVQONUBANCNFSM4JLM6PRA . You are receiving this because you were mentioned.Message ID: @.***>
👋🏼 @aiuto wondering if you had a chance to pick up the thread. we're looking into moving to rules_js, and this is the one thing (that we know) is blocking us. thanks!
OK. I am in a mood to deal with this now.
I presume we can have dangling, arbitrary symlinks, so that is not the problem. My hunch is that you can make the structure you want with pkg_files, but then it goes bad (where bad means messing up relative symlinks) in one of several possible places.
- pkg_tar consuming the symlink tree
- pkg_tar consuming tar via
deps
. - pkg_deb consuming pkg_tar (not likely, all we do is read it and blatz it back out)
- docker rules consuming pkg_tar/pkg_deb. Again, probably not because the issue is here, not there.
While we are at it..... the spec uses <store>
a lot. Is there a need for something like parameterized output path rewrite?
Is #603 the right kind of test case?
I need help understanding the requirement here. Please look at #603. It
- creates a tree of symlnks that looks like node_modules
- passes that as
srcs
input to pkg_tar - passes that tarball as
deps
input to a second pkg_tar
It still comes out right
tar tvf bazel-bin/tests/tar/relative_links_re_tar.tar
drwxr-xr-x 0/0 0 1999-12-31 19:00 node_modules/
drwxr-xr-x 0/0 0 1999-12-31 19:00 node_modules/.pnpm/
drwxr-xr-x 0/0 0 1999-12-31 19:00 node_modules/.pnpm/[email protected]/
drwxr-xr-x 0/0 0 1999-12-31 19:00 node_modules/.pnpm/[email protected]/node_modules/
lr-xr-xr-x 0/0 0 1999-12-31 19:00 node_modules/.pnpm/[email protected]/node_modules/bar -> STORE/bar
lr-xr-xr-x 0/0 0 1999-12-31 19:00 node_modules/.pnpm/[email protected]/node_modules/qar -> ../../[email protected]/node_modules/qar
drwxr-xr-x 0/0 0 1999-12-31 19:00 node_modules/.pnpm/[email protected]/
drwxr-xr-x 0/0 0 1999-12-31 19:00 node_modules/.pnpm/[email protected]/node_modules/
lr-xr-xr-x 0/0 0 1999-12-31 19:00 node_modules/.pnpm/[email protected]/node_modules/bar -> ../../[email protected]/node_modules/bar
lr-xr-xr-x 0/0 0 1999-12-31 19:00 node_modules/.pnpm/[email protected]/node_modules/foo -> STORE/foo
lr-xr-xr-x 0/0 0 1999-12-31 19:00 node_modules/.pnpm/[email protected]/node_modules/qar -> ../../[email protected]/node_modules/qar
drwxr-xr-x 0/0 0 1999-12-31 19:00 node_modules/.pnpm/[email protected]/
drwxr-xr-x 0/0 0 1999-12-31 19:00 node_modules/.pnpm/[email protected]/node_modules/
lr-xr-xr-x 0/0 0 1999-12-31 19:00 node_modules/.pnpm/[email protected]/node_modules/qar -> STORE/qar
lr-xr-xr-x 0/0 0 1999-12-31 19:00 node_modules/foo -> .pnpm/[email protected]/node_modules/foo
What I think I am missing is how you build your input sources.
Or..... maybe I am missing the obvious. Is the requirement:
Real check in files, or the result of a tree artifact.
- bin/foo
- foo -> bin/foo
- bar -> ../../bar
pkg_tar(srcs=glob("[**]*), ...)
And that does not do the right thing for the two links?
... and the answer is yes. The current version of #603 flattens the symlinks to the content they point to, so I have a valid test case now. The question is how I want the API to look. What I really do not want is to implement in pkg_tar and then realize I needed it in pkg_files anyway.
I was wondering something similar; it really depends on what rules_js
is doing. The need to package everything up as a tar is also unclear to me.
In any case, it may not be possible (or convenient) to emit all the symlink providers.
@aiuto findings are true. the underlying problem is that write_manifest
constructs misleading manifests and build_tar blindly tries to build up the tar and fails.
image a rule like this
def incorrect_impl(ctx):
actual_file = ctx.actions.declare_file("thefile.txt")
symlink = ctx.actions.declare_file("this_is_a_symlink.txt")
ctx.actions.symlink(symlink, target_file = actual_file)
return DefaultInfo(files = depset([actual_file, symlink]))
pkg_tar will add two manifest entries;
- file entry for thefile.txt
- file entry for this_is_a_symlink.txt
but I don't think the problem is supposed to be fixed here in rules_pkg since there is no way pkg_tar could tell whether a file is a symlink or not.
the principled fix would be that bazel exposes two new fields to FileApi
named is_symlink
and symlink_target
which points to another File
also, pkg_mklink
is not helpful as there is a huge symlink forest for node_modules.
the principled fix would be that bazel exposes two new fields to FileApi named is_symlink and symlink_target which points to another File
Yes. That really should happen. But that would be Bazel 7.x at this point, so I do want to come up with things I can do in the interim.
-
pkg_mklink
is not so bad of the the forest can be built incrementally - we can probably do something in scanning tree artifacts to turn files into links
- we can probably detect that input files are links or not, but that might not be compatible with remote build.
I have already started on a fix that checks if any entry#[2]
within the manifest is a symlink, if so then does a realpath on it then strips out execroot/wksp
(to make it bazel-out/cfg/path/to
) portion from it then does a lookup within the manifest to find the corresponding file. if found then rewrites the entry to be a symlink that points to the target file with its path in final tar.
I'd like to put up a PR for this to fix it in rules_pkg.
- we can probably detect that input files are links or not, but that might not be compatible with remote build.
https://github.com/bazelbuild/bazel/pull/15866 is working towards fixing symlinks with RBE
@thesayyn Can you take a look at #603 to confirm that //tests/tar:relative_links_tar reproduces the problem here.
If you were to patch it and do:
blaze build //tests/tar:relative_links_tar
tar tvf bazel-bin/tests/tar/relative_links_tar.tar
you would see
$ tar tvf bazel-bin/tests/tar/relative_links_tar.tar
-r-xr-xr-x 0/0 3993 1999-12-31 19:00 BUILD
drwxr-xr-x 0/0 0 1999-12-31 19:00 node_modules/
...
-r-xr-xr-x 0/0 3993 1999-12-31 19:00 outer_BUILD
This is correct for some people, and wrong for others. Since the native file is a symlink, they may have wanted outer_BUILD
to simply be ../BUILD
.
You say you are working on a "fix". I don't think a single fix is what we need. There has to be a way to control the behavior. And, I think it has to be in pkg_files rather than pkg_tar and pkg_zip individually. In some places we want to preserve links, but we might mix that with places where we follow them. Do you have a WIP PR?
More thinking about this.
- I believe that symlinks in tree artifacts should be preserved in pkg_tar or pkg_zip by default.
- There is no need to implement an expand links option in pkg_tar or zip today.
- We absolutely do need a way to instantiate or preserve links in source files and filegroup(). To make that most flexible, it should be at the pkg_files level rather than in pkg_zip.
@thesayyn @alexeagle I don't want to wait for https://github.com/bazelbuild/bazel/pull/15866 and bazel 6.x to make progress on this.
If you think the test cases in #603 cover enough of the space here, I will merge that PR as a baseline to work against and then continue a PR for this using those as a test case.
sorry for not replying here after the ping! i honestly don't know enough to answer some of the questions, but maybe @alexeagle might?
@thesayyn is the most up-to-date. My impression since we landed https://github.com/aspect-build/rules_js/tree/main/e2e/js_image is that this issue is lower-severity.
OK. I can still do some fixing on this now while I am trying to prep a 0.0.8 release. If #603 has good test cases, I can use that as a basis for a PR that does something useful.
Hi all! I wanted to chime in to say that this issue also affects my team and we'd love to see a fix. I just don't want to see the momentum get lost after the recent comment saying that this issue may be lower-severity!
Our usage essentially boils down to:
pkg_files(
name = "install_files",
srcs = glob(
include = [
# some things with symlinks, such as:
# lib.so -> lib.so.6
# lib.so.6 -> lib.so.6.2.3
# lib.so.6.2.3
],
),
)
pkg_tar(
name = "config",
srcs = [
":install_files",
],
)
Thanks!
The issue is still important to me, but I do need help in the form I have asked for. I need people to tell me if the test cases I have created in #603 (?) really do capture the use case. If I have that right, making the change is an easy part. I'm not going to add features without a comprehensive set of tests that capture real requirements.
On Wed, Aug 10, 2022 at 12:40 AM Josh Widen @.***> wrote:
Hi all! I wanted to chime in to say that this issue also affects my team and we'd love to see a fix. I just don't want to see the momentum get lost after the recent comment saying that this issue may be lower-severity!
Our usage essentially boils down to:
pkg_files( name = "install_files", srcs = glob( include = [ # some things with symlinks, such as: # lib.so -> lib.so.6 # lib.so.6 -> lib.so.6.2.3 # lib.so.6.2.3 ], ), )
pkg_tar( name = "config", srcs = [ ":install_files", ], )
Thanks!
— Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_pkg/issues/115#issuecomment-1209962938, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAXHHHFTVV42K2TOKOL3SKDVYLM5TANCNFSM4JLM6PRA . You are receiving this because you were mentioned.Message ID: @.***>
I need people to tell me if the test cases I have created in #603 (?)
Yes. Though this is only one part of the symlinks issue with pkg_tar
. I have created https://github.com/bazelbuild/rules_pkg/pull/609 as well to demonstrate the issue with the node_module tree that rules_js lays out.
OK. I'm unexpectedly tied up this week and can't do deep thinking but I can get people to review the old PRs so we can improve the baseline test cases.