rules_pkg
pkg_tar should provide includes / excludes glob filter options
The pkg_tar rule should provide glob filter options so that it can filter files out of existing tar files. My particular use case is stripping files out of an existing binary distribution package when creating a Debian package from it, for example the test case files and some of the documentation in go1.12.9.linux-amd64.tar.gz.
Besides being useful for the rules in this package, it should also be useful in combination with rules_docker.
I'm not sure it belongs in pkg_tar or in the rules used to structure packages.
Can you provide an example of the kind of rule set you want to lay out? I can sort of guess you have something like this
pkg_tar(
    name = "x",
    deps = [":go1.12.9.linux-amd64.tar.gz"],
    exclude_patterns = ["doc/.", "test/."],
)
I think I would prefer doing
files_from_tar(
    name = "stripped_go",
    deps = [":go1.12.9.linux-amd64.tar.gz"],
    exclude_patterns = ["doc/.", "test/."],
    strip_path = ....,
    new_path = ...,
)
pkg_tar(
    name = "x",
    deps = [":stripped_go"],
)
The reason for my approach is that I would also make a FilesFromTar provider that carries the full set of exclusions and path transforms, then use it from pkg_tar, pkg_zip, pkg_deb, and pkg_rpm. I am always going to prefer generic solutions that mix into any package type.
Yes, agree that a more generic solution would be preferable. I could see something like files_from_tar being useful here and also other contexts. My example is basically as you describe in the :x target.
One possible motivation for building filtering into the library that backs all the rules is that it could avoid putting files on disk, like the ones in :stripped_go above, that are never referenced outside of building another package. I don't have any specific use case where that matters for me, though.
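To make the "skip putting files on disk" idea concrete, here is a minimal sketch using Python's standard tarfile module. It is not the rules_pkg implementation; filter_tar and its arguments are hypothetical names, and the patterns are treated as regexes anchored at the start of the member name.

```python
import io
import re
import tarfile


def filter_tar(src_bytes, exclude_patterns):
    """Copy a tar archive in memory, dropping members that match any pattern.

    Hypothetical helper: nothing is unpacked to the filesystem.
    """
    excludes = [re.compile(p) for p in exclude_patterns]
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(src_bytes), mode="r:*") as src, \
            tarfile.open(fileobj=out, mode="w") as dst:
        for member in src:
            if any(rx.match(member.name) for rx in excludes):
                continue  # excluded members are never written anywhere
            # Only regular files carry data; directories and links do not.
            data = src.extractfile(member) if member.isfile() else None
            dst.addfile(member, data)
    return out.getvalue()
```

The input archive is scanned once and each kept member is streamed straight into the output, so no intermediate files like the ones in :stripped_go are materialized.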
@aiuto, I'm still interested in this, but I don't think my original request of transforming an existing tar input is as important as simply providing includes and excludes on the items in srcs. Your proposed files_from_tar rule would still be useful for extracting the archive into a filegroup for further processing.
For example, given a @npm_prod repo generated by yarn_install, I currently need something like the following just to filter out some undesired files when packaging a NextJS app with rules_docker. These files are unavoidable even when passing the --production arg to yarn_install:
# Package with destination directory to avoid possible dot dir flattening.
# https://github.com/bazelbuild/rules_docker/issues/1974
pkg_tar(
    name = "app-node-modules-unfiltered",
    srcs = [
        "@npm_prod//:node_modules",
    ],
    # Remap external repo files.
    # https://github.com/bazelbuild/rules_pkg/issues/251#issuecomment-499464418
    remap_paths = {
        "../npm_prod/": "/app/",
    },
    # Don't flatten directories.
    # https://github.com/bazelbuild/rules_docker/issues/317
    strip_prefix = "/",
)
# Package up node_modules ready to go in the container /app directory
genrule(
    name = "app-node-modules",
    srcs = [
        ":app-node-modules-unfiltered",
    ],
    outs = ["app-node-modules.tar"],
    cmd = """
        TMP=$$(mktemp -d || mktemp -d -t bazel-tmp)
        trap "rm -rf $$TMP" EXIT
        tar -C $$TMP -xf $(location :app-node-modules-unfiltered)
        tar -C $$TMP \\
            --exclude='app/node_modules/@next/swc-linux-x64-gnu' \\
            --exclude='app/node_modules/@next/swc-linux-x64-musl' \\
            -hcf $(location app-node-modules.tar) app
    """
)
# Use app-node-modules.tar in tars of a container_image rule...
Or alternatively:
genrule(
    name = "node-modules",
    srcs = [
        "@npm_prod//:node_modules",
    ],
    outs = ["node-modules.tar"],
    cmd = """
        TMP=$$(mktemp -d || mktemp -d -t bazel-tmp)
        trap "rm -rf $$TMP" EXIT
        tar -C external/npm_prod \\
            --exclude='node_modules/@next/swc-linux-x64-gnu' \\
            --exclude='node_modules/@next/swc-linux-x64-musl' \\
            -hcf $(location node-modules.tar) node_modules
    """
)
container_layer(
    name = "app-node-modules-layer",
    directory = "/app",
    tars = [
        ":node-modules"
    ],
)
It would be really nice to not use a genrule step since one needs to be aware of portability, e.g., macOS's tar does not have --transform to easily create something like an /app prefix. Having includes and excludes also matches up with the expectation of basic GNU tar functionality.
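As an aside on the portability point: the effect of GNU tar's --transform can be reproduced without depending on the host tar at all. The following is only a sketch with Python's standard tarfile module; add_prefix is a hypothetical helper, not something rules_pkg or rules_docker provides.

```python
import copy
import io
import posixpath
import tarfile


def add_prefix(src_bytes, prefix):
    """Return a new tar whose member names are all placed under `prefix`."""
    out = io.BytesIO()
    # GNU format is used for the output so that a stale pax "path" header
    # copied from the input cannot override the new member name.
    with tarfile.open(fileobj=io.BytesIO(src_bytes), mode="r:*") as src, \
            tarfile.open(fileobj=out, mode="w", format=tarfile.GNU_FORMAT) as dst:
        for member in src:
            renamed = copy.copy(member)
            renamed.name = posixpath.join(prefix.strip("/"), member.name)
            # Hard-link targets (linkname) are left untouched in this sketch.
            data = src.extractfile(member) if member.isfile() else None
            dst.addfile(renamed, data)
    return out.getvalue()
```

Because this runs the same way on Linux and macOS, it avoids the GNU versus BSD tar differences that make the genrule approach fragile.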
I'd like to see something like:
pkg_tar(
    name = "app-node-modules",
    srcs = [
        "@npm_prod//:node_modules",
    ],
    # Remap external repo files.
    # https://github.com/bazelbuild/rules_pkg/issues/251#issuecomment-499464418
    remap_paths = {
        "../npm_prod/": "/app/",
    },
    # Don't flatten directories.
    # https://github.com/bazelbuild/rules_docker/issues/317
    strip_prefix = "/",
    # Remove development files based on the remapped paths
    excludes = glob([
        "app/node_modules/@next/swc*/**",
    ]),
)
In terms of implementation, would filtering content_map before writing the manifest technically work? It looks like if an entry isn't in the manifest, then it isn't going to be added to the output.
But to make it work for all archive types, maybe just pass some includes and excludes filter parameters to methods like add_single_file in pkg/private/pkg_files.bzl?
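To illustrate what I mean by filtering before the manifest is written, here is a rough sketch in plain Python. It assumes the content map can be viewed as a simple mapping of destination path to source path, which is a simplification of whatever structure rules_pkg really uses; filter_content_map is a hypothetical name.

```python
from fnmatch import fnmatchcase


def filter_content_map(content_map, includes=None, excludes=None):
    """Keep entries whose destination path passes the include/exclude globs.

    Assumption: content_map maps destination path -> source path.
    """
    def keep(dest):
        # With no includes given, everything is accepted at this stage.
        if includes and not any(fnmatchcase(dest, g) for g in includes):
            return False
        # Excludes are applied after includes.
        return not any(fnmatchcase(dest, g) for g in (excludes or []))

    return {dest: src for dest, src in content_map.items() if keep(dest)}
```

Note that fnmatch's `*` also matches `/`, unlike Bazel's glob, so a real implementation would need to pick its matching rules deliberately. Anything dropped here never reaches the manifest, so it would not be added to the output.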
Filtering before writing the manifest might be sufficient. But I think your example with a genrule is more flexible. The idiom is that we would filter in a pkg_files (or expand_tar) rule between the input archive (.tar, .npm, ...) and the final packaging. That would allow for fine-grained filtering and remapping from multiple input sources.
One thing to consider is that remap_paths in pkg_tar is going away. It is a confusing piece of technical debt in the code, so our goal is to eliminate it. Instead, pkg_files can strip out existing paths and rebase them with new ones. I am presuming from your examples that you need to remap paths; the eventual way to write that is with an intermediate pkg_files target that does the remapping. Adding the filtering to that target is equivalent to doing it in pkg_tar.
I still have to think about how either solution will compose efficiently. The ideal state is that if you ask for
existing .tar, .npm, .jar, ... => rules that remap paths and filter files => pkg_{zip, rpm, tar, deb, cab}
Then the series of actions we produce will not expand the .tar into individual files to send to the final packaging. Instead, we would ship the original .tar to the final packaging action and the build_tar helper would act on instructions from the manifest (or similar) to unarchive and repackage. I would also like to limit the number of times we have to scan the input archive.
I can make this fairly efficient for deb X tar X zip, since I can unpack inputs from the same process as the one writing the final archive. rpm is different because we need to use an external writer (rpmbuild). The thing in my mind right now is that tar_expand could be a macro that produces two targets, one is just a provider of remap and filter information, the other is a tree artifact of the untarred (and filtered) tar file. A pkg_rpm rule would have to take the tree artifact as srcs. That would avoid the unfurling when not needed, but would be ugly because the pkg_rpm instance would have to use the name of the artifact target.
New thought for an API.
pkg_filter(name, srcs, include_filters, exclude_filters, strip_prefixes, prefix)
include_filters: list of regexes of input paths to accept. if not provided, accept everything
exclude_filters: list of regexes of paths to exclude. done after accepting them, so you can be very inclusive at first and exclude later
remap: ordered list of pairs of anchored regexes -> string: after include and exclude is done, if the path begins with any of the regexes, replace with the designated string and stop processing.
prefix: a single output prefix to prepend to all output paths
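The processing order described above can be sketched in a few lines of Python. This is only an illustration of the semantics, not an implementation of the proposed rule. The signature lists strip_prefixes while the description names remap; the sketch follows the description. Treating the include and exclude filters as unanchored regex searches is my assumption, and apply_filter is a hypothetical name.

```python
import re


def apply_filter(paths, include_filters=(), exclude_filters=(), remap=(), prefix=""):
    """Map each accepted input path to its output path."""
    result = {}
    for path in paths:
        # 1. Includes: with none given, accept everything.
        if include_filters and not any(re.search(rx, path) for rx in include_filters):
            continue
        # 2. Excludes run after includes, so includes can be broad.
        if any(re.search(rx, path) for rx in exclude_filters):
            continue
        # 3. Remap: ordered, anchored at the start, first match wins.
        out = path
        for pattern, replacement in remap:
            m = re.match(pattern, path)
            if m:
                out = replacement + path[m.end():]
                break
        # 4. A single prefix is prepended to every output path.
        result[path] = prefix + out
    return result
```

For example, with include_filters=[r"^go/"], exclude_filters=[r"^go/(doc|test)/"], remap=[(r"go/bin/", "bin/"), (r"go/", "lib/go/")] and prefix="usr/local/", the input go/bin/go ends up at usr/local/bin/go, while everything under go/doc/ and go/test/ is dropped.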
I've got it over here on a fork https://github.com/bazelbuild/rules_pkg/compare/main...aspect-forks:rules_pkg:main
@alexeagle your fork is exactly what I'm looking for.
Is there any chance this will be merged?
@matt-sm When the tar manifest is a public API, this is trivially done with any rule that modifies the manifest file. Since we have a plain BSD tar/mtree rule https://github.com/aspect-build/bazel-lib/blob/main/docs/tar.md I don't plan to spend time on that here in rules_pkg and am just migrating rules I bump into to use that instead.
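For anyone wondering what "a rule that modifies the manifest file" could look like, here is a hedged sketch of the filtering step in Python. It assumes the manifest is mtree-style text where each entry line starts with a relative path followed by keyword=value pairs, and that special characters in paths are not escaped; filter_mtree is a hypothetical helper and not part of bazel-lib.

```python
from fnmatch import fnmatchcase


def filter_mtree(spec_text, excludes):
    """Drop mtree entry lines whose path matches any exclude glob."""
    kept = []
    for line in spec_text.splitlines():
        fields = line.split()
        path = fields[0] if fields else ""
        # Blank lines, "#" comments and "/set"-style directives pass through.
        is_entry = bool(path) and not path.startswith(("#", "/"))
        if is_entry and any(fnmatchcase(path, g) for g in excludes):
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"
```

The filtered text would then be handed to the tar rule in place of the original manifest, so excluded files never enter the archive.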