Refactor mtree_spec to vis encode filenames
This PR refactors the mtree_spec rule:
Instead of emitting the .spec output file directly in it's final formatted form, it has been changed to emit a jsonl equivalent. This intermediate file is then processed line-by-line in a shell script with jq to assist with parsing. The filenames (and content) are encoded with vis prior to final mtree line formatting. A golden test has been added to e2e/smoke.
NOTE: this PR assumes the
vistool is present on the host machine (this is generally true for macos and linux). A future improvement could bundle the vis implementation fromgo-mtree.
Fixes #794
We are trying hard to make our bsdtar more hermetic (likely something like https://github.com/bazelbuild/bazel-central-registry/pull/1662 with @dzbarsky ) so I'm pretty sad to regress that by assuming vis is on the host machine. We have a lot of users under nix or in a distroless/wolfi image that has a stripped down distro.
Sure. I intended this PR as a workable example that allows me to move forward, and possibly as a base for someone from your team to continue with. Maybe include go-mtree as a dependency and build/distribute pre-built binaries? Alternatively, a simple but dedicated C/C++ program using libarchive that gets built for the target architecture?
I'm also not clear why it's beneficial to encode into jsonl as a transport between starlark and the multi-line run_shell command - can't the latter read data from the struct directly?
Yeah there certainly many possible ways to pass the information into a separate subprocess that generates the final mtree output. I also wasn't totally clear on the code that generates the lines (first iterates through ctx.files.srcs, then the DefaultInfo of each one in ctx.attr.srcs (it is not redundant?)). In any case, I didn't want to mess with that so I made a minimal change to generate something (JSON) that I had a parser for (jq) without including additional tool dependencies (and without slurping everything into memory at once since these files can be huge). As long as the final output is correct, I of course don't care how it's done, feel free to revamp it.
Thanks Paul - I'm not sure when I'll get time to work on this, but I imagine you can already patch in this change in the spot you need it, so you're not blocked on us merging and releasing?
Tried this out locally on my linux machine (stock Ubuntu 22.04.4 LTS), and I don't have vis installed. So I think we need a hermetic binary we can fetch and run.
@dzbarsky do you have some clever ideas where we should get it?
I just ran into a nodejs dependency where the directory name included a space in it that is not covered by #835. I am using bazel_dep(name = "aspect_bazel_lib", version = "2.7.8")
The line in my mtree spec is
src/typescript/tours-api/tours-api.sh.runfiles/npm/node_modules/thread-stream/test/dir with spaces uid=0 gid=0 time=1672560000 mode=0755 type=dir
I had claude generate this genrule to cover it. So far tested on mac, need to test on linux still but its a simple awk script so i'm hopeful shelling to the host awk wont be a problem
native.genrule(
name = name + ".mf-sanitized",
srcs = [name + ".mf"],
outs = [name + ".mf-sanitized.spec"],
cmd = """
awk '
BEGIN {FS="uid="; OFS=""}
{
gsub(/ /, "\\\\040", $$1)
sub(/\\\\040$$/, " ", $$1)
print $$1 "uid=" $$2
}
' "$(location """ + name + """.mf)" > "$@"
""",
)
I just ran into a nodejs dependency where the directory name included a space in it that is not covered by #835
The line in my mtree spec is
src/typescript/tours-api/tours-api.sh.runfiles/npm/node_modules/thread-stream/test/dir with spaces uid=0 gid=0 time=1672560000 mode=0755 type=dir
I had claude generate this genrule to cover it. So far tested on mac, need to test on linux still but its a simple awk script so i'm hopeful shelling to the host awk wont be a problem
native.genrule(
name = name + ".mf-sanitized",
srcs = [name + ".mf"],
outs = [name + ".mf-sanitized.spec"],
cmd = """
awk '
BEGIN {FS="uid="; OFS=""}
{
gsub(/ /, "\\\\040", $$1)
sub(/\\\\040$$/, " ", $$1)
print $$1 "uid=" $$2
}
' "$(location """ + name + """.mf)" > "$@"
""",
)