Don't write duplicate entries (e.g. symlinks) to the tar file
Currently, an rpm2tar result can contain multiple entries for the same target path. Because of this, it isn't possible to build e.g. an oci_layer out of it. The symlinks should be added to the collector's map of written files, and the collector should skip any later write for those paths.
Do you have a reproducer?
load("@bazeldnf//:deps.bzl", "rpmtree")
load("@rules_oci//oci:defs.bzl", "oci_image", "oci_load", "oci_push")
rpmtree(
    name = "sandbox",
    rpms = [
        "@binutils-0__2.41-38.fc40.x86_64//rpm",
        "@binutils-gold-0__2.41-38.fc40.x86_64//rpm",
    ],
    symlinks = {
        "/usr/bin/ld": "/usr/bin/ld.bfd",
    },
    visibility = ["//visibility:public"],
)
oci_image(
    name = "sandbox_image",
    base = "@distroless_base",
    entrypoint = [],
    tars = [
        ":sandbox",
    ],
    visibility = ["//visibility:private"],
    workdir = "/root",
)
oci_load(
    name = "load",
    image = ":sandbox_image",
    repo_tags = ["foo"],
)
rpm(
    name = "binutils-0__2.41-38.fc40.x86_64",
    sha256 = "5dba5e8826c29a4b4d55fb506c9b6f929ded1e73259fce26630cf13f1f4d5715",
    urls = [
        "https://dl.fedoraproject.org/pub/fedora/linux/updates/40/Everything/x86_64/Packages/b/binutils-2.41-38.fc40.x86_64.rpm",
    ],
)
rpm(
    name = "binutils-gold-0__2.41-38.fc40.x86_64",
    sha256 = "02962db175354365a447c0cfd56c7f4902359dee3a4b302c9d123799b840f218",
    urls = [
        "https://dl.fedoraproject.org/pub/fedora/linux/updates/40/Everything/x86_64/Packages/b/binutils-gold-2.41-38.fc40.x86_64.rpm",
    ],
)
So the overlap is in usr/lib/.build-id, which I think we should ignore anyway. How does the error manifest? What are the bazel calls that lead to the issue? Could you make a repro repo so we can test things?
rpm2tar writes its symlinks ( https://github.com/rmohr/bazeldnf/blob/main/cmd/rpm2tar.go#L64 ) before the actual files from all rpms. But because the collector doesn't know that the symlinks have been written (they are added directly to the tarWriter), the original file is also added, e.g. /usr/bin/ld.
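To illustrate the proposed behaviour, here is a minimal sketch in Go. It is not the actual bazeldnf code: the `collector` type, its `written` map, and the `add`/`addSymlink` methods are hypothetical names, and the payload paths are only illustrative. The idea is that explicit symlinks go through the same bookkeeping as rpm payload files, so a later file with the same path is skipped.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
	"path"
	"strings"
)

// normalize makes "/usr/bin/ld", "./usr/bin/ld" and "usr/bin/ld" compare equal.
func normalize(name string) string {
	return strings.TrimPrefix(path.Clean("/"+name), "/")
}

// collector is a hypothetical stand-in for the real collector: it remembers
// every path that has already been written to the tar stream.
type collector struct {
	tw      *tar.Writer
	written map[string]struct{}
}

func newCollector(tw *tar.Writer) *collector {
	return &collector{tw: tw, written: map[string]struct{}{}}
}

// add writes the entry unless its path was written before.
// It reports whether the entry was actually written.
func (c *collector) add(hdr *tar.Header, body []byte) (bool, error) {
	key := normalize(hdr.Name)
	if _, seen := c.written[key]; seen {
		return false, nil // duplicate path: skip it
	}
	if err := c.tw.WriteHeader(hdr); err != nil {
		return false, err
	}
	if len(body) > 0 {
		if _, err := c.tw.Write(body); err != nil {
			return false, err
		}
	}
	c.written[key] = struct{}{}
	return true, nil
}

// addSymlink registers an explicit symlink through the same map,
// instead of writing it to the tar writer directly.
func (c *collector) addSymlink(name, target string) (bool, error) {
	hdr := &tar.Header{Typeflag: tar.TypeSymlink, Name: name, Linkname: target, Mode: 0o777}
	return c.add(hdr, nil)
}

// buildExample writes the symlink first, then two illustrative payload files,
// and returns the entry names that ended up in the archive.
func buildExample() ([]string, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	c := newCollector(tw)
	if _, err := c.addSymlink("/usr/bin/ld", "/usr/bin/ld.bfd"); err != nil {
		return nil, err
	}
	for _, name := range []string{"./usr/bin/ld.bfd", "./usr/bin/ld"} {
		body := []byte("payload")
		hdr := &tar.Header{Typeflag: tar.TypeReg, Name: name, Size: int64(len(body)), Mode: 0o755}
		if _, err := c.add(hdr, body); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	var names []string
	tr := tar.NewReader(&buf)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		names = append(names, hdr.Name)
	}
	return names, nil
}

func main() {
	names, err := buildExample()
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```

In this sketch the first writer wins, so the explicit symlink takes precedence over the file of the same path from the rpm. Whether that is the right precedence for bazeldnf is an assumption here.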
If you try to run the :load target, your docker daemon will complain that there are multiple entries.
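The duplicate can also be confirmed without docker by scanning the tar for repeated entry names. The following Go sketch is an assumption-laden helper, not part of bazeldnf: `duplicateEntries` and `sampleTar` are hypothetical names, and the sample archive only imitates the situation described above (a symlink and a regular file at the same path). It counts every entry type, including directories.

```go
package main

import (
	"archive/tar"
	"bytes"
	"errors"
	"fmt"
	"io"
	"path"
	"sort"
	"strings"
)

// duplicateEntries returns the normalized paths that occur more than once
// in the tar stream, sorted alphabetically.
func duplicateEntries(r io.Reader) ([]string, error) {
	counts := map[string]int{}
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if errors.Is(err, io.EOF) {
			break
		}
		if err != nil {
			return nil, err
		}
		// Treat "/usr/bin/ld" and "./usr/bin/ld" as the same path.
		key := strings.TrimPrefix(path.Clean("/"+hdr.Name), "/")
		counts[key]++
	}
	var dups []string
	for name, n := range counts {
		if n > 1 {
			dups = append(dups, name)
		}
	}
	sort.Strings(dups)
	return dups, nil
}

// sampleTar builds an in-memory archive that imitates the reported problem:
// a symlink written first, then a regular file with the same path.
func sampleTar() (*bytes.Buffer, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	link := &tar.Header{Typeflag: tar.TypeSymlink, Name: "/usr/bin/ld", Linkname: "/usr/bin/ld.bfd", Mode: 0o777}
	if err := tw.WriteHeader(link); err != nil {
		return nil, err
	}
	for _, name := range []string{"./usr/bin/ld.bfd", "./usr/bin/ld"} {
		body := []byte("payload")
		hdr := &tar.Header{Typeflag: tar.TypeReg, Name: name, Size: int64(len(body)), Mode: 0o755}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write(body); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return &buf, nil
}

func main() {
	buf, err := sampleTar()
	if err != nil {
		panic(err)
	}
	dups, err := duplicateEntries(buf)
	if err != nil {
		panic(err)
	}
	fmt.Println("duplicate paths:", dups)
}
```

To check a real rpm2tar result, the same function could be given a file opened with `os.Open` instead of the in-memory sample. A check like this could also serve as the core of an e2e test.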
So ld comes from the symlinks you're passing into the rpmtree call and from one of the rpms (not from both). The only file that overlaps between the two rpms is in the path I mentioned; I opened both rpms manually. Maybe for the symlink you should add a tar that creates the symlink in another layer, though you would still be wasting space in the image because the ld binary is still there. I would say that to work on a fix, the first thing we need is either a repro repository or an e2e test like the other ones we have.
So, I think we cover files; we fixed that as part of https://github.com/rmohr/bazeldnf/pull/49. It seems we need to pass the explicit symlinks to the collector right away. #136 is probably enough, but we need to add a test.