Allow generator url file to be compressed.
This should be considered an RFC. It is an attempt to allow a compressed file to be specified for the generator url. I'm not a big fan of the current form of the patch since it only supports .tar.xz. Could we maybe use a .tar.xz that only has the compressed binary in it?
FWIW, this reduces download sizes considerably: a Mac OS binary goes from 30M to 4M and a Linux binary from 10M to 3M.
This seems pretty useful! Adding more extensions later would be desirable, but I could start making use of this right away, even if it's limited to .tar.xz.
I think it might be better if we say that the generator_url must point to an archive that has a cargo-bazel file at the root that is executable. I am not sure if extracted tar archives keep the exec bits on the extracted files. Also, I have no idea how this would work on Windows.
Okay, so I have changed to using the tar directly with download_and_extract. This is better. However, there is still a problem in that it only supports .tar.xz files because I am hijacking generator_urls and just doing this if the extension is .tar.xz. I'm wondering if we should introduce a new generator_archive_urls attr that is specifically for this and deprecate the generator_urls attr.
@UebelAndre I feel like you might have an opinion here.
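For reference, a minimal sketch of that dispatch inside the repository rule (the function name is illustrative and the caller is assumed to have already resolved the URL for the host platform; this is not the exact patch):

def _fetch_cargo_bazel(repository_ctx, url, sha256):
    if url.endswith(".tar.xz"):
        # Archive case: download_and_extract picks the decompressor from
        # the extension, and the exec bit comes from the tar entry itself.
        repository_ctx.download_and_extract(url = url, sha256 = sha256)
    else:
        # Plain binary case: download directly and set the exec bit.
        repository_ctx.download(
            url = url,
            output = "cargo-bazel",
            sha256 = sha256,
            executable = True,
        )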
I think it would be fine to have generator_urls accept paths that were tar files and do some extraction based on the extension of the file. Though, instead of embedding the logic for this here, why not just use http_archive directly and pass the label of the binary in the extracted archive to generator? It seems like it'd be more useful to have an http_archive like rule that could do some host detection and conditionally download different artifacts.
http_host_archive(
    name = "name",
    urls = {
        "x86_64-unknown-linux-gnu": "https://linux.com/archive.tar.gz",
        "x86_64-apple-darwin": "https://apple.com/archive.tar.gz",
        "x86_64-pc-windows-msvc": "https://windows.com/archive.tar.gz",
    },
    build_files = {
        "x86_64-unknown-linux-gnu": "//3rdparty:BUILD.linux.bazel",
        "x86_64-apple-darwin": "//3rdparty:BUILD.macos.bazel",
        "x86_64-pc-windows-msvc": "//3rdparty:BUILD.windows.bazel",
    },
    sha256s = {
        "x86_64-unknown-linux-gnu": "...",
        "x86_64-apple-darwin": "...",
        "x86_64-pc-windows-msvc": "...",
    },
)
Then you could consume it like so:
crates_repository(
    name = "crate_index",
    generator = "@name//:cargo-bazel",
    # ...
)
What are your thoughts there?
That seems needlessly complex. What are the build files, and why do I have to manually create them? What do they even look like?
Also, that looks like it would download the binaries for all archs even though only the one for the host arch is needed. This is unacceptable overhead. I know that can be addressed with more complexity, but that seems like a lot for an initial setup of rules_rust with crate support.
My preference would be to just add a new attr that is mutually exclusive with the current generator_urls and allows the specification of an archive for each arch. If we do that, the resolution of the archive type would happen with the normal bazel logic, which would allow it to support all the types that bazel does (.zip, .tar.gz, .tar.xz, etc.). We can still use the same generator_sha256s attr for the shas. There is no need to specify that build file attr.
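As a sketch, usage of such an attr might look like the following (generator_archive_urls is the name floated above, not an existing attribute, and the URLs are placeholders):

crates_repository(
    name = "crate_index",
    # Hypothetical attr, mutually exclusive with generator_urls; Bazel
    # picks the extraction method from each URL's extension.
    generator_archive_urls = {
        "x86_64-unknown-linux-gnu": "https://example.com/cargo-bazel-linux.tar.xz",
        "x86_64-apple-darwin": "https://example.com/cargo-bazel-macos.tar.xz",
        "x86_64-pc-windows-msvc": "https://example.com/cargo-bazel-windows.zip",
    },
    # Reuses the existing attr, keyed by the same triples.
    generator_sha256s = {
        "x86_64-unknown-linux-gnu": "...",
        "x86_64-apple-darwin": "...",
        "x86_64-pc-windows-msvc": "...",
    },
)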
That seems needlessly complex. What are the build files, and why do I have to manually create them? What do they even look like?
That was just an example to show that a common attribute (like http_archive.build_file) could be applied to each variant. The contents or their existence isn't really relevant to the example otherwise.
Also, that looks like it would download the binaries for all archs even though only the one for the host arch is needed. This is unacceptable overhead. I know that can be addressed with more complexity, but that seems like a lot for an initial setup of rules_rust with crate support.
My expectation is that this rule would only download what's configured for the current host. I agree downloading all variants is unnecessary.
My preference would be to just add a new attr that is mutually exclusive with the current generator_urls and allows the specification of an archive for each arch. If we do that, the resolution of the archive type would happen with the normal bazel logic, which would allow it to support all the types that bazel does (.zip, .tar.gz, .tar.xz, etc.). We can still use the same generator_sha256s attr for the shas. There is no need to specify that build file attr.
The reason I lean away from adding the functionality to the rule itself is because I think it adds hidden complexity to the rule. If you were to add a new attribute then there'd need to be some contract for how these archives should be structured (should or shouldn't the archives contain a root level directory?).
I'm not opposed to the idea, but I'm (perhaps to a fault) a fan of separation of concerns, which is what's guiding my thoughts. I'd love to hear from @illicitonion if they have thoughts here.
Okay on the build file thing. Thanks for explaining.
The reason I lean away from adding the functionality to the rule itself is because I think it adds hidden complexity to the rule. If you were to add a new attribute then there'd need to be some contract for how these archives should be structured (should or shouldn't the archives contain a root level directory?).
I initially leaned away for similar reasons, but there is no version of http_file that supports downloading a compressed file that is executable when uncompressed. I think that we should eventually just replace generator_urls with something that takes an archive so that the extension logic is handled by download_and_extract.
I guess another option would be to have an attr for the archive type, much like the type arg for download_and_extract. We could have a sentinel value (maybe "binary") to indicate that it's a file instead of an archive, and plan on removing that option in the future and only accepting archives.
FWIW, I don't like defaulting to the sentinel, as that would make the type have to be manually specified when that wouldn't be necessary with the type arg in download_and_extract. Having said that, we could start with the next version defaulting to binary, with the plan that the following release changes the default to '' so that autodetection, and thus archives, is the default. After that, I think we'd want to keep the type attr around for the same reasons it exists in the http_archive (repo rule) and download_and_extract (repo_ctx method) APIs.
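Concretely, a sketch of how that type attr could drive the dispatch (generator_url_type is the proposed attribute, not an existing one, and the function name is illustrative):

def _fetch_generator(repository_ctx, url, sha256):
    kind = repository_ctx.attr.generator_url_type
    if kind == "binary":
        # Sentinel value: the URL points at the raw executable.
        repository_ctx.download(
            url = url,
            output = "cargo-bazel",
            sha256 = sha256,
            executable = True,
        )
    else:
        # "" asks download_and_extract to autodetect from the extension;
        # any other value is forwarded, mirroring its own `type` arg.
        repository_ctx.download_and_extract(url = url, sha256 = sha256, type = kind)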
Yes, we would need a contract about how the archive looks, but I think that is a small price to pay for being able to save between 60-90% of the download size. As it stands right now, the archive has no root and is just a single file, the binary, with the name appropriate for the platform (i.e. cargo-bazel.exe for Windows and cargo-bazel otherwise). We could probably use something like this to make it easier.
Longer term, I think that we should move the functionality provided by this binary to the build phase so that it doesn't have to be provided to the repo rule, but I think that is another discussion entirely.
there is no version of http_file that supports downloading a compressed file that is executable when uncompressed
Maybe we should fix that, rather than trying to build new structures?
Adding .xz (as opposed to .tar.xz) support to this function so that it downloads and extracts foo.xz to foo, and exposing that capability via http_file here, seem like pretty small and approachable changes?
Having to wait for a Bazel release to pull in the feature is for sure annoying, and we may want to patch in something temporary until that lands (which I imagine would look a lot like this PR), but I'd generally favour the composable and works-automatically-for-everyone solution where we can.
The .xz type is fundamentally different from those that are supported. It is not an archive format; it's only compression. The download_and_extract function is meant to download and extract an archive. Because of that, it doesn't have a concept of "executable" like the download function does. You would probably want to add decompression functionality to the download function instead of the download_and_extract function if you were going to add something to bazel.
Having said that, I do think there are advantages to using an archive format for this in case we ever wanted to add other files that are used by the binary.
Okay, so what is the way forward that would be acceptable? My recommendation is adding the generator_url_type attr to crates_repository, using "binary" as the default with the intention of changing that default to "" in the future, thus making archives the default.
If we do this, I think it would be useful to also create a pkg target to build the package for a given target arch.
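Something like this with rules_pkg could work (a sketch; the target name and srcs label are illustrative):

load("@rules_pkg//pkg:tar.bzl", "pkg_tar")

# Packs the cargo-bazel binary at the archive root with the exec bit set,
# matching the no-root, single-file contract described above.
pkg_tar(
    name = "cargo_bazel_dist",
    srcs = [":cargo-bazel"],
    extension = "tar.xz",
    mode = "0755",
)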