bazel Provide hooks for host platform autodetection

Problem description

With https://github.com/bazelbuild/bazel/issues/7081 we will get a nice repository rule that will autodetect the host platform target. This target will be used as a default value of the --platforms and --host_platform Bazel options, and also as the default execution platform.

Currently the autodetection detects the cpu and os and selects the right constraint from https://github.com/bazelbuild/platforms. This means that any Bazel toolchain that uses at most these 2 settings will be selectable by this platform.

But that is not enough for toolchains that define custom constraint settings. For those toolchains to be selected we need to put their constraints into the host platform. And those toolchains will very probably need to perform some kind of host system inspection to properly tell which constraint value to add.

Alternatives

Platform inheritance

One solution is to tell users to create their own platform target that inherits from the autodetected host platform. Users will be responsible for collecting the constraint_settings from all the toolchains in their build and making sure they are represented in the inherited platform. Then they have to point --platforms and --host_platform at their inherited platform target.

With this approach very few projects will end up using the default value of --platforms and --host_platform. This is not great and goes against the grain of the 'flagless builds' effort of the configurability team.
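As a rough illustration of what each project would have to maintain under this alternative, the sketch below renders the BUILD content for such an inherited platform. It sticks to the subset shared by Python and Starlark so it can be run on its own. `render_inherited_platform` and the `@my_rules//constraints:has_gpu` label are hypothetical names made up for the example; `parents` is the existing attribute of `platform` and `@local_config_platform//:host` is the autodetected target.

```python
def render_inherited_platform(name, extra_constraint_values):
    """Returns BUILD text for a platform inheriting from the autodetected host."""
    lines = [
        "platform(",
        "    name = \"%s\"," % name,
        # Inherit cpu/os constraints from the autodetected host platform.
        "    parents = [\"@local_config_platform//:host\"],",
        "    constraint_values = [",
    ]
    for label in extra_constraint_values:
        lines.append("        \"%s\"," % label)
    lines.append("    ],")
    lines.append(")")
    return "\n".join(lines)

# Hypothetical constraint label, for illustration only.
print(render_inherited_platform("host_with_extras", ["@my_rules//constraints:has_gpu"]))
```

The user would then still have to pass --platforms=//:host_with_extras and --host_platform=//:host_with_extras (or put them into a .bazelrc), which is the flag burden described above.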

Hooks into host platform autodetection

This approach takes advantage of the observation that all these toolchains are already registered in the WORKSPACE file and potentially already perform autodetection for themselves. What we need is to pass a list of labels from these custom repository rules to the @local_config_platform rule.

Explicit approach

Users will have to collect the generated constraints from all their rules (e.g. each ruleset writes a please_put_these_constraints_into_host.bzl file in its repo, and the user loads that bzl file in the WORKSPACE and uses a constant from it), call local_config_platform manually, and pass the constraints as an argument to the call.

Advantages:
- no need to change default values of Bazel options
Disadvantages:
- need to collect transitive constraints from rules repos
- need to call local_config_platform manually
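
To make the explicit flow more concrete, here is a minimal sketch of the two pieces of text involved: the .bzl file a ruleset's repository rule would write, and the WORKSPACE lines the user would add. It is written in the common subset of Python and Starlark so it can be executed standalone. The constant name `HOST_CONSTRAINTS`, the helper names, and the `constraints` parameter of `local_config_platform` are all assumptions, since this issue only proposes such a call.

```python
BZL_NAME = "please_put_these_constraints_into_host.bzl"

def render_constraints_bzl(constraint_labels):
    """Text of the .bzl file a ruleset's repository rule would write."""
    entries = ["    \"%s\",\n" % label for label in constraint_labels]
    return "HOST_CONSTRAINTS = [\n" + "".join(entries) + "]\n"

def render_workspace_snippet(repos):
    """WORKSPACE lines the user would add to collect and pass the constraints."""
    loads = []
    names = []
    for repo in repos:
        alias = repo + "_constraints"
        loads.append(
            "load(\"@%s//:%s\", %s = \"HOST_CONSTRAINTS\")" % (repo, BZL_NAME, alias),
        )
        names.append(alias)

    # Fall back to an empty list when no ruleset contributes anything.
    combined = " + ".join(names) or "[]"

    # Hypothetical call: the `constraints` parameter is what this approach proposes.
    call = "local_config_platform(name = \"local_config_platform\", constraints = %s)" % combined
    return "\n".join(loads + [call])

print(render_constraints_bzl(["@my_rules//constraints:has_gpu"]))
print(render_workspace_snippet(["local_config_cc"]))
```

The second function shows where the "collect transitive constraints" cost comes from: every contributing repository needs its own load line in the user's WORKSPACE.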

Implicit approach

We will figure out a way to allow rules to hook directly into local_config_platform. I have no idea what that would look like. Maybe the audience has ideas :)

Advantages:
- no need to change default values of Bazel options
- no need to call local_config_platform manually
- no need to collect constraints
Disadvantages:
- it's a kind of magic

Jul 02 '19 08:07 hlopko

CC @katre @dslomov @aehlig @laurentlb Wdyt? Do you have ideas for the Implicit approach for hooks?

Jul 02 '19 08:07 hlopko

Related: bazelbuild/rules_go#2089, bazelbuild/rules_cc#24, https://github.com/bazelbuild/bazel/issues/8763.

Jul 02 '19 08:07 hlopko

@aehlig kindly prototyped an example using native.existing_rules and annotating repositories that want to contribute to the local_config_platform using a special tag.

Roughly:

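def _impl(ctx):
    # Build one aliased load() per contributing repository, then concatenate
    # the loaded lists into a single FOO constant.
    loads = []
    concats = []
    for repo in ctx.attr.relevant:
        loads.append("load('@" + repo + "//:foo.bzl', " + repo + "_foo='FOO')")
        concats.append(repo + "_foo")

    # Fall back to an empty list if no repository carries the tag.
    ctx.file("relevant.bzl", "\n".join(loads) + "\nFOO=" + (" + ".join(concats) or "[]"))
    ctx.file(
        "BUILD",
        "load(':relevant.bzl', 'FOO')\n" +
        "platform(name = 'host', constraint_values = FOO)",
    )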

real_local_config_platform = repository_rule(
    implementation = _impl,
    attrs = {
        "relevant": attr.string_list(),
    },
)

def local_config_platform(name):
    existing = native.existing_rules()
    relevant = [
        k
        for k, v in existing.items()
        if "tags" in v and "local_config_platform_hook" in v["tags"]
    ]
    real_local_config_platform(name = name, relevant = relevant)

Therefore, unless somebody objects, I'll go ahead and implement the implicit approach for hooks into host platform autodetection.

Jul 03 '19 11:07 hlopko

I'd like to discuss how this might work with rbe_autoconfig, since we'll likely want something similar (and in general, for any other platform providers).

From what I understand, the Explicit approach should be easy to reuse in other rules and gives direct control to the users (I don't quite understand why please_put_these_constraints_into_host.bzl is needed, I assume a technical reason?)

I don't understand the code snippet for the Implicit approach. It seems like it's looking for a specific tag, but I don't know enough Starlark to understand it.

Where is the tag going to be set? Is this the per-target "tag" similar to "no-sandbox" or something different?

Jul 03 '19 14:07 agoulti

please_put_these_constraints_into_host.bzl is how we pass data from one repo to another. One repository writes a bzl file with a Starlark variable, and a later repository loads the bzl file and reads the variable.

The implicit approach works by collecting all repositories with a specific tag, reading a specific bzl file from each, and collecting all the information in the final repository. E.g. cc_configure will be responsible for adding the tag to the repository it creates (cc_configure creates the @local_config_cc repository - https://github.com/bazelbuild/bazel/blob/master/tools/cpp/cc_configure.bzl#L164. We'd just put , tags = ["tag_that_we_agreed_on"] and we're done.), and for writing the bzl file that @local_config_platform will load. @local_config_platform will iterate over all existing repositories, take those with the tag, load their bzl files, and process the data.
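
The tag-based selection step can be tried out in isolation. The sketch below is written in the subset shared by Python and Starlark and uses a plain dict as a stand-in for what native.existing_rules() might return in a WORKSPACE macro. The tag name, the helper name, and the sample entries are assumptions for illustration only.

```python
# Placeholder for whatever tag ends up being agreed on.
HOOK_TAG = "local_config_platform_hook"

def repos_with_hook(existing_rules):
    """Returns the sorted names of repositories tagged as contributing constraints."""
    return sorted([
        name
        for name, attrs in existing_rules.items()
        # Repositories declared without tags simply have no "tags" entry here.
        if HOOK_TAG in attrs.get("tags", [])
    ])

# Stand-in data: one tagged repository, one with empty tags, one without tags.
sample = {
    "local_config_cc": {"kind": "cc_autoconf", "tags": [HOOK_TAG]},
    "local_config_sh": {"kind": "sh_config", "tags": []},
    "some_archive": {"kind": "http_archive"},
}

print(repos_with_hook(sample))
```

With this sample only local_config_cc is selected. Sorting the names keeps the generated load order stable regardless of declaration order.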

Jul 04 '19 12:07 hlopko

@aehlig @dslomov, your take? (I know @aehlig was slightly pro, and dslomov@ slightly against :) I'll also wait for @katre's opinion.

Jul 05 '19 14:07 hlopko

@aehlig @dslomov, your take?

As you already know, I think collecting "all repositories of a kind" via native.existing_rules() is fine. What I'm worried about is the use of random tags; it would be nice if there were a simple rule for the user to know which tags have a special meaning to Bazel and which are free for their own use.

Jul 05 '19 14:07 aehlig

What would be a way to register toolchains whose constraints would not get automatically included in the host platform (for use in the remote platforms)?

Currently there is a way to reuse the auto-detection code by running Bazel inside a Docker container (see docker_toolchain_autoconfig).

If the produced toolchains indiscriminately auto-registered themselves on the host, this method would be broken.

Jul 05 '19 15:07 agoulti

This needs to be re-written as a design proposal and discussed in that fashion. A github issue thread isn't very discoverable for others who are interested in this topic.

This can be used as a tracking bug for the implementation after the design is agreed upon.

Jul 08 '19 13:07 katre

I am not actively working on this, so unassigning in case someone else wants to take a shot.

May 11 '20 12:05 katre

It seems to me that unless there's an easy way to hook into the platform autodetection, the whole idea of platforms, constraints, and toolchains becomes very hard to use, and inevitably not flag-less.

Jun 03 '22 21:06 burdiyan

The explicit approach @hlopko describes in the first comment mentions this:

need to call local_config_platform manually

It seems like there's some function local_config_platform that could be called somewhere to provide additional constraints that might have been detected by a custom repository rule. But I couldn't find any references to this. The only thing I know about is the @local_config_platform external repository that defines default constraints for the auto-detected platforms.

Jun 03 '22 22:06 burdiyan

There is no function currently that allows this: I believe @hlopko was suggesting a possible way to implement autodetection by adding such a function.

I agree that this would be a useful feature, but unfortunately no one has had the time to write a design proposal and get agreement on what the best mechanism would be. We're keeping this open to track that it's something we want to come back to, but it isn't, unfortunately, something we can prioritize now.

I'm definitely open to accepting design proposals and implementations from the community, if anyone wants to take a look into the problem, but I am aware that the design is definitely the hard part here.

Jun 06 '22 11:06 katre

@Wyverald If module extensions could depend on repositories defined in other module extensions, I think that this could be used to solve this issue in a very natural way:

- Somewhere in @bazel_tools, offer a module extension with a register_host_constraints tag class taking a label to a .bzl file. The .bzl file is expected to export an ADDITIONAL_HOST_CONSTRAINTS list with constraint value labels.
- Every ruleset generates such a .bzl file in a repository rule and passes the label to register_host_constraints in its MODULE.bazel file.
- A starlarkified local_config_platform module extension loops over all tags and generates a .bzl file that loads all constraints from the individual .bzl files and exports them as the combined HOST_CONSTRAINTS.

This is pretty much @hlopko's idea from https://github.com/bazelbuild/bazel/issues/8766#issuecomment-508463694 expressed in Bzlmod terms. It looks like this could be done entirely in Starlark.
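
The generation step in the last point could look roughly like the sketch below, which turns the labels registered through the tag class into the text of the combined .bzl file. It uses only constructs valid in both Python and Starlark, so it runs standalone. The helper name, the `constraints_N` aliases, and the example labels are hypothetical.

```python
def render_combined_bzl(bzl_labels):
    """Returns .bzl text that loads every registered file and merges the lists."""
    loads = []
    parts = []
    for index, label in enumerate(bzl_labels):
        # Alias each import so the identically named lists do not clash.
        alias = "constraints_%d" % index
        loads.append(
            "load(\"%s\", %s = \"ADDITIONAL_HOST_CONSTRAINTS\")" % (label, alias),
        )
        parts.append(alias)

    # With no registrations the combined list is simply empty.
    combined = " + ".join(parts) or "[]"
    return "\n".join(loads + ["HOST_CONSTRAINTS = " + combined]) + "\n"

# Hypothetical labels, as two rulesets might register them.
print(render_combined_bzl([
    "@rules_foo_host//:constraints.bzl",
    "@rules_bar_host//:constraints.bzl",
]))
```

Index-based aliases sidestep the question of whether a repository name is a valid Starlark identifier.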

Jul 26 '22 11:07 fmeum

This actually works today without any changes to Bazel (although it is affected by https://github.com/bazelbuild/bazel/issues/15916 if the main module defines constraints).

As a demo, I created https://github.com/fmeum/local_config_platform, which contains a starlarkified local_config_platform Bazel module offering a host_platform module extension with a tag add_constraints that modules can use to specify a .bzl file with additional constraints.

https://github.com/fmeum/local_config_platform/tree/main/tests/bcr contains an example module that depends on a module foo which adds a synthetic constraint here.

@katre Does this look reasonable? I'm open to providing a PR and/or a design doc. This could even be maintained outside the Bazel core as an independent Bazel module - although that would require users to point --platforms to the platform defined by this module.

Aug 07 '22 20:08 fmeum

@fmeum This is very cool, and is basically what I was thinking of. I'd like to read a design doc (hopefully with some pointers to the bzlmod docs, because I am not fully up to date on that), but this is definitely looking interesting.

Aug 08 '22 14:08 katre

FYI we discussed in the SIG meeting that the group is fine funding Fabian's time to write that design doc.

Dec 13 '22 18:12 alexeagle

This is very exciting, I'm looking forward to what comes out.

Dec 20 '22 16:12 katre

@fmeum How would this example work if you wanted to inspect the host to decide which platform constraints to add?

Mar 21 '23 17:03 cameron-martin

@cameron-martin You can pass the label of a generated constraints.bzl file to the module extension. The local_config_platform module contains some helper functions for this.

Mar 21 '23 18:03 fmeum

I see this now. Looks great 👍

Mar 21 '23 20:03 cameron-martin

@fmeum What's the progress on the design document? I think being able to add constraints, such as target_vendor from apple_support, to the host platform would be very valuable.

Mar 05 '24 21:03 brentleyjones

@brentleyjones @cgrindel and everyone else with a use case, could you briefly describe what you would use this feature for in a comment?

I can finally start working on the doc in April.

Mar 06 '24 15:03 fmeum

The main thing I want to solve is the non-deterministic toolchain resolution surrounding the apple_support cc_toolchain. We added the target_vendor constraint to the apple_support platforms, but we had to give it a default value to allow it to match the auto-generated host platform. Ideally we could have that host platform have a value set for that constraint if there is a dependency on apple_support and the OS is in fact macOS.
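
A minimal sketch of the decision I have in mind, in the subset shared by Python and Starlark so it can be run on its own. In a repository rule the OS name would come from repository_ctx.os.name. The helper name is made up, and the exact constraint label is an assumption that should be checked against apple_support.

```python
# Assumed label of the apple_support target_vendor constraint value.
APPLE_VENDOR = "@build_bazel_apple_support//constraints:apple"

def apple_host_constraints(os_name):
    """Returns the constraint labels to add to the host platform for this OS."""
    # repository_ctx.os.name reports macOS hosts with a name starting "mac os".
    if os_name.lower().startswith("mac os"):
        return [APPLE_VENDOR]
    return []

print(apple_host_constraints("mac os x"))
print(apple_host_constraints("linux"))
```

On non-macOS hosts the function contributes nothing, so the host platform there would be unchanged.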

Mar 06 '24 15:03 brentleyjones

@fmeum Our company builds hardware, and some tests require that hardware, so they carry an execution constraint saying so, mainly for the purpose of remote execution. For local execution, however, I would like to modify the host platform to add this constraint if the host has this hardware.

Mar 06 '24 15:03 cameron-martin

We have a client that has targets which rely on a GPU. However, not all of the developer machines have the GPU. We are adding a custom constraint to the target_compatible_with for those targets. Ideally, we would be able to detect whether the GPU is present on the host and add a constraint to the platform defined in @local_config_platform.

Mar 06 '24 16:03 cgrindel

All interested parties: please see design doc https://docs.google.com/document/d/1g5JAAOfLsvQKBGqzSLFp1hIYFoQsgOslsjaIGV6P7Tk/edit

Mar 19 '24 01:03 Wyverald