bazel-skylib Add apparent_repo_name utility to modules.bzl

Adds the apparent_repo_name macro to derive the apparent name from repository_ctx.name when running under Bzlmod. This enables repository rules to generate their own top level default target without requiring a redundant parameter.

Originally developed while updating rules_scala to support Bzlmod as part of bazelbuild/rules_scala#1482.

For examples of its use, see bazelbuild/rules_scala#1621.

BTW, I happened to put it in the modules lib, as that seemed like the best fit at the time. Happy to move it elsewhere.

cc: @BillyAutrey @jayconrod @benjaminp @TheGrizzlyDev

Oct 14 '24 21:10 mbland

Thanks for sending such a well-executed PR (and migrating rules_scala)!

Since Bzlmod introduces a number of new concepts, I want to make sure that they are applied and named correctly as well as avoid XY problems. So please bear with me being nitpicky and delving into the individual proposed helpers below.

`adjust_main_repo_prefix`

In rules_scala, this is used to turn label strings provided by users into label string prefixes that are then checked for to determine targets that are included or excluded from certain functionality (say strict deps). Matching labels as strings is usually pretty error-prone and I think that rules_scala may suffer from some issues here caused by this:

//foo:foo would also match //foo:foo1, //foo would match //foo1.
@some_other_module// wouldn't match as expected as the label of targets in this module would start with the canonical repository name, not the apparent repository name seen from the main module.

It's generally preferable to accept target patterns as attr.label_list or convert to Label as early as possible (using native.package_relative_label in a macro) and only ever pass Label instances around in code. Depending on your use case, you could also have users supply the label of a package_group and then use PackageSpecificationInfo#contains. cc_shared_library in Bazel and go_sdk.nogo provide different examples of this.

In any of these cases, manually adjusting the main repo prefix wouldn't be needed. Happy to help rules_scala migrate to another mechanism and thus add another example to the list!

`is_bzlmod_enabled`

This already exists in bazel_features and is arguably better placed in a library dedicated to feature detection that becomes less relevant over time.

`repo_name`

Label#workspace_name is still available at HEAD, so a project that needs to support Bazel 5 should probably just keep using it for now. The flag will probably be flipped for Bazel 9, at which point Bazel 5 will no longer be officially supported and projects could switch to the new method name at that point.

If a project wants to adopt it early for some reason, I would personally consider it more descriptive to keep the simple logic close to the code that uses it. This prevents a casual reader from assuming that the helper does more than it actually does. In fact, if made type-safe, it could even be simplified down to just getattr(label, "repo_name" if hasattr(label, "repo_name") else "workspace_name").

`apparent_repo_name` and `apparent_repo_label_string`

My main problem with this function is that a repository doesn't have a single "apparent name". Rather, it has one apparent name from the point of view of each other repository that has visibility into it. Since this is one of the key concepts that makes Bzlmod tick (by avoiding the need for well-known names), we need to be careful with naming here.

From what I can tell, this function returns:

the name of a module when called on a label pointing to a target in that module's repo
the name of a repo as given by a module extension if the label points to a target in a module extension repo
the name of a WORKSPACE repo if the label points to a target in a WORKSPACE repo
the empty string if the label points to a target in the main repository

In all but the third case, which is going away with Bazel 9, it doesn't return an apparent name, so we would definitely need to rename or break this function up into multiple functions with smaller scope.

Could you briefly describe the use cases you encountered while migrating rules_scala that led to this "catch-all" function? I'm happy to suggest alternatives based on that and we can look into where to document them so that they are more easily discovered in the future.

CC @Wyverald @meteorcloudy

Oct 15 '24 08:10 fmeum

No need to bear with you—this is exactly the kind of thoughtful and helpful response I was fishing for! 😉 Thanks for responding so quickly as well.

As noted below, I've already withdrawn most of this PR, with the exception of apparent_repo_name. But now I'm going to try a couple of ideas to see if I can get away without it as well, and will ping again once I have.

BTW, my last two commits removed all but apparent_repo_name, but broke the "Last Green Bazel" builds. I'm certain the failure has nothing to do with this PR:

(15:21:56) ERROR: Traceback (most recent call last):
	File "/workdir/docs/BUILD", line 182, column 12, in <toplevel>
		update_docs(
	File "/workdir/docs/private/stardoc_with_diff_test.bzl", line 107, column 11, in update_docs
		native.sh_binary(
Error: no native function or rule 'sh_binary'

`adjust_main_repo_prefix`: Withdrawn

I agree, the target filtering mechanism as currently implemented in rules_scala is a bit wonky for the reasons you cite. I'll run it by the current maintainers to see if they've the appetite to change it while I'm doing my Bzlmod conversion; it would be a potentially breaking change from WORKSPACE, so there'd be a major version bump. (I may have another reason to recommend a major bump, after toolchainizing its dependencies.) If they don't want me to do it now, I can file a bug to update it in the next major release.

I've withdrawn this one, and will only keep it in rules_scala if the maintainers don't want to update it right away.

`is_bzlmod_enabled`: Withdrawn

Now that you mention it, I saw the @bazel_features implementation at some point but forgot about it. I only hoisted it out for the sake of main_repo_prefix, used by adjust_main_repo_prefix.

Withdrew this as well.

`repo_name`: Withdrawn

Heh, this wasn't part of my original commit; I'd only added it when the bazel-skylib CI broke on the Bazel 5 build. Withdrawn and using the recommended one liner in its place. (Though I did extract a private _repo_attr variable to hold the result of the hasattr expression.)

`apparent_repo_label_string`: Withdrawn

Withdrawing this, too, as I was using it on the other end of the filtering operation that applied adjust_main_repo_prefix. If we filter using proper Label() values instead of strings, the problem apparent_repo_label_string solves goes away.

`apparent_repo_name`

All those return value cases are correct. In practice, I'm solving primarily for case 2 (with case 3 for pre-Bzlmod compatibility). However, I tried to ensure cases 1 and 4 were covered to avoid potential bugs.

I was surprised at first that these wouldn't be considered apparent names, but I see the point regarding repo mapping.

The use cases break down to two particular problems. See the message for commit https://github.com/bazelbuild/rules_scala/pull/1621/commits/25e8aa8b1a2ac916b923980fdad23e4f5ca5a965 from https://github.com/bazelbuild/rules_scala/pull/1621 for more info.

But now that I'm thinking about it more, I've a couple ideas in mind that might help me avoid the need even for apparent_repo_name, noted below. I'll ping again once I've given them a try.

Repository rule generated `BUILD` target names

I'm wondering if I can use Label.name for this class of problems instead.

The jvm_import_external repository rule emits default BUILD targets for JAR files, so dependencies can depend on these repos using only the canonical repo name, like @io_bazel_rules_scala_compiler.

(Aside: I don't know enough to know why rules_jvm_external wouldn't work for this repo. All I know is that the repo JARs can be versioned for specific Scala versions, which can also appear in the apparent repo name; I'm using a non-versioned default repo name here. Also, different toolchain-related repos are included for different user-configurable Scala versions.)

Without using apparent_repo_name, the default target names would look like this under Bzlmod:

scala_import(
    name = "_main~scala_deps~io_bazel_rules_scala_scala_compiler",
    jars = ["scala-compiler-2.12.18.jar"],
)

Then a (macro generated) target listing @io_bazel_rules_scala_compiler in its dependencies produced errors like:

ERROR: .../_main~_repo_rules~io_bazel_rules_scala/scala/BUILD:
  no such target '@@_main~scala_deps~io_bazel_rules_scala_scala_compiler//:io_bazel_rules_scala_scala_compiler':
  target 'io_bazel_rules_scala_scala_compiler' not declared in package ''
  defined by .../_main~scala_deps~io_bazel_rules_scala_scala_compiler/BUILD
  and referenced by '@@_main~_repo_rules~io_bazel_rules_scala//scala:default_toolchain_scala_compile_classpath_provider'

You can see in https://github.com/bazelbuild/rules_scala/pull/1621 that I used apparent_repo_name to solve a handful of other similar problems when generating BUILD targets.

Attaching repository assets as embedded resources

I'm wondering if I should just use the common resource_strip_prefix attribute of affected rules, or possibly use the Java runfiles library (or contribute a Scala version).

Under Bzlmod, accessing resources from external repos, like in the ScalaLibResourcesFromExternalDepTest.scala file from //test/src/main/scala/scalarules/test/resources:TestScalaLibResourcesFromExternalDep, would break with:

$ bazel test //test/src/main/scala/scalarules/test/resources:all

1) Scala library depending on resources from external resource-only
  jar::allow to load resources(scalarules.test.resources.ScalaLibResourcesFromExternalDepTest)
  java.lang.NullPointerException
    at scalarules.test.resources.ScalaLibResourcesFromExternalDepTest.get(ScalaLibResourcesFromExternalDepTest.scala:17)
    at scalarules.test.resources.ScalaLibResourcesFromExternalDepTest.$anonfun$new$3(ScalaLibResourcesFromExternalDepTest.scala:11)
    at scalarules.test.resources.ScalaLibResourcesFromExternalDepTest.$anonfun$new$2(ScalaLibResourcesFromExternalDepTest.scala:11)

The BUILD target is:

scala_specs2_junit_test(
    name = "TestScalaLibResourcesFromExternalDep",
    size = "small",
    srcs = ["ScalaLibResourcesFromExternalDepTest.scala"],
    resources = ["@test_new_local_repo//:data"],
    suffixes = ["Test"],
)

@test_new_local_repo is defined under a module extension as:

    new_local_repository(
        name = "test_new_local_repo",
        build_file_content = """
filegroup(
    name = "data",
    srcs = glob(["**/*.txt"]),
    visibility = ["//visibility:public"],
)
""",
        path = "third_party/test/new_local_repo",
    )

The source file attempts to access a file in this repo as:

get("/external/test_new_local_repo/resource.txt")

So I updated _target_path from scala/private/resources.bzl to use a new _update_external_target_path function, so the Scala code could use the apparent repo name in the /external/ path:

def _update_external_target_path(target_path):
    if not target_path.startswith("external/"):
        return target_path
    prefix, repo_name, rest = target_path.split("/")
    return "/".join([prefix, apparent_repo_name(repo_name), rest])

Oct 15 '24 18:10 mbland

The best way to get a pretty name into a repo rule is to just pass it in via an attr.string (could be optional for backwards compatibility with WORKSPACE and fall back to ctx.attr.name).

Oct 15 '24 21:10 fmeum

OK, some significant updates. I've updated this PR to now only contain def apparent_repo_name(repository_ctx), specifically to allow repo rules to inject their own apparent repo name into a default BUILD target. As suggested, we could add a redundant attr.string to each repo rule that needs to do this, but that seems like a suboptimal user experience.

That said, I've updated bazelbuild/rules_scala#1621 to solve all the other problems without apparent_repo_name. I appreciate the thoughtful feedback that inspired me to find better solutions to those problems!

FWIW, I don't know if something like _expand_patterns would be useful to contribute in any way. I suppose asking people to write rules with separate attr.label_lists for includes and excludes would be preferable, but my goal here was to preserve existing behavior in rules_scala.

Oct 16 '24 17:10 mbland

Closing this now, as my subconscious concocted another approach to the "default repo target name" problem last Friday. Basically, you can use a macro to wrap the actual repository_rule and duplicate the name argument for the additional parameter. I've opened bazelbuild/rules_scala#1650 to replace apparent_repo_name with this macro pattern (note this is for a completely internal repo rule, but the pattern applies to public ones, too):

_alias_repository = repository_rule(
    implementation = _alias_repository_impl,
    attrs = {
        "default_target_name": attr.string(mandatory = True),
        "target": attr.string(mandatory = True),
    },
)

def _alias_repository_wrapper(**kwargs):
    """Wraps `_alias_repository` to pass `name` as `default_target_name`."""
    default_target_name = kwargs.pop("default_target_name", kwargs.get("name"))
    _alias_repository(default_target_name = default_target_name, **kwargs)

From that PR: One small downside of this technique is that the macro can't be imported into a MODULE.bazel file directly via use_repo_rule. If you did want to use it in MODULE.bazel, you'd have to write a module extension to call it, or use modules.as_extension from bazel_skylib. I suppose that's a small price to pay for squashing a canonical repo name format dependency forever.

The other small downside may be that the documentation from the original repository_rule doesn't automatically convey to the repo wrapper.

Nov 18 '24 15:11 mbland

Add apparent_repo_name utility to modules.bzl

adjust_main_repo_prefix

is_bzlmod_enabled

repo_name

apparent_repo_name and apparent_repo_label_string

adjust_main_repo_prefix: Withdrawn

is_bzlmod_enabled: Withdrawn

repo_name: Withdrawn

apparent_repo_label_string: Withdrawn

apparent_repo_name

Repository rule generated BUILD target names

Attaching repository assets as embedded resources

`adjust_main_repo_prefix`

`is_bzlmod_enabled`

`repo_name`

`apparent_repo_name` and `apparent_repo_label_string`

`adjust_main_repo_prefix`: Withdrawn

`is_bzlmod_enabled`: Withdrawn

`repo_name`: Withdrawn

`apparent_repo_label_string`: Withdrawn

`apparent_repo_name`

Repository rule generated `BUILD` target names