bazel icon indicating copy to clipboard operation
bazel copied to clipboard

Return values in module extensions

Open matts1 opened this issue 3 years ago • 3 comments

Description of the feature request:

Allow a module extension to return a value:

my_extension = use_extension(...)
value = my_extension.my_function(...)
# do something with value

What underlying problem are you trying to solve with this feature?

If a module extension returns a large number of git repositories, it becomes infeasible to use_repo on each one individually.

Here's an example:

bazel_dep(name = "gazelle", version = "0.27.0")
go_deps = use_extension("@gazelle//:extensions.bzl", "go_deps")
go_deps.from_file(go_mod = "//:go.mod")
use_repo(go_deps,
    "com_github_gogo_protobuf",
    ...
)

In this example, we need to specify the dependencies once in the go.mod file, and once again in  the MODULE.bazel file. I've tested, and the following does appear to work:

bazel_dep(name = "gazelle", version = "0.27.0")
go_deps = use_extension("@gazelle//:extensions.bzl", "go_deps")
go_deps.from_file(go_mod = "//:go.mod")
GO_REPOS = ["com_github_gogo_protobuf", ...]
use_repo(go_deps, *GO_REPOS)

Would it be possible to support the following:

bazel_dep(name = "gazelle", version = "0.27.0")
go_deps = use_extension("@gazelle//:extensions.bzl", "go_deps")
go_repos = go_deps.from_file(go_mod = "//:go.mod")
use_repo(go_deps, *go_repos)

This would allow go.mod to be our source of truth for go repositories (and would allow other languages to do the same).

Which operating system are you running Bazel on?

Linux

What is the output of bazel info release?

release 6.0.0-pre.20221012.2

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

N/A (using repo tool, many repos)

Have you found anything relevant by searching the web?

Python appears to take an alternative approach where they write every package to a single repository. This would be a theoretically possible alternative, but:

  • go has historically used one repo per package, and this would make it much harder to upgrade existing go repositories to bzlmod
  • I assume other languages may have similar issues
  • I tried writing a repo_rule for rust recently which did a similar thing, and it's much simpler to work with one package per repository
  • Depending on how this was implemented in bazel, there may be degraded performance for putting all things in a single repository, including potentially:
    • Unable to do a partial download if you only depend upon some of the things declared by the module extension
    • Unable to download these repos in parallel
    • A change to one package version could result in having to re-download every one.

Any other information, logs, or outputs that you want to share?

No response

matts1 avatar Dec 19 '22 04:12 matts1

@Wyverald I can't appear to assign the feature request to you, but since it's a module extension FR I assume you should be the one triaging?

matts1 avatar Dec 19 '22 04:12 matts1

Previous discussion on this topic: https://gist.github.com/fmeum/c87c183ad6ffc55add0ca04232761ea1

go_repos = go_deps.from_file(go_mod = "//:go.mod")
use_repo(go_deps, *go_repos)

can't quite work since tags in MODULE.bazel are just metadata, they don't invoke any actual logic - module extensions are fetched lazily and in parallel, not while MODULE.bazel files are evaluated.

But as far as I understand the implementation, a tag could "return" an opaque object that use_repo accepts and resolves to all repos that the module extension marked as direct deps introduced by that particular tag via new Starlark API.

I am more and more convinced that we essentially get to choose between:

  1. Keep MODULE.bazel capabilities as is and accept that large monorepos will have to resort to tooling (e.g. Gazelle) to manage it. Note that MODULE.bazel can't be segmented, so it won't be possible to extract deps lists into macros as with WORKSPACE.
  2. Keep MODULE.bazel brief and human-editable, but at the cost of introducing new API and (likely rare in practice) cases of "repository has been defined twice: go_repos and other_go_repos" errors.

I would personally prefer 2): Module extensions already have access to all the API and deps information needed to manage external deps without additional, external tooling. Gazelle and other similar tools only operate on the syntax of MODULE.bazel and thus further limit the possible uses of Starlark in MODULE.bazel (variables, list comprehensions, ...).

fmeum avatar Dec 19 '22 09:12 fmeum

@matts1 The pattern we've been recommending is to use a "hub" repo that contains alias rules to other "satellite" repos that the module extension generates. Users would use_repo the hub repo only and still get the benefits of partial downloads, caches, etc.

This obviously doesn't work if the satellite repos have a lot of targets in them (which would mean that the hub contains the sum total of all targets in the satellite repos), which is unfortunately the case with rules_go. I don't have a good solution for this.


@fmeum

Keep MODULE.bazel brief and human-editable, but at the cost of introducing new API and (likely rare in practice) cases of "repository has been defined twice: go_repos and other_go_repos" errors.

The problem with that is we'd need to run all module extensions to resolve repo mappings for any given repo. Currently, the repo mapping of a non-module-extension-generated repo can be calculated from parsing MODULE.bazel files, without running any extensions. If we were to have something like use_all_repos, we'd need to run all extensions with use_all_repos eagerly, which is still okay except for the fact that we now allow extensions to use repos from other extensions. Essentially, we need to choose between use_all_repos, and only allowing extensions to see non-extension repos; otherwise, I don't see a way to structure the order of evaluations for extensions.

Gazelle and other similar tools only operate on the syntax of MODULE.bazel and thus further limit the possible uses of Starlark in MODULE.bazel (variables, list comprehensions, ...).

That may not be a bad thing -- we also recommend against extensive usage of Starlark in BUILD files.

Wyverald avatar Dec 19 '22 14:12 Wyverald

That's a very good point, I didn't think about the implications for extension evaluation.

I thought about alternative solutions and had an idea: How about having a Bazel module auto_use_repo that any module can use to automatically maintain the root module's use_repo calls? It would implement the MODULE.bazel-equivalent of Gazelle's update-repos, but 1) in a way that isn't Go-specific and 2) could use the existing module extension logic and thus be controlled entirely from Starlark.

@matts1 Do you think something like that could work for rules_rust?

fmeum avatar Dec 24 '22 11:12 fmeum

I created auto_use_repo. The README links to a Gazelle PR that adds support for it to go_deps.

fmeum avatar Jan 04 '23 10:01 fmeum

I don't think we need it for rules_rust - rules_rust should work just fine with the hub-and-spoke model that @Wyverald suggested, as it already is capable of generating spoke repos and a centralized build file that aliases to them, given the right configuration.

matts1 avatar Jan 05 '23 01:01 matts1

This is addressed by https://docs.google.com/document/d/1dj8SN5L6nwhNOufNqjBhYkk5f-BJI_FPYWKxlB3GAmA/edit. The API will be available in 6.2.0.

Wyverald avatar Apr 25 '23 18:04 Wyverald