Support a default hub repo concept in bzlmod
🚀 feature request
Relevant Rules
`pip.parse`
Description
Currently `pip.parse` requires the `hub_name` parameter, which in practice is quite often set to `pypi` or `pip` in the root module. Users of non-root modules should also be able to use the `pip` or `pypi` names, and one of these could be the default `hub_name` value.
Describe the solution you'd like
Minimum (for 1.0.0):
- Set the `hub_name` default value to `pypi` (it should not be `pip` because we are not using `pip` here and `pypi` is a more generic term).
- Support isolation of `pip.parse` extension usage via the `isolate` parameter.
- We could force users to use the `isolate` parameter if the `hub_name` is not set, or if it is set to the `pip` or `pypi` values.
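As a sketch, a root `MODULE.bazel` under this proposal might look like the following. This is hypothetical: today `hub_name` is still required, and extension isolation (`isolate = True` on `use_extension`) is an experimental Bazel feature.

```starlark
pip = use_extension(
    "@rules_python//python/extensions:pip.bzl",
    "pip",
    # Hypothetical: isolation could be required when relying on the default hub name.
    isolate = True,
)

# hub_name omitted: it would default to "pypi" under this proposal.
pip.parse(
    python_version = "3.11",
    requirements_lock = "//:requirements.txt",
)
use_repo(pip, "pypi")
```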
Extra (post 1.0.0):
- Support passing `pyproject.toml` and/or constraint files to `pip.parse` instead of the lock file itself. Enable users to lock the dependency tree via a `@pypi//:lock` target to generate a lock file in the user's workspace. An alternative would be to merge the requirements in some way, but that could be harder. This greatly depends on #1975 because `uv` has a lot of different flags for overriding dependencies in the constraint files and may enable us to have a working solution. Merging the lock files by just blindly using the lowest version of all could work, but may not be reliable, and a solution that properly does the locking may be required.
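The "blindly use the lowest version" idea can be made concrete with a small sketch (a hypothetical helper, not rules_python code) that also shows why it is unreliable:

```python
def merge_lowest(lock_a, lock_b):
    """Merge two {package: version} lock files, keeping the lowest version on clashes."""
    merged = dict(lock_a)
    for pkg, ver in lock_b.items():
        if pkg in merged:
            # Naive rule: compare versions numerically and keep the lower one.
            merged[pkg] = min(
                merged[pkg], ver, key=lambda v: tuple(map(int, v.split(".")))
            )
        else:
            merged[pkg] = ver
    return merged

# Module A was locked against foo 2.1 (and requires foo >= 2.0);
# module B was locked against foo 1.0 (and requires foo <= 1.0).
lock_a = {"foo": "2.1", "bar": "1.4"}
lock_b = {"foo": "1.0"}

merged = merge_lowest(lock_a, lock_b)
print(merged)  # {'foo': '1.0', 'bar': '1.4'} -- silently violates A's "foo >= 2.0"
```

The merge always produces an answer, but nothing checks that the chosen version still satisfies each module's original constraints, which is why a proper re-locking step may be required.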
Lock-file merging is used in the Go ecosystem, and requiring users to either name the hub repo that collects the spoke repos or use a default name is the convention of rules_go/gazelle; see bazelbuild/bazel-gazelle#1584. Also see bazelbuild/bazel#20186, which talks about stabilizing the extension isolation feature. Having rules_python support it may be a good way to give feedback to the Bazel devs.
See #2088 for a slightly related topic.
Describe alternatives you've considered
The main alternative would be not to support this and keep things as they are, but that could make it harder to implement the isolation later down the line. What is more, supporting isolation of the extensions could make the pip code simpler, because we could just ask the user to use `isolate` if we see clashes.
> Set the `hub_name` default value to `pypi` (it should not be `pip` because we are not using `pip` here and `pypi` is a more generic term).

+1 for `pypi` and not `pip`.
> Support passing `pyproject.toml` and/or constraint files to `pip.parse` instead of the lock file itself. Enable users to lock the dependency tree via a `@pypi//:lock` target to generate a lock file in the user's workspace.
This is one additional reason why I've been keen to move dependency specifier lists into Starlark.
I'm generally +1 on this idea.
However, I don't see how this can reliably interact with lock files, i.e., how specific versions of transitive closures can be merged, respected, and made to work reliably. The issue is the inclusion of arbitrary requirements from elsewhere in the module graph.
> use `isolate=True` if there are clashes
I don't see how this would scale? Consider a triangle dependency situation:
- X depends on A and B, directly or indirectly,
- A requires foo >= 2.0
- B requires foo <= 1.0
A and B work fine on their own. It's only when they both end up in a transitive module graph that issues occur.
- Who is supposed to set `isolate=True` in this case? I don't think there's a good answer to this question.
- How can the root module fix things? Can the root override a submodule to use `isolate=True`? Or cause a submodule to remap `@pypi` to some alternative `hub_name` without the clash?
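The triangle situation above can be sketched as a toy resolver (illustrative only) that intersects inclusive version ranges for a single package:

```python
def intersect(constraints):
    """constraints: list of (lo, hi) inclusive bounds.
    Return the satisfiable range, or None if the constraints clash."""
    lo = max(c[0] for c in constraints)
    hi = min(c[1] for c in constraints)
    return (lo, hi) if lo <= hi else None

# A requires foo >= 2.0; B requires foo <= 1.0.
a_constraint = (2.0, float("inf"))
b_constraint = (0.0, 1.0)

print(intersect([a_constraint]))                # fine on its own
print(intersect([b_constraint]))                # fine on its own
print(intersect([a_constraint, b_constraint]))  # None: no version satisfies both
```

Each module resolves cleanly in isolation; the conflict only appears once X pulls both into the same graph, so neither A nor B has any reason to set `isolate=True` themselves.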
Yes, I should clarify as well. I'm only firmly +1 on the hub_name defaults.
For the rest, I'm -1. I'm not convinced by any lock-file merging, for similar reasons as @rickeylev. I don't believe that it's generally solvable, which will lead to more issues than it solves, and more maintenance burden.
I can be convinced of a solution where we let the root module re-lock against direct dependencies from transitive repositories, but even then, I'm a little confused by the problems this is believed to solve. If somebody needs to compose code and dependencies at the Python language level (not the Bazel extension level), then I think the most standard path forward would be to publish packages (to PyPI or elsewhere) or to use git submodules or similar code-sharing mechanisms to merge source code repositories.
I don't think languages (outside Go) can safely compose x_library rules across repositories in the presence of third-party dependencies. For example, in Java, you'd share a jar by publishing it to Maven Central. For Rust, you'd share a crate by publishing it to crates.io. Similarly, for Python, you'd share an sdist/wheel by publishing it to PyPI. Private artifact registries are also possible for all of these languages.
I can imagine an API like:
```starlark
pip.package(name = "foo")
```
And this basically acts like:
```
# requirements.in
foo
```

```
# requirements.txt
foo == blabla --hash blabla
dep-of-foo == etc etc
```

```starlark
# module.bazel
pip.parse(
    hub_name = "pypi",
    requirements_lock = "requirements.txt",
)
```
And I guess it would run requirements.in -> requirements.txt conversion during the bzlmod extension phase. This also necessitates that the resulting requirements.txt isn't implicitly platform specific.
Adding version constraints to the pip.package() call seems problematic, though. It seems more like a "best effort, but no guarantee" type of thing.
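For illustration, the spec-to-`requirements.in` step could be a simple render pass like this (a hypothetical sketch, not an existing rules_python API; the spec shape is made up):

```python
def render_requirements_in(specs):
    """Render a list of package specs into requirements.in content.

    specs: list of dicts like {"name": "foo", "constraint": ">=1.0"},
    where "constraint" is an optional best-effort version bound.
    """
    lines = []
    for spec in specs:
        line = spec["name"]
        if spec.get("constraint"):
            line += " " + spec["constraint"]
        lines.append(line)
    return "\n".join(sorted(lines)) + "\n"

specs = [{"name": "foo"}, {"name": "bar", "constraint": ">=1.0"}]
print(render_requirements_in(specs))
# bar >=1.0
# foo
```

The extension would then feed this generated file into a `requirements.in` to `requirements.txt` resolution step, which is where the real complexity (platform specificity, hashes) lives.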
Yes, rules_rust has something similar for crates in cargo. You can load crates from a cargo lockfile, but you can also load them from specs in Starlark: https://github.com/bazelbuild/rules_rust/blob/b5ecaea4b5996912693135ea14447d22271e9cd5/examples/bzlmod/proto/MODULE.bazel#L77
```starlark
crate = use_extension("@rules_rust//crate_universe:extension.bzl", "crate")
crate.spec(
    package = "prost",
    version = "0.12",
)
crate.spec(
    package = "prost-types",
    version = "0.12",
)
...
crate.from_specs()
use_repo(crate, "crates")
```
There's also a variation that looks like this:
```starlark
crates_repository(
    name = "crate_index",
    cargo_lockfile = "//:Cargo.lock",
    lockfile = "//:Cargo.Bazel.lock",
    packages = {
        "async-trait": crate.spec(
            version = "0.1.51",
        ),
        "mockall": crate.spec(
            version = "0.10.2",
        ),
        "tokio": crate.spec(
            version = "1.12.0",
        ),
    },
)
```
And the following to relock: `CARGO_BAZEL_REPIN=1 bazel sync --only=crate_index`.
I'd think it would be possible to do something similar in Python. I think they have it a bit easier because I don't think they need to deal with the sdist vs. wheel situation. But yeah, the `crate.from_specs()` call would be solving dependencies behind the scenes and failing if there is no solution.
In https://github.com/bazel-contrib/rules_python/pull/3248, we added the ability to make the rules transition on arbitrary, even custom user-defined, flags. One of the motivations for this was to make a "hub of hubs" easier: multiple `pip.parse` hubs are put behind a single set of targets, with `select()` used to pick which is used. This idea is, essentially, the same as having a single `@pypi` repo that merges all the particular ones.
Right now, this is possible using flags and select(). If we want to fold that directly into our rules, then we need:
- A flag for `py_binary` et al. to transition on.
- A way to specify a pip.parse hub as a default.
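For illustration, the flags-and-`select()` approach available today might look like this (all names here are made up):

```starlark
# BUILD.bazel of a wrapper package. @pypi_a and @pypi_b are two separate
# pip.parse hubs; //:use_hub_a is a config_setting on a user-defined flag.
alias(
    name = "requests",
    actual = select({
        "//:use_hub_a": "@pypi_a//requests",
        "//conditions:default": "@pypi_b//requests",
    }),
)
```

Folding this into the rules would replace the hand-written wrapper package with a transition-driven default.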
Some open questions for this:
- Name of flag? `pypi`, `pypi_hub`, `pypi_hub_name`? I don't really like `pypi` in the name since this is, essentially, some arbitrary extra string used to help force dependency resolution a particular way. It's not strictly tied to PyPI or `pip.parse`.
- String flag or label flag? Labels have identity uniqueness and can't be typoed. Strings are a more direct mapping to a hub name.