`rules_python_publish_deps` causing pip extension resolution to always require network access
🐞 bug report
Affected Rule
https://github.com/bazel-contrib/rules_python/blob/b40d96aba36d675c60b03424aa22f31c09e0ea4f/MODULE.bazel#L63-L75
Is this a regression?
Yes, after https://github.com/bazel-contrib/rules_python/commit/b9b0948234216cf43bb898eec078b1c71a37c1f9, moving from pip_internal to pip.
Description
The rules_python_publish_deps python hub uses the same module extension (pip) end users use, which causes the module extension resolution to always require network access. In offline build without an up-to-date lockfile, the module extension resolution will fail even though rules_python_publish_deps is never required by the end user.
This was discovered in https://github.com/bazelbuild/bazel/pull/26163, where upgrading rules_python broke Bazel's bootstrap build.
🔬 Minimal Reproduction
🔥 Exception or Error
[1A[K[31m[1mERROR: [0m/tmp/bazel_8RGzY3p4/out/external/rules_python+/python/private/pypi/simpleapi_download.bzl:130:14: Traceback (most recent call last):
File "/tmp/bazel_8RGzY3p4/out/external/rules_python+/python/private/pypi/extension.bzl", line 615, column 25, in _pip_impl
mods = parse_modules(module_ctx)
File "/tmp/bazel_8RGzY3p4/out/external/rules_python+/python/private/pypi/extension.bzl", line 490, column 36, in parse_modules
out = _create_whl_repos(
File "/tmp/bazel_8RGzY3p4/out/external/rules_python+/python/private/pypi/extension.bzl", line 155, column 50, in _create_whl_repos
requirements_by_platform = parse_requirements(
File "/tmp/bazel_8RGzY3p4/out/external/rules_python+/python/private/pypi/parse_requirements.bzl", line 180, column 36, in parse_requirements
index_urls = get_index_urls(
File "/tmp/bazel_8RGzY3p4/out/external/rules_python+/python/private/pypi/extension.bzl", line 470, column 79, in lambda
get_index_urls = lambda ctx, distributions: simpleapi_download(
File "/tmp/bazel_8RGzY3p4/out/external/rules_python+/python/private/pypi/simpleapi_download.bzl", line 130, column 14, in simpleapi_download
_fail("Failed to download metadata for {} for from urls: {}".format(
Error in fail: Failed to download metadata for ["nh3", "idna", "rich", "zipp", "mdurl", "twine", "certifi", "keyring", "pkginfo", "rfc3986", "urllib3", "docutils", "pygments", "requests", "jaraco-classes", "jaraco-context", "markdown-it-py", "more-itertools", "readme-renderer", "jaraco-functools", "backports-tarfile", "requests-toolbelt", "charset-normalizer", "importlib-metadata", "cffi", "jeepney", "pycparser", "cryptography", "secretstorage", "pywin32-ctypes"] for from urls: ["https://pypi.org/simple"]
🌍 Your Environment
Operating System:
macOS, Linux, Windows
Output of bazel version:
8.2.1
Rules_python version:
1.3.1
Anything else relevant?
This might be working as intended, but should rules_python_publish_deps be dev_dependency?
This is unfortunately working as intended since the publishing dep is something that the user can consume and it is not a dev_dependency. We could use the isolation of the extension, but I think that would require all of the consumers of rules_python to modify their .bazelrc.
Other solutions for this would be:
- Use a PEP751 compliant lock file #2787. This is not yet supported by
pip.parsebut this would fix the issue. - Remove the usage of
experimental_index_urlin that dependency - it's fine for now - this should remove the network dependency.
However, in general I don't see how users could get around this unless extension isolation or have an alias extension for internal Python usage. Any thoughts on this topic?
Thanks for the quick response! Yes, that's also my understanding. This isn't a big issue for Bazel, we've workaround this for our bootstrap build (we don't really need the pip dependencies). And I agree using extension isolation is probably the solution, but we have to make that feature stable first. @Wyverald
Also using the lock file can partially mitigate the issue since it skips the resolution.
In 1.4 the extension has been marked as reproducible, so the lock file will not help. See #2434. So for now just dropping the experimental_index_url usage is good enough for now.
PRs dropping that would be welcome.
It is unfortunate that this extra network activity is necessary, even if twine isn't used.
Couple other ideas
Is it possible to have a file-based simple api URL?
Ignore the network failure, just generate some stubs?
It does seem problematic that, if a bzlmod extension does any network activity, it can't work in offline mode
It does seem problematic that, if a bzlmod extension does any network activity, it can't work in offline mode
One major use case of module extension is to call package manger to resolve dependencies, requiring network for that does make sense to me. The only fix is using the lock file to bypass the resolution, which means for reproducible module extension that requires network access, checking in the lockfile still might be useful. We are generating lockfile for reproducible extensions in https://github.com/bazelbuild/bazel/pull/24757, but not under a place that can be checked in. @fmeum
Would https://github.com/bazelbuild/bazel/issues/24777 here? It would definitely help rules_go in a similar case (getting SDK URLs and hashes from just a version identifier).
I think it would help with incremental build but if you just clone a repo containing all vendored repos, you can still not build offline due to the module extension resolution?
With an up-to-date lockfile checked in you would be since the reproducible data fetched over the network would be persisted in the facts, which are stored in the regular (not the hidden) lockfile.
Does it mean that rules_python lock file would affect the root module?
Does it mean that rules_python lock file would affect the root module?
It would result in information being added to the MODULE.bazel.lock file of any module transitively using rules_python. Hopefully that information could be very concise though and, at some point, we might want to suggest users update their lockfiles with a command that doesn't blindly run all extensions, even the ones they never use in their build.
This is unrelated to the ticket, but in the past it has been problematic to have MODULE.bazel.lock and test against multiple bazel versions using --lockmode=error (because bazel version affects the lock file), I hope whatever clever solution we come up with here does not have the same issue.
Probably broader than just this issue, but this revealed issues with rules_python and Artifactory mirrors for me. Artifactory doesn't seem to mirror the html index (looking at the python docs this seems to be acceptable?)
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/nh3/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/idna/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/rich/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/mdurl/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/keyring/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/zipp/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/twine/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/docutils/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/certifi/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/pkginfo/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/urllib3/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/pygments/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/rfc3986/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/requests/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/markdown-it-py/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/backports-tarfile/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/jaraco-classes/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/requests-toolbelt/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/jaraco-context/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/readme-renderer/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/more-itertools/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/jeepney/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/pycparser/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/jaraco-functools/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/secretstorage/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/pywin32-ctypes/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/cryptography/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/charset-normalizer/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/cffi/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/importlib-metadata/ failed: class java.io.FileNotFoundException GET returned 404
At dayjob Artifactory works fine with mirroring this. I think there may be an issue somewhere here - maybe our artifactories are configured differently?
We realized that #3034 is a duplicate of this issue, so bringing over some details from that thread. In that issue with the reproduction example provided, I was able to find that this is not related to the use of experimental_index_url. It also uncovered that when the initial fetch of these extra dependencies fails, a bazel shutdown is required to recover things. The easiest way to reproduce is to clear the bazel cache, disable internet access, run the command to see the failed build, restore internet access, and rerun the command to see the persisted failure.