rules_python icon indicating copy to clipboard operation
rules_python copied to clipboard

`rules_python_publish_deps` causing pip extension resolution to always require network access

Open meteorcloudy opened this issue 6 months ago • 12 comments

🐞 bug report

Affected Rule

https://github.com/bazel-contrib/rules_python/blob/b40d96aba36d675c60b03424aa22f31c09e0ea4f/MODULE.bazel#L63-L75

Is this a regression?

Yes, after https://github.com/bazel-contrib/rules_python/commit/b9b0948234216cf43bb898eec078b1c71a37c1f9, moving from pip_internal to pip.

Description

The rules_python_publish_deps python hub uses the same module extension (pip) end users use, which causes the module extension resolution to always require network access. In offline build without an up-to-date lockfile, the module extension resolution will fail even though rules_python_publish_deps is never required by the end user.

This was discovered in https://github.com/bazelbuild/bazel/pull/26163, where upgrading rules_python broke Bazel's bootstrap build.

🔬 Minimal Reproduction

🔥 Exception or Error




[1A[K[31m[1mERROR: [0m/tmp/bazel_8RGzY3p4/out/external/rules_python+/python/private/pypi/simpleapi_download.bzl:130:14: Traceback (most recent call last):
	File "/tmp/bazel_8RGzY3p4/out/external/rules_python+/python/private/pypi/extension.bzl", line 615, column 25, in _pip_impl
		mods = parse_modules(module_ctx)
	File "/tmp/bazel_8RGzY3p4/out/external/rules_python+/python/private/pypi/extension.bzl", line 490, column 36, in parse_modules
		out = _create_whl_repos(
	File "/tmp/bazel_8RGzY3p4/out/external/rules_python+/python/private/pypi/extension.bzl", line 155, column 50, in _create_whl_repos
		requirements_by_platform = parse_requirements(
	File "/tmp/bazel_8RGzY3p4/out/external/rules_python+/python/private/pypi/parse_requirements.bzl", line 180, column 36, in parse_requirements
		index_urls = get_index_urls(
	File "/tmp/bazel_8RGzY3p4/out/external/rules_python+/python/private/pypi/extension.bzl", line 470, column 79, in lambda
		get_index_urls = lambda ctx, distributions: simpleapi_download(
	File "/tmp/bazel_8RGzY3p4/out/external/rules_python+/python/private/pypi/simpleapi_download.bzl", line 130, column 14, in simpleapi_download
		_fail("Failed to download metadata for {} for from urls: {}".format(
Error in fail: Failed to download metadata for ["nh3", "idna", "rich", "zipp", "mdurl", "twine", "certifi", "keyring", "pkginfo", "rfc3986", "urllib3", "docutils", "pygments", "requests", "jaraco-classes", "jaraco-context", "markdown-it-py", "more-itertools", "readme-renderer", "jaraco-functools", "backports-tarfile", "requests-toolbelt", "charset-normalizer", "importlib-metadata", "cffi", "jeepney", "pycparser", "cryptography", "secretstorage", "pywin32-ctypes"] for from urls: ["https://pypi.org/simple"]

🌍 Your Environment

Operating System:

  
macOS, Linux, Windows
  

Output of bazel version:

  
8.2.1
  

Rules_python version:

  
1.3.1
  

Anything else relevant?

This might be working as intended, but should rules_python_publish_deps be dev_dependency?

meteorcloudy avatar May 28 '25 10:05 meteorcloudy

This is unfortunately working as intended since the publishing dep is something that the user can consume and it is not a dev_dependency. We could use the isolation of the extension, but I think that would require all of the consumers of rules_python to modify their .bazelrc.

Other solutions for this would be:

  • Use a PEP751 compliant lock file #2787. This is not yet supported by pip.parse but this would fix the issue.
  • Remove the usage of experimental_index_url in that dependency - it's fine for now - this should remove the network dependency.

However, in general I don't see how users could get around this unless extension isolation or have an alias extension for internal Python usage. Any thoughts on this topic?

aignas avatar May 28 '25 14:05 aignas

Thanks for the quick response! Yes, that's also my understanding. This isn't a big issue for Bazel, we've workaround this for our bootstrap build (we don't really need the pip dependencies). And I agree using extension isolation is probably the solution, but we have to make that feature stable first. @Wyverald

meteorcloudy avatar May 28 '25 14:05 meteorcloudy

Also using the lock file can partially mitigate the issue since it skips the resolution.

meteorcloudy avatar May 28 '25 14:05 meteorcloudy

In 1.4 the extension has been marked as reproducible, so the lock file will not help. See #2434. So for now just dropping the experimental_index_url usage is good enough for now.

PRs dropping that would be welcome.

aignas avatar May 28 '25 14:05 aignas

It is unfortunate that this extra network activity is necessary, even if twine isn't used.

Couple other ideas

Is it possible to have a file-based simple api URL?

Ignore the network failure, just generate some stubs?

It does seem problematic that, if a bzlmod extension does any network activity, it can't work in offline mode

rickeylev avatar May 28 '25 14:05 rickeylev

It does seem problematic that, if a bzlmod extension does any network activity, it can't work in offline mode

One major use case of module extension is to call package manger to resolve dependencies, requiring network for that does make sense to me. The only fix is using the lock file to bypass the resolution, which means for reproducible module extension that requires network access, checking in the lockfile still might be useful. We are generating lockfile for reproducible extensions in https://github.com/bazelbuild/bazel/pull/24757, but not under a place that can be checked in. @fmeum

meteorcloudy avatar May 28 '25 14:05 meteorcloudy

Would https://github.com/bazelbuild/bazel/issues/24777 here? It would definitely help rules_go in a similar case (getting SDK URLs and hashes from just a version identifier).

fmeum avatar May 28 '25 14:05 fmeum

I think it would help with incremental build but if you just clone a repo containing all vendored repos, you can still not build offline due to the module extension resolution?

meteorcloudy avatar May 28 '25 15:05 meteorcloudy

With an up-to-date lockfile checked in you would be since the reproducible data fetched over the network would be persisted in the facts, which are stored in the regular (not the hidden) lockfile.

fmeum avatar May 28 '25 15:05 fmeum

Does it mean that rules_python lock file would affect the root module?

aignas avatar May 28 '25 23:05 aignas

Does it mean that rules_python lock file would affect the root module?

It would result in information being added to the MODULE.bazel.lock file of any module transitively using rules_python. Hopefully that information could be very concise though and, at some point, we might want to suggest users update their lockfiles with a command that doesn't blindly run all extensions, even the ones they never use in their build.

fmeum avatar May 29 '25 07:05 fmeum

This is unrelated to the ticket, but in the past it has been problematic to have MODULE.bazel.lock and test against multiple bazel versions using --lockmode=error (because bazel version affects the lock file), I hope whatever clever solution we come up with here does not have the same issue.

aignas avatar May 29 '25 13:05 aignas

Probably broader than just this issue, but this revealed issues with rules_python and Artifactory mirrors for me. Artifactory doesn't seem to mirror the html index (looking at the python docs this seems to be acceptable?)

WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/nh3/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/idna/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/rich/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/mdurl/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/keyring/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/zipp/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/twine/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/docutils/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/certifi/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/pkginfo/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/urllib3/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/pygments/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/rfc3986/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/requests/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/markdown-it-py/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/backports-tarfile/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/jaraco-classes/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/requests-toolbelt/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/jaraco-context/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/readme-renderer/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/more-itertools/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/jeepney/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/pycparser/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/jaraco-functools/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/secretstorage/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/pywin32-ctypes/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/cryptography/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/charset-normalizer/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/cffi/ failed: class java.io.FileNotFoundException GET returned 404
WARNING: Download from https://artifactory.company.com/artifactory/pypi/simple/importlib-metadata/ failed: class java.io.FileNotFoundException GET returned 404

calebzulawski avatar Aug 11 '25 14:08 calebzulawski

At dayjob Artifactory works fine with mirroring this. I think there may be an issue somewhere here - maybe our artifactories are configured differently?

aignas avatar Aug 13 '25 13:08 aignas

We realized that #3034 is a duplicate of this issue, so bringing over some details from that thread. In that issue with the reproduction example provided, I was able to find that this is not related to the use of experimental_index_url. It also uncovered that when the initial fetch of these extra dependencies fails, a bazel shutdown is required to recover things. The easiest way to reproduce is to clear the bazel cache, disable internet access, run the command to see the failed build, restore internet access, and rerun the command to see the persisted failure.

engnatha avatar Aug 27 '25 14:08 engnatha