
feat/fix(pip): enable local paths for experimental_index_url

Open adrianimboden opened this issue 2 months ago • 8 comments

Hi rules_python team

I have been using your rules for a long time now. With the WORKSPACE style, I use pip as follows:

pip_parse(
    name = "py_deps",
    extra_pip_args = [
        "--index-url=/home/user/local_pip_mirror",
        "--no-cache-dir",
    ],
    python_interpreter = "python3",
    python_interpreter_target = interpreter,
    requirements_lock = ":requirements_lock.txt",
)

The folder /home/user/local_pip_mirror is populated with pypi-mirror download --requirement requirements_lock.txt

This gives me a nice, clean way to do offline builds. I could never get that working directly with bazel fetch and related commands, but this solution works well because no internet access is involved at all.

I am in the process of upgrading to bzlmod. I saw many bug reports about making offline builds work, but I honestly gave up on bazel vendor and related approaches again. The simplest way, in my opinion, is to just use a local pip mirror. For that I tried the following:

pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")
pip.parse(
    experimental_index_url = "/home/user/local_pip_mirror",
    hub_name = "py_deps",
    python_version = python_version,
    requirements_lock = "//:requirements_lock.txt",
)
use_repo(pip, "py_deps")

At the moment, experimental_index_url must be an https:// URL to work.

This MR changes it so that local paths are also accepted as an experimental_index_url.

For my project, the proposed changes are in effect and working great.

It is not perfect, but I think it is an important addition to aid the bzlmod migration.

adrianimboden avatar Oct 02 '25 01:10 adrianimboden

Summary of Changes

Hello @adrianimboden, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a crucial feature that allows the experimental_index_url parameter within rules_python's pip.parse extension to accept local file system paths, in addition to traditional HTTPS URLs. This enhancement is designed to facilitate robust offline dependency resolution by enabling the use of local pip mirrors, which is particularly beneficial for users migrating to bzlmod and seeking a more flexible and reliable method for managing Python dependencies without requiring internet access during the build process.

Highlights

  • Local Paths for experimental_index_url: The experimental_index_url parameter in pip.parse now supports local file system paths, allowing users to specify local pip mirrors for package resolution.
  • Enhanced Offline Build Support for bzlmod: This change significantly improves support for offline builds when using bzlmod by enabling the use of local pip mirrors, addressing a common challenge for users migrating from WORKSPACE to bzlmod.

[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist[bot] avatar Oct 02 '25 01:10 gemini-code-assist[bot]

This looks interesting - we have certainly discussed using a local path for the experimental_index_url in the past. A few things that come to mind:

  • Having a unit test that ensures that this feature works would be great. Tests can be added in the tests/pypi/simpleapi_download directory.

How do you update the local mirror?

aignas avatar Oct 02 '25 04:10 aignas

I populate the mirror like this:

pypi-mirror download --requirement requirements_lock.txt --download-dir /tmp/download
pypi-mirror create --download-dir /tmp/download --mirror-dir /path/to/mirror --copy

The folder then looks like this:

├── index.html
├── aiohappyeyeballs
│   ├── aiohappyeyeballs-2.6.1-py3-none-any.whl
│   └── index.html
├── aiohttp
│   ├── aiohttp-3.12.15-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
│   └── index.html
├── aiosignal
│   ├── aiosignal-1.4.0-py3-none-any.whl
│   └── index.html

I am not sure about the label idea. Were you thinking of something like this?

new_local_repository = use_repo_rule("@bazel_tools//tools/build_defs/repo:local.bzl", "new_local_repository")

new_local_repository(
    name = "pip_deps_mirror",
    # exports_files takes file names, not glob patterns, so expand them with glob().
    build_file_content = "exports_files(glob(['**']))",
    path = "/home/thingdust/deps/pip_deps",
)

pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")
pip.parse(
    experimental_index_url = Label("@pip_deps_mirror"),
    hub_name = "py_deps",
    python_version = python_version,
    requirements_lock = "//:requirements_lock.txt",
)
use_repo(pip, "py_deps")

I am not sure how easy that will be. It seems like a completely new code path to me if the value may be either a URL or a label. Or am I missing something?

adrianimboden avatar Oct 02 '25 08:10 adrianimboden

After looking at it again, I saw that there is a much simpler solution that improves things in the meantime.

I wrongly assumed that the download functions don't work for local files. I always had the problem that file:// URLs did not work. I found out that this is because the URLs get normalized first. A small addition to strip_empty_path_segments makes it work with local URLs.

Making it work with labels would be nice though. Probably for another time?

adrianimboden avatar Oct 02 '25 08:10 adrianimboden

It did not work before because strip_empty_path_segments does the following: file:///path/to/folder -> file://path/to/folder. For local paths, empty segments should not be a problem I presume.
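The normalization problem described above can be illustrated with a small sketch. It is written in plain Python rather than Starlark so that it can be run directly, and both functions are hypothetical stand-ins: `strip_naive` mimics the reported behaviour and `strip_keep_root` shows one possible fix. Neither is the actual strip_empty_path_segments code from rules_python.

```python
def strip_naive(url):
    """Drop every empty path segment after the scheme (hypothetical stand-in)."""
    scheme, sep, rest = url.partition("://")
    if not sep:
        return url
    trailing = "/" if rest.endswith("/") else ""
    parts = [p for p in rest.split("/") if p]
    # The leading "/" of an absolute file path is an empty segment too,
    # so it is lost here: file:///path -> file://path
    return scheme + "://" + "/".join(parts) + trailing


def strip_keep_root(url):
    """Same idea, but keep the leading "/" that marks an empty authority."""
    scheme, sep, rest = url.partition("://")
    if not sep:
        return url
    # rest starting with "/" means there is no host, as in file:///path
    root = "/" if rest.startswith("/") else ""
    trailing = "/" if rest.endswith("/") and len(rest) > 1 else ""
    parts = [p for p in rest.split("/") if p]
    return scheme + "://" + root + "/".join(parts) + trailing


if __name__ == "__main__":
    print(strip_naive("file:///path/to/folder"))
    print(strip_keep_root("file:///path/to/folder"))
```

With this sketch, URLs that have a host, such as an https:// index, are normalized the same way by both functions; only host-less URLs like file:///path/to/folder are treated differently.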

adrianimboden avatar Oct 02 '25 08:10 adrianimboden

I really like the idea of being able to point to a local path using a label, for several reasons.

It'd be really convenient for testing our pip integration -- we can easily construct arbitrary index states and have a more end-to-end verification.

It also seems like a really flexible and powerful way for customizing where pip is getting stuff from. You could write a repo rule to make the pip index look however you want, and be populated however you want.

rickeylev avatar Oct 03 '25 03:10 rickeylev

Yes, this sort of thing is great. It's often called a "wheel house" and is a very common and useful pattern for offline builds, avoiding sdist in deployment scenarios, etc. Very supportive of this. Tools in a similarish space are: https://github.com/chriskuehl/dumb-pypi

groodt avatar Oct 03 '25 04:10 groodt

Thinking out loud a little bit about how this could be designed. This might be a train of thought, but I'll just write it out as I think.

  1. The idea of passing in a local path or something similar sounds good, but so far in Bazel I've seen this work only if you pass an absolute path or a label. Hence I thought it would be nice to pass a label.
  2. If one has labels for each whl file, then we can pass them to whl_library whl_file attribute: https://rules-python.readthedocs.io/en/latest/api/rules_python/python/private/pypi/whl_library.html#whl_library.whl_file
  3. This means that the code in parse_requirements.bzl needs to inject those labels in some way.
  4. In the future we may want to write the URLs into the lock file, so if they have absolute file:/// in them, this will not age well, so it is best to treat the local index as one that has the right format.
  5. parse_requirements.bzl is called from hub_builder and gets the get_index_urls function as a parameter. We could have a separate implementation of that function that returns labels instead of URLs; however, the label mapping would need to be available there in some way.
  6. pip.parse could create a local index repository on the fly (i.e. a repository where we can access whls by using a scheme of @local_index_repo_name//<whl_name>:<file_name>.whl). The extension reads the local directory structure and finds all whl files, then creates a repo and passes the whls as a list of paths/labels. The HTML files are only processed in the extension to avoid circular dependencies in the extension/Starlark evaluation.
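As a rough illustration of point 6, the directory scan could look something like the sketch below. It is plain Python rather than Starlark so that it can be run directly, and it only assumes the one-directory-per-package mirror layout shown earlier in the thread. The function name `wheel_labels` and its return shape are made up for illustration; the repository name is simply the placeholder used in the label scheme above.

```python
from pathlib import Path


def wheel_labels(mirror_dir, repo_name="local_index_repo_name"):
    """Map each package directory in the mirror to label-like strings for its wheels.

    Hypothetical helper: it follows the scheme
    @<repo_name>//<whl_name>:<file_name>.whl from the discussion.
    """
    labels = {}
    for pkg_dir in sorted(Path(mirror_dir).iterdir()):
        if not pkg_dir.is_dir():
            continue  # skip the top-level index.html
        wheels = sorted(p.name for p in pkg_dir.glob("*.whl"))
        if wheels:
            labels[pkg_dir.name] = [
                "@{}//{}:{}".format(repo_name, pkg_dir.name, whl) for whl in wheels
            ]
    return labels
```

A real implementation would live in the module extension and use repository-context APIs instead of pathlib; the point here is only the mapping from directory layout to labels.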

So to sum up, the files that would need to be touched:

  • whl_library - stays the same.
  • hub_builder.bzl - needs some extra handling of a different get_index_urls function. It should handle the case well where the whl (or dist) struct has whl_file but does not have url set.
  • parse_requirements.bzl - needs some minor fixing to accommodate a more generic getting of the wheels.
  • local_whl_repo.bzl - a new repository that contains the files.
  • simpleapi_local.bzl - a new file that handles traversing the local index.html tree.

There are probably ways to optimize this approach.
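For the index.html traversal that simpleapi_local.bzl would need, a minimal sketch follows, again in plain Python rather than Starlark. It assumes that each per-package index.html lists its files as ordinary `<a href="...">` links, possibly with a `#sha256=...` fragment, in the style of the simple repository API; that is an assumption about the mirror's output, not something shown in this thread. The names `dist_files` and `_LinkCollector` are hypothetical.

```python
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)


def dist_files(index_html):
    """Return the file names linked from one package's index.html."""
    parser = _LinkCollector()
    parser.feed(index_html)
    # Drop any "#sha256=..." fragment so only the file reference remains.
    return [href.split("#", 1)[0] for href in parser.hrefs]
```

Starlark has no HTML parser, so the real code would have to rely on string handling; the sketch only shows which information needs to come out of each index.html.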

aignas avatar Oct 05 '25 04:10 aignas