feat/fix(pip): enable local paths for experimental_index_url
Hi rules_python team
I have been using your rules for a long time now. With the WORKSPACE style, I use pip as follows:
pip_parse(
    name = "py_deps",
    extra_pip_args = [
        "--index-url=/home/user/local_pip_mirror",
        "--no-cache-dir",
    ],
    python_interpreter = "python3",
    python_interpreter_target = interpreter,
    requirements_lock = ":requirements_lock.txt",
)
The folder /home/user/local_pip_mirror gets populated with pypi-mirror download --requirement requirements_lock.txt
So I have a nice and clean way to do offline builds. I could never get it running directly with bazel fetch and the like. But this solution was very nice because no internet access was involved at all.
I am in the process of upgrading to bzlmod. I saw many bug reports about making offline builds work, but I honestly gave up on bazel vendor and the like again. The simplest way, in my opinion, is to just use a local pip mirror. For that I tried the following:
pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")
pip.parse(
    experimental_index_url = "/home/user/local_pip_mirror",
    hub_name = "py_deps",
    python_version = python_version,
    requirements_lock = "//:requirements_lock.txt",
)
use_repo(pip, "py_deps")
At the moment, experimental_index_url must be an https:// URL to work.
This MR changes it so that local paths are also a possible experimental_index_url.
For my project, the proposed changes are in effect and working great.
It is not perfect, but I think it is an important addition to aid the bzlmod migration.
Summary of Changes
Hello @adrianimboden, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a crucial feature that allows the experimental_index_url parameter within rules_python's pip.parse extension to accept local file system paths, in addition to traditional HTTPS URLs. This enhancement is designed to facilitate robust offline dependency resolution by enabling the use of local pip mirrors, which is particularly beneficial for users migrating to bzlmod and seeking a more flexible and reliable method for managing Python dependencies without requiring internet access during the build process.
Highlights
- Local Paths for experimental_index_url: The experimental_index_url parameter in pip.parse now supports local file system paths, allowing users to specify local pip mirrors for package resolution.
- Enhanced Offline Build Support for bzlmod: This change significantly improves support for offline builds when using bzlmod by enabling the use of local pip mirrors, addressing a common challenge for users migrating from WORKSPACE to bzlmod.
[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
This looks interesting - we have certainly discussed using a local path for the experimental_index_url in the past. A few things that come to mind:
- Having a unit test that ensures that this feature works would be great. They can be added in the tests/pypi/simpleapi_download directory.
How do you update the local mirror?
I populate the mirror like this:
pypi-mirror download --requirement requirements_lock.txt --download-dir /tmp/download
pypi-mirror create --download-dir /tmp/download --mirror-dir /path/to/mirror --copy
The folder looks like this then:
├── index.html
├── aiohappyeyeballs
│ ├── aiohappyeyeballs-2.6.1-py3-none-any.whl
│ └── index.html
├── aiohttp
│ ├── aiohttp-3.12.15-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
│ └── index.html
├── aiosignal
│ ├── aiosignal-1.4.0-py3-none-any.whl
│ └── index.html
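To sanity-check a mirror with this layout before pointing a build at it, something like the following could be used. It is only a sketch: list_mirror_wheels is a hypothetical helper, not part of rules_python or pypi-mirror, and it assumes one sub-directory per package as in the tree above.

```python
from pathlib import Path


def list_mirror_wheels(mirror_dir):
    """Return {package_dir_name: [wheel file names]} for a mirror directory.

    Assumes one sub-directory per package, each holding its wheels next to
    an index.html. The helper name is hypothetical.
    """
    wheels = {}
    for entry in sorted(Path(mirror_dir).iterdir()):
        if not entry.is_dir():
            continue  # skip the top-level index.html
        wheels[entry.name] = sorted(p.name for p in entry.glob("*.whl"))
    return wheels
```

A package directory that maps to an empty list has no wheel in the mirror, which is probably worth catching before going offline.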
I am not sure about the label stuff. Did you think about something like this?
new_local_repository = use_repo_rule("@bazel_tools//tools/build_defs/repo:local.bzl", "new_local_repository")
new_local_repository(
    name = "pip_deps_mirror",
    # exports_files() takes explicit file names, so the pattern is wrapped in glob().
    build_file_content = "exports_files(glob(['**']))",
    path = "/home/thingdust/deps/pip_deps",
)
pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")
pip.parse(
    experimental_index_url = Label("@pip_deps_mirror"),
    hub_name = "py_deps",
    python_version = python_version,
    requirements_lock = "//:requirements_lock.txt",
)
use_repo(pip, "py_deps")
I am not sure how easy that will be. It seems like a completely new code path to me if the value may be either a URL or a label. Or am I missing something?
After looking at it again, I saw that there is a much simpler solution to make it better in the meantime.
I wrongly assumed that the download functions don't work for local files. I always had the problem that file:// urls did not work. I found out that this is because the urls get normalized first. A small addition to strip_empty_path_segments makes it work with local urls.
Making it work with labels would be nice though. Probably for another time?
It did not work before because strip_empty_path_segments does the following: file:///path/to/folder -> file://path/to/folder. For local paths, empty segments should not be a problem I presume.
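To illustrate the problem outside of Bazel, here is a minimal Python sketch. It is not the actual rules_python implementation: strip_empty_path_segments_naive only mimics the behaviour described above, and strip_empty_path_segments_fixed shows one possible fix, under the assumption that file:// URLs can simply be left untouched.

```python
def strip_empty_path_segments_naive(url):
    """Drop empty path segments, keeping only the '://' after the scheme."""
    scheme, sep, rest = url.partition("://")
    if not sep:
        return url  # not a URL with a scheme; leave it alone in this sketch
    # Filtering out empty segments also removes the empty authority part of
    # file:/// URLs, which is what turns the absolute path into a host name.
    stripped = "/".join(segment for segment in rest.split("/") if segment)
    if rest.endswith("/"):
        stripped += "/"
    return scheme + "://" + stripped


def strip_empty_path_segments_fixed(url):
    """Same as above, but local file URLs are returned unchanged (assumption)."""
    if url.startswith("file://"):
        return url
    return strip_empty_path_segments_naive(url)


print(strip_empty_path_segments_naive("file:///path/to/folder"))
print(strip_empty_path_segments_fixed("file:///path/to/folder"))
```

The first call prints file://path/to/folder, where "path" would be read as a host name, while the second one keeps file:///path/to/folder intact.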
I really like the idea of being able to point to a local path using a label, for several reasons.
It'd be really convenient for testing our pip integration -- we can easily construct arbitrary index states and have a more end-to-end verification.
It also seems like a really flexible and powerful way for customizing where pip is getting stuff from. You could write a repo rule to make the pip index look however you want, and be populated however you want.
Yes, this sort of thing is great. It's often called a "wheel house" and is a very common and useful pattern for offline builds, avoiding sdist in deployment scenarios, etc. Very supportive of this. Tools in a similarish space are: https://github.com/chriskuehl/dumb-pypi
Thinking out loud a little bit about how this could be designed. This might be a train of thought, but I'll just write it out as I think.
- The idea of passing in a local path or something sounds good, but so far in bazel I've seen this work only if you pass an absolute path or a label. Hence I thought it would be nice to pass a label.
- If one has labels for each whl file, then we can pass them to the whl_library whl_file attribute: https://rules-python.readthedocs.io/en/latest/api/rules_python/python/private/pypi/whl_library.html#whl_library.whl_file
- This means that the code in parse_requirements.bzl needs to inject those labels in some way.
- In the future we may want to write the URLs into the lock file, so if they have absolute file:/// in them, this will not age well, so it is best to treat the local index as one that has the right format.
- parse_requirements.bzl is called from hub_builder and gets the get_index_urls function as a parameter. We could have a separate implementation of that which returns labels instead of URLs; however, the label mapping should be present there in some way.
- pip.parse could create a local index repository on the fly (i.e. a repository where we can access whls by using a scheme of @local_index_repo_name//<whl_name>:<file_name>.whl). The extension reads the local directory structure and finds all whl files, then creates a repo and passes the whls as a list of paths/labels. The HTML files are only processed in the extension to avoid circular dependencies in the extension/starlark evaluation.
So to sum up, the files that would need to be touched:
- whl_library - stays the same.
- hub_builder.bzl - needs some extra handling of a different get_index_urls function. It should handle the case well where the whl (or dist) struct has whl_file but does not have url set.
- parse_requirements.bzl - needs some minor fixing to accommodate a more generic way of getting the wheels.
- local_whl_repo.bzl - a new repository that contains the files.
- simpleapi_local.bzl - a new file that handles traversing the local index.html tree.
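To make the label scheme a bit more concrete, here is a rough sketch in plain Python rather than Starlark. Everything in it is hypothetical: wheel_labels and pick_source are made-up names, the dist is modelled as a dict instead of a struct, and the real code in hub_builder.bzl and parse_requirements.bzl may end up looking quite different.

```python
def wheel_labels(repo_name, wheels_by_package):
    """Map {package: [wheel file names]} to labels of the form
    @<repo_name>//<package>:<file_name>.whl (hypothetical helper)."""
    labels = {}
    for package, files in sorted(wheels_by_package.items()):
        labels[package] = [
            "@{}//{}:{}".format(repo_name, package, file_name)
            for file_name in files
        ]
    return labels


def pick_source(dist):
    """Prefer a local whl_file label; fall back to url when no label is set.

    `dist` is a plain dict standing in for the whl/dist struct.
    """
    if dist.get("whl_file"):
        return ("whl_file", dist["whl_file"])
    return ("url", dist["url"])


labels = wheel_labels(
    "local_index_repo_name",
    {"aiosignal": ["aiosignal-1.4.0-py3-none-any.whl"]},
)
print(labels["aiosignal"][0])
print(pick_source({"whl_file": labels["aiosignal"][0]}))
```

The point of pick_source is only that a dist without url set still resolves to something that could be handed to whl_library via whl_file.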
There are probably ways to optimize this approach.