rules_python

[Proposal] Supporting custom index urls

Open caseyduquettesc opened this issue 3 years ago • 6 comments

🚀 feature request

Relevant Rules

Existing rules:

  • pip_install
  • pip_parse

Description

Related to https://github.com/bazelbuild/rules_python/issues/74, but this proposal covers only customizing the index URLs.

We'd like rule attributes for customizing the index URLs used to locate packages. We're willing to contribute the PR, but first we'd like community agreement on the interface and on whether such a change would be accepted.

The problem is that pip searches only the default index for packages unless one of the workarounds below is used. Many enterprises distribute internal libraries through a private index, so they are left relying on these workarounds, which are hard to roll out across projects from a DevOps perspective. A documented, prescribed way to configure the index URLs would make the feature discoverable and would make it easier to write macros that preset the URLs.

This proposal does not cover authentication, because the Bazel rules I have seen so far center their authentication on the .netrc file, which I think is a good approach. pip already supports reading credentials from a .netrc file, and since rules_python delegates package discovery and installation to pip, this works well.
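To make the .netrc approach concrete, here is a minimal Python sketch using the standard-library netrc module. It is not pip's code (pip picks up .netrc credentials through the requests library); it only illustrates that credentials are matched by the index host name, so the index URLs themselves can stay free of secrets. The function name credentials_for_index is hypothetical.

```python
import netrc
from urllib.parse import urlsplit


def credentials_for_index(index_url, netrc_path):
    """Look up (login, password) for an index URL's host in a .netrc file.

    Hypothetical helper for illustration. Returns None when the file has no
    entry for the host (and no "default" entry).
    """
    host = urlsplit(index_url).hostname
    entry = netrc.netrc(netrc_path).authenticators(host)
    if entry is None:
        return None
    login, _account, password = entry
    return login, password
```

For example, a .netrc line such as `machine private.domain login alice password s3cret` would supply credentials for https://private.domain/artifactory/my-repository, but not for https://pypi.org/simple.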

Describe the solution you'd like

Rules that interact with an index should accept a list of index URLs. If the list is not specified, the rules mimic pip and use https://pypi.org/simple. If it is specified, it replaces the default entirely, so users who want to add their own index alongside the public one must list the default explicitly. This makes it possible to avoid the public index altogether, which is sometimes desirable when all dependency requests should go through a virtual index.
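As a minimal sketch of these semantics, the Python function below shows one way the proposed index_urls list could be translated into pip flags: the first URL replaces pip's default through --index-url, and every remaining URL is passed with --extra-index-url. The function name and this mapping are assumptions for illustration, not existing rules_python behavior.

```python
PYPI_SIMPLE = "https://pypi.org/simple"  # pip's default index


def index_urls_to_pip_args(index_urls):
    """Translate a hypothetical `index_urls` attribute into pip flags.

    An empty list emits no flags, so pip falls back to its default index.
    Otherwise the list is absolute: the first URL replaces the default,
    and the rest are added as extra indexes.
    """
    args = []
    for i, url in enumerate(index_urls):
        flag = "--index-url" if i == 0 else "--extra-index-url"
        args.extend([flag, url])
    return args
```

One caveat about the design: pip does not rank indexes by the order given. It collects candidates from every configured index and picks the best matching version, so the list describes a set of allowed sources rather than a priority order.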

Proposed interface to set the index urls:

pip_install(
    ...
    index_urls = [
        "https://private.domain/artifactory/my-repository",
        # This could be exported as a constant
        "https://pypi.org/simple",
    ]
)
pip_parse(
    ...
    index_urls = [
        "https://private.domain/artifactory/my-repository",
        "https://pypi.org/simple",
    ]
)

Describe alternatives you've considered

Alternative 1: Consider the following attributes:

index_urls = [],
extra_index_urls = [
    "https://private.domain/artifactory/my-repository",
]

This maps more closely to pip's own options; however, the difference between the two attributes is less clear when reading the rule, especially to someone who doesn't already use pip.

It also makes queries more complex: finding all possible dependency sources, which is a legitimate use case for our security team, would require querying two attributes.

Workaround 1: One way to install dependencies from a private index is to reuse an existing configuration in your ~/.pip/pip.conf file. However, this requires every Bazel command to be invoked with --action_env=PIP_CONFIG_FILE=$HOME/.pip/pip.conf. This is not only burdensome to remember but also incompatible with a remote build cache: because the variable's value differs for every user, sharing a build cache becomes impossible.

This workaround also hides the index URLs from the build configuration, which is not hermetic.
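Why a per-user value defeats cache sharing can be sketched with a toy model. The shell expands $HOME before Bazel sees the flag, so two users pass different PIP_CONFIG_FILE values. The Python function below is a simplified stand-in for an action cache key (a hash over a command and its environment); it is an assumption for illustration, not Bazel's actual key computation.

```python
import hashlib


def toy_action_key(command, action_env):
    """Hash a command line together with its action environment.

    Simplified model only: real Bazel action keys cover many more inputs.
    Variables are sorted so the key does not depend on dict order.
    """
    digest = hashlib.sha256(command.encode("utf-8"))
    for name, value in sorted(action_env.items()):
        digest.update(("\0%s=%s" % (name, value)).encode("utf-8"))
    return digest.hexdigest()


cmd = "pip install -r requirements.txt"  # illustrative command
alice = toy_action_key(cmd, {"PIP_CONFIG_FILE": "/home/alice/.pip/pip.conf"})
bob = toy_action_key(cmd, {"PIP_CONFIG_FILE": "/home/bob/.pip/pip.conf"})
print(alice == bob)  # False: identical work, different keys, so no shared hits
```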

Workaround 2: Another way to install dependencies from a private index today is to include --extra-index-url https://my.domain/pypi-local/simple in requirements.in (or requirements.txt). This has better ergonomics than Workaround 1, but the index URL still can't be queried the way it can in rules_jvm_external. It also can't be supplied automatically by a macro if a company wanted to wrap the rules_python rules to inject company-wide defaults.
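With this workaround the index URL lives inside the requirements file rather than in a rule attribute, so auditing sources means parsing file contents. Below is a minimal Python sketch of such a scanner, assuming the option spellings pip accepts in requirements files (-i/--index-url and --extra-index-url, followed by a space or =). The function name is hypothetical, and the sketch does not follow nested -r includes or line continuations.

```python
import re

# pip options that replace or add package sources inside a requirements file
_INDEX_OPTION = re.compile(
    r"^\s*(?:-i|--index-url|--extra-index-url)(?:\s*=\s*|\s+)(\S+)"
)


def find_index_urls(requirements_text):
    """Return every index URL declared via pip options, in file order."""
    urls = []
    for line in requirements_text.splitlines():
        match = _INDEX_OPTION.match(line)  # comment lines never match
        if match:
            urls.append(match.group(1))
    return urls
```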

caseyduquettesc, Jul 07 '21 02:07

I love this proposal. In my environment, all artifacts must be scanned and vetted before deployment to production. Being able to specify where our Python projects pull their external dependencies from would let my security team and me use the tools and automation of our repository applications more smoothly.

I personally like the initial solution. If the index_urls attribute is left empty, it should default to PyPI (https://pypi.org/simple); otherwise, only the sources listed in index_urls should be queried:

pip_install(
    ...
    index_urls = [
        "https://private.domain/artifactory/my-repository",
        # This could be exported as a constant
        "https://pypi.org/simple",
    ]
)

ajrpeggio, Jul 07 '21 18:07

I really need this.

brduarte, Jul 15 '21 18:07

Have you tried using extra_pip_args to pass custom parameters to pip? It seems to work well: https://github.com/bazelbuild/rules_python/blob/main/examples/pip_install/WORKSPACE#L16

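pip_install(
  ...
  extra_pip_args = [
    # --index-url takes a single URL and replaces pip's default index.
    "--index-url", "<your-custom-index-url>",
    # --extra-index-url can be repeated, once per additional index.
    "--extra-index-url", "<your-extra-index-url>",
  ],
)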

Also, could you explain how Workaround 1 could work? In the latest version, pip runs with --isolated, which ignores those environment variables.

havasd, Aug 22 '21 12:08

This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days. Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_python!

github-actions[bot], Feb 18 '22 22:02

This issue covers nearly everything I need. What I can't figure out is how to securely pass user credentials into the index URL.

For example:

pip_install(
  ...
  extra_pip_args = [
    "--index-url", "https://{}:{}@my.pypi.repo/simple".format(username, password),
  ],
)

Normally I'd do this with an environment variable. I know I can write a repository rule in a .bzl file to export the value, but I can't figure out how to access the resulting string in the WORKSPACE file.

Any ideas? Help is very much appreciated.
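One possible approach, sketched in Python: read the credentials from environment variables and percent-encode them before building the URL, since characters such as @ or : in a password would otherwise break it. In Bazel, the WORKSPACE file itself cannot read environment variables, so this logic would have to live in a repository rule, which can read them through repository_ctx.os.environ. The variable names PYPI_USER and PYPI_PASSWORD and the function name are hypothetical. A URL with embedded credentials is also passed to pip as an argument and may show up in logs, which is one reason the original proposal relies on .netrc instead.

```python
import os
from urllib.parse import quote


def index_url_with_credentials(host_and_path,
                               user_var="PYPI_USER",
                               password_var="PYPI_PASSWORD"):
    """Build https://USER:PASSWORD@host_and_path from environment variables.

    Hypothetical helper; the variable names are placeholders. Both values are
    percent-encoded with safe="" so "/", "@" and ":" are escaped as well.
    """
    user = os.environ.get(user_var)
    password = os.environ.get(password_var)
    if not user or not password:
        raise RuntimeError("set %s and %s before fetching" % (user_var, password_var))
    return "https://%s:%s@%s" % (
        quote(user, safe=""),
        quote(password, safe=""),
        host_and_path,
    )
```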

bolbken, Mar 17 '22 21:03

This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days. Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_python!

github-actions[bot], Sep 13 '22 22:09

This issue was automatically closed because it went 30 days without a reply since it was labeled "Can Close?"

github-actions[bot], Oct 13 '22 22:10