rules_python icon indicating copy to clipboard operation
rules_python copied to clipboard

Standardise on pip_parse

Open groodt opened this issue 1 year ago • 12 comments

Less extreme version of the proposal outlined here: https://github.com/bazelbuild/rules_python/pull/757


What is changing?

This PR standardises on pip_parse behaviour as the only mechanism for installation of 3rd-party python packages. The internals of pip_install have been changed to use pip_parse behind the scenes, with a small compatibility shim over rule attribute names to ensure minimal disruption to any existing users of pip_install.

Why is it changing?

Historically, we had 2 rules to install packages from PyPI (pip_install and pip_parse), with pip_install being the original. It worked well, but had a significant flaw in that it would eagerly fetch and install all packages. This was slow and frustrating when there were a large number of dependencies, so pip_parse was born!

pip_parse (even though it is still a repository rule) lazily fetched and installed packages. This was adopted by most of the community based on evidence from issues raised, the bazel Slack channels and the rule maintainers.

Maintaining both versions should no longer be necessary. There is a lot of duplication in both versions and it requires extra effort and time. This increases maintenance burden and significantly slows progress to improve the rules.

Breaking Changes:

  • A small breaking change in this PR, will be for users (probably a small number) who still use a pip_install version prior to this PR and who DO NOT use the requirement() macro to reference dependencies. The naming style of the external repositories has changed, so this will require a sed across any .bazel files if these users upgrade. This isn't a big deal in my opinion.
  • Users of pip_install will need to add the calls as you do with pip_parse e.g.
load("@pip//:requirements.bzl", "install_deps")

install_deps()

groodt avatar Aug 27 '22 11:08 groodt

@f0rmiga for early review before opening up to other maintainers.

groodt avatar Aug 29 '22 03:08 groodt

The breaking change is acceptable. Gazelle can produce the right label for users who rely on it. I have not reviewed the code yet, thought. Will do next.

f0rmiga avatar Aug 30 '22 01:08 f0rmiga

Any further comments @f0rmiga ?

groodt avatar Sep 09 '22 20:09 groodt

Will users run into a problem with passing uncompiled requirements to pip_parse? Do you get an error if your requirements.txt just contains "pep8" or something?

alexeagle avatar Sep 13 '22 22:09 alexeagle

Will users run into a problem with passing uncompiled requirements to pip_parse? Do you get an error if your requirements.txt just contains "pep8" or something?

Yes, folks will get a clear error message instructing them what to do:

The `requirements_lock` file must be fully pinned. See `compile_pip_requirements`.
Alternatively, use `pip-tools` or a similar mechanism to produce a pinned lockfile.

The following requirements were not pinned:

groodt avatar Sep 13 '22 23:09 groodt

Yes, folks will get a clear error message instructing them what to do:

The requirements_lock file must be fully pinned. See compile_pip_requirements. Alternatively, use pip-tools or a similar mechanism to produce a pinned lockfile.

The following requirements were not pinned:

The pinned-ness isn't the only issue. pip_parse does an intransitive resolve. So its almost certain a requirements.txt that was used with pip_install won't work with pip_parse, if pip_install is an alias of pip_parse.

I'm not necessarily opposed to the change but it is as breaking as the original proposal AFAICT.

hrfuller avatar Sep 16 '22 18:09 hrfuller

So its almost certain a requirements.txt that was used with pip_install won't work with pip_parse, if pip_install is an alias of pip_parse.

Can you explain what you mean? This is literally what we did at $dayjob and it worked fine for us. The recommendation has always been to use a fully pinned requirements.txt and I was the person who added the checks for it in pip_parse.

It's unlikely somebody will have a fully-pinned, but not fully-locked requirements.txt and we can easily add that to the release notes or error message. Whats your recommendation?

groodt avatar Sep 16 '22 20:09 groodt

My point has to do with fully-transitively resolved and pinned lock file (what we expect in pip_parse) vs a fully-pinned requirements.txt that only has top level distributions. pip_install would run pip to resolve the transitive set, where pip_parse expects the requirements file to already be fully transitively resolved, hence the compile-pip-requirements thing. See https://github.com/bazelbuild/rules_python/blob/main/python/pip_install/extract_wheels/extract_single_wheel.py#L41 --no-deps is the difference.

My recommendation is that we consider that this change will not be transparent to most pip_install users unless we also plumb up some compile_pip_requirements to convert the top level requirements.txt into a real requirements_lock.txt. Builds will fail at analysis time with undefined labels pointing to other external repos for transitive deps otherwise.

hrfuller avatar Sep 16 '22 20:09 hrfuller

a fully-pinned requirements.txt that only has top level distributions

Got it. Do you expect those to be common? It's quite unlikely those individuals will have a good bazel experience because their results would be non-deterministic as resolution results would change when different package versions are published (of the non-direct dependencies). I would go so far as to say that it's a bug if pip_install doesn't require a transitively locked file at the moment.

compile-pip-requirements isn't only for pip_parse. It was added as a convenience for both pip_parse and pip_install.

Seems like a possible, but unlikely (and generally unsupportable?) situation to me.

groodt avatar Sep 16 '22 21:09 groodt

Is the suggestion you are making @hrfuller to rather NOT try to reuse the existing requirements.txt (even though many will be transitively locked)? I can get behind that. I don't mind. I just thought it could help users with existing requirements.txt that are locked (which has also been the recommendation and most should have or they would see lots of cache misses and non-determinism).

groodt avatar Sep 16 '22 21:09 groodt

Is the suggestion you are making @hrfuller to rather NOT try to reuse the existing requirements.txt

Not really, just pointing out that the current workaround won't work for some users. Maybe we should just document that fact and try to give the nice helpful message about compile-pip-requirements to them, and move on. I'm sure it's not common at large companies to use an unlocked requirements file, but it is much more tenable for small repos.

hrfuller avatar Sep 17 '22 00:09 hrfuller

Maybe we should just document that fact

@hrfuller Ok, that works for me. Happy if I just add a note to the deprecation paragraph in README.md?

groodt avatar Sep 20 '22 08:09 groodt

@hrfuller I've added a small note to the README. LGTY?

The maintainers have taken all reasonable efforts to faciliate a smooth transition, but some users of `pip_install` will
need to replace their existing `requirements.txt` with a fully resolved set of dependencies using a tool such as
`pip-tools` or the `compile_pip_requirements` repository rule.

groodt avatar Sep 22 '22 23:09 groodt

Cool, @groodt we should make sure the next release includes a breaking change section in the release notes as well to make this discoverable

alexeagle avatar Sep 27 '22 00:09 alexeagle