Pin transitive dependencies
In the Python packaging world, libraries are handled quite differently from applications.
| | Libraries | Applications |
|---|---|---|
| Publish | Libraries are published as wheels on a package index (either a private registry or PyPI) | Private applications are usually published as Docker images before deployment |
| Dependencies | Libraries should not pin their dependencies in order to play well with other libraries | Applications should pin all dependencies, including transitive ones, to make sure that we test the dependencies we ship |
| Tooling | Libraries are expected to use setuptools or hatch | Applications are expected to use Pipenv or Poetry |
However, Rally is in a rough spot, because it is an application that we publish on PyPI like a library. We currently use hatch, which was first designed for libraries, but we pin our install requirements. This is perfectly fine in my opinion, because publishing to PyPI is very convenient, but we are fighting our tools. (And to be clear, I think this issue is orthogonal to https://github.com/elastic/rally/issues/1420.)
If we were using pip-compile, Pipenv or Poetry to build a private application as a Docker image, then these two operations would be easy:
1. upgrading our dependencies (which lets us get the latest features and avoid hitting bugs that are already fixed upstream)
2. pinning all transitive dependencies (to make sure that when a pull request passes, it's going to work)
It's actually point 2 that prompted this issue. We don't pin urllib3, and #1493 used an import introduced in urllib3 1.26.7 last year. The Rally CI worked fine, using the latest urllib3 version (1.26.9). But our nightly environments had urllib3 1.25.8, installed in 2020, and every `pip install --upgrade .[develop]` call kept that version, which caused a benchmark failure because the import was not found. The fix will be to pin urllib3 manually, but that does not scale to all our existing dependencies.
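To make the failure mode concrete, here is a minimal sketch of a check that compares an installed package against the minimum version the code relies on. The helper names (`version_tuple`, `satisfies_minimum`, `check_installed`) are hypothetical and not part of Rally, and the version parsing is deliberately simplified to plain dotted numeric versions; a real check would more likely use the `packaging` library.

```python
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text: str) -> tuple[int, ...]:
    # Simplification: only handles plain dotted numeric versions such as "1.26.7".
    return tuple(int(part) for part in text.split("."))


def satisfies_minimum(installed: str, minimum: str) -> bool:
    # True when the installed version is at least the required minimum.
    return version_tuple(installed) >= version_tuple(minimum)


def check_installed(package: str, minimum: str) -> str:
    # Look up the installed version in the current environment and report on it.
    try:
        installed = version(package)
    except PackageNotFoundError:
        return f"{package} is not installed"
    if satisfies_minimum(installed, minimum):
        return f"{package} {installed} satisfies >={minimum}"
    return f"{package} {installed} is older than required {minimum}"


if __name__ == "__main__":
    # The versions from this issue: the nightly environment vs. CI.
    print(satisfies_minimum("1.25.8", "1.26.7"))
    print(satisfies_minimum("1.26.9", "1.26.7"))
    print(check_installed("urllib3", "1.26.7"))
```

With the versions mentioned above, the nightly environment (1.25.8) fails the check while CI (1.26.9) passes, which is exactly the gap that an unpinned transitive dependency leaves open.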
To avoid this problem in the future, how can we pin our dependencies and have a mechanism to update them? There is no widely adopted tool that supports doing that and writing the result to pyproject.toml. Indeed, this authoritative post on setup.py insists that you should never pin dependencies like we do. And since this is seen as a big anti-pattern by everyone working on Python packaging, all proposals to make our use case easier are usually rejected or ignored (flit, pip-tools, setuptools, poetry).
Here's a proposal that solves our issue:
- Declare pinned dependencies in `pyproject.toml`
- Declare dependencies in a `requirements.in` file, leaving most of them abstract (but not the Elasticsearch Python client)
- Write or reuse a tool that pins the requirements with pip-tools (which is lighter than Pipenv/Poetry), then reads the pinned requirements and writes them to `setup.cfg`/`pyproject.toml`. We can call it manually from time to time.
- While we're at it, develop dependencies and test dependencies can also move to a requirements file - this will be abstracted away by `make install` anyway
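The "read the pinned requirements and write them out" step of the proposal could look roughly like the sketch below. It assumes the pinned file has the usual pip-compile shape (`name==version` lines, `# via` comments, optional `--hash` lines). The function names `parse_pins` and `render_dependencies` are hypothetical, environment markers are dropped for simplicity, and the sketch only renders a `dependencies` array as text rather than editing an existing `pyproject.toml` in place.

```python
import re

# Matches "name==version" at the start of a line, with optional extras.
PIN_RE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*(?:\[[^\]]+\])?)==([^\s;\\]+)")


def parse_pins(requirements_txt: str) -> list[str]:
    """Extract "name==version" pins from pip-compile style output."""
    pins = []
    for raw in requirements_txt.splitlines():
        line = raw.strip()
        # Skip blanks, comments ("# via ...") and option lines ("--hash=...").
        if not line or line.startswith("#") or line.startswith("--"):
            continue
        match = PIN_RE.match(line)
        if match:
            # Simplification: environment markers after ";" are discarded.
            pins.append(f"{match.group(1)}=={match.group(2)}")
    return pins


def render_dependencies(pins: list[str]) -> str:
    """Render the pins as a TOML array suitable for a [project] table."""
    body = "".join(f'    "{pin}",\n' for pin in pins)
    return f"dependencies = [\n{body}]"


if __name__ == "__main__":
    sample = """\
# This file is autogenerated by pip-compile
certifi==2021.10.8
    # via elastic-transport
urllib3==1.26.9
    # via elasticsearch
"""
    print(render_dependencies(parse_pins(sample)))
```

A real tool would probably use a TOML library to update the file while preserving its other content; rendering plain text keeps the sketch free of third-party dependencies.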
(I also considered using OpenStack pbr - https://docs.openstack.org/pbr/latest/user/index.html but that would force us to keep using setuptools.)
I ran into a similar issue a while ago: distributing my application to a cluster that was reconstructed on a daily basis and did not support Docker. The easiest solution was using a wheel with all the packages pinned. For this I created poetry-lock-package. One big issue there was that we needed to pin some, but not all, packages. Some were part of the proprietary cluster and pinned by the cluster release. That last option is what I would like to add to this issue as a somewhat common feature request.
@bneijt Thank you for your comment and interest! I did indeed stumble on poetry-lock-package when writing up this issue (thanks to a comment of yours). I think that Poetry does too many things, however, and I am considering moving to flit instead. #1420 will cover shipping a standalone executable.
> Some were part of the proprietary cluster and pinned by the cluster release. That last option is what I would like to add to this issue as a somewhat common feature request.
I'm not sure I understand the feature request. Are you asking that we don't pin all dependencies? If so, which ones?