data.gov
CKAN pip requirements management
User Story
In order to have a native Python module management and locking system, the data.gov sysadmin wants to implement pipenv to manage Python dependencies for the catalog and inventory applications.
Acceptance Criteria
- [ ] GIVEN a python module is ready to be upgraded
WHEN `make update-dependencies` is run
THEN the python module is upgraded and locked at the latest possible version
AND cloud.gov can build and implement the locked dependencies
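One way to make the "locked" part of this criterion demoable is a small check that every entry in the generated requirements file is pinned to an exact version. The following is a minimal sketch, assuming the lock output is a plain pip-style requirements file; the function name `unpinned_requirements` and the sample package versions are hypothetical and used for illustration only.

```python
import re

# Matches a requirement pinned to one exact version, e.g. "name==1.2.3".
# Extras ("name[extra]==1.0") and environment markers ("; python_version ...") are allowed.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?==[^\s;=*]+\s*(;.*)?$")


def unpinned_requirements(text):
    """Return requirement lines that are not locked to an exact version."""
    problems = []
    for raw in text.splitlines():
        # Drop trailing comments and any line-continuation backslash.
        line = raw.split(" #", 1)[0].strip().rstrip("\\").strip()
        if not line or line.startswith("#"):
            continue  # blank lines and full-line comments
        if line.startswith("-"):
            continue  # pip options such as "-r other.txt" or "--hash=..."
        if not PINNED.match(line):
            problems.append(line)
    return problems


# Hypothetical sample content; the versions are made up for illustration.
sample = "ckan==2.9.7\nrequests>=2.0\n# comment\nflask\nsix==1.16.0  # via ckan\n"
print(unpinned_requirements(sample))
```

An empty list would mean every dependency is pinned. This only checks pin syntax; it does not verify that the pinned versions are the latest available or that cloud.gov can build them.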
Background
The custom way the Python modules are managed (requirements.txt and requirements.in files) makes them difficult to manage with third-party scanning systems; those systems create PRs that need editing before they can be merged. We want to move to a more standard approach. It has been mentioned that pipenv does not work with CKAN, though there is no evidence of this. There may be some blockers to making this work.
Security Considerations (required)
None
Sketch
Some deeper discussion on pipenv vs poetry here: https://news.ycombinator.com/item?id=26093926
Poetry also helps with publishing to PyPI, FWIW.
EDIT: there's also a lightweight alternative, pip-tools: https://github.com/jazzband/pip-tools
I'm not sure we can use `pipenv` in isolation. As far as I know, to satisfy the cloud.gov need to vendor dependencies, pip is the only tool that does this.
Well, I take back the "only tool" part. Poetry supports downloading them, but I'm still not sure this is compatible with cloud.gov
- https://github.com/python-poetry/poetry/issues/2184
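For reference, the pip-based vendoring mentioned above can be sketched as a thin wrapper around `pip download`. This is only an illustration: the `requirements.txt` file name, the `vendor` directory, and both function names are assumptions, and whether cloud.gov's buildpack needs additional flags is not confirmed here.

```python
import subprocess
import sys


def pip_download_command(requirements="requirements.txt", dest="vendor"):
    """Build the pip command that downloads every listed package into dest."""
    # "-r" reads the requirements file; "-d" sets the download directory.
    return [sys.executable, "-m", "pip", "download", "-r", requirements, "-d", dest]


def vendor_dependencies(requirements="requirements.txt", dest="vendor"):
    """Run the download; raises CalledProcessError if pip exits non-zero."""
    subprocess.run(pip_download_command(requirements, dest), check=True)
```

Calling `vendor_dependencies()` from the application directory would fill `vendor/` with the package archives named in the requirements file. Keeping command construction separate from execution makes the command easy to inspect without touching the network.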
I wanted to wait until I had the absolute final decision before posting here (since I'm not hopeful about the path I'm taking now), but the interim answer is that `pipenv` is very, very ... very slow. Using it for all of our requirements management is not practical. It triples or quadruples the build time of CKAN in Docker, and the locking process takes 10 to 30 minutes, which is not realistic when updating dependencies. These results are from the latest version, `2022.11.11`, in Docker. Using an older version, `2021.5.29`, locally is somewhat faster.
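If anyone wants to reproduce timings like these, a small wrapper along the following lines would do. It is a sketch only: `time_command` is a hypothetical helper, and the commented-out call assumes `pipenv` is installed and a Pipfile exists in the current directory.

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd and return (exit_code, elapsed_seconds)."""
    start = time.perf_counter()
    result = subprocess.run(cmd)
    return result.returncode, time.perf_counter() - start


# Example (not run here; assumes pipenv and a Pipfile are available):
#   code, seconds = time_command(["pipenv", "lock"])
#   print(f"pipenv lock exited with {code} after {seconds / 60:.1f} minutes")
```

Running the same call against each pipenv version, in Docker and locally, would give directly comparable numbers.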
I pushed https://github.com/GSA/catalog.data.gov/pull/657/commits/c7ee555bbb160bdb71b60625caff6a29832b86a0, which would use `pip` for all local testing and then `pipenv` for "vendoring" for cloud.gov. It is a solution, but it would still leave the (minimum) 10 to 30 minutes for `update-dependencies`, which I still think is impractical.
I believe the team's current decision is that the benefits of `pipenv` do not outweigh the drawbacks, so I'm closing this issue. All of the work has been documented in the PR above.
I wonder how they're achieving this benchmarking if that's the case. 34s! Their requirements.txt is just a bit smaller than ours. https://lincolnloop.github.io/python-package-manager-shootout/
I'd be interested in that as well. 🦑