
CKAN pip requirements management

Open jbrown-xentity opened this issue 3 years ago • 1 comments

User Story

In order to have a native Python dependency management and locking system, the data.gov sysadmin wants to adopt pipenv to manage Python dependencies for the catalog and inventory applications.

Acceptance Criteria

[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]

  • [ ] GIVEN a python module is ready to be upgraded
    WHEN make update-dependencies is run
    THEN the python module is upgraded and locked at the latest possible version
    AND cloud.gov can build and implement the locked dependencies
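One way the "locked" part of this criterion could be demoed is a small check that every entry in the generated requirements.txt is pinned to an exact version. The following is a minimal sketch, not part of the repository: the function name `unpinned_requirements` and the sample package names are hypothetical, and the pattern only recognizes simple `name==version` pins (URL or VCS requirements would be reported as unpinned).

```python
import re

# Matches "name==version", optionally with extras, e.g. "pkg[extra]==1.0".
# Assumption: only exact "==" pins count as locked.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,\s-]+\])?==[^\s;]+")


def unpinned_requirements(text):
    """Return the requirement lines in `text` that are not pinned with '=='."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        # Skip blank lines and pip options such as "-r other.txt" or "--hash=...".
        if not line or line.startswith("-"):
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    # Hypothetical sample content; in practice this would be read from requirements.txt.
    sample = "example-pkg==1.2.3\nother-pkg>=2.0\n"
    print(unpinned_requirements(sample))
```

An empty result would mean every dependency is locked to a single version, which is what the THEN clause asks for.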

Background

The custom way the Python modules are managed (requirements.txt and requirements.in files) makes them difficult to handle with third-party scanning systems, which create PRs that need editing before they can be merged. We want to move to a more standard approach. It has been mentioned that pipenv does not work with CKAN, though there is no evidence of this. There may be some blockers to making this work.

Security Considerations (required)

None

Sketch

[Notes or a checklist reflecting our understanding of the selected approach]

jbrown-xentity avatar Sep 08 '21 14:09 jbrown-xentity

Some deeper discussion on pipenv vs poetry here: https://news.ycombinator.com/item?id=26093926

Poetry also helps with publishing to PyPI, FWIW.

EDIT: there's also a lightweight alternative, pip-tools: https://github.com/jazzband/pip-tools

btylerburton avatar Jul 06 '22 15:07 btylerburton

I'm not sure we can use pipenv in isolation. As far as I know, pip is the only tool that can vendor dependencies, which cloud.gov requires.

nickumia-reisys avatar Nov 14 '22 21:11 nickumia-reisys

Well, I take back the "only tool" part. Poetry supports downloading dependencies, but I'm still not sure this is compatible with cloud.gov.

  • https://github.com/python-poetry/poetry/issues/2184

nickumia-reisys avatar Nov 14 '22 21:11 nickumia-reisys

I wanted to wait until I had a final decision before posting here (I'm not hopeful about the path I'm taking now), but the interim answer is that pipenv is very, very ... very slow. Using it for all of our requirements management is not practical. It triples or quadruples the build time of CKAN in Docker, and the locking process takes 10 to 30 minutes, which is not realistic when updating dependencies. These results are from the latest version, 2022.11.11, in Docker. An older version, 2021.5.29, run locally is somewhat faster.
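For anyone who wants to reproduce or compare these timings, a minimal sketch of a wall-clock measurement is below. The helper name `time_command` is hypothetical, and the command that gets timed is a placeholder; swapping in something like `["pipenv", "lock"]` assumes pipenv is installed and that the script runs from the directory containing the Pipfile.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run `cmd` (a list of arguments) and return (elapsed_seconds, return_code)."""
    start = time.perf_counter()
    result = subprocess.run(cmd, check=False)
    return time.perf_counter() - start, result.returncode


if __name__ == "__main__":
    # Placeholder command so the sketch runs anywhere.
    # Assumption: replace with the locking command under test, e.g. ["pipenv", "lock"].
    elapsed, code = time_command([sys.executable, "-c", "pass"])
    print(f"exit={code} elapsed={elapsed:.2f}s")
```

Running the same measurement inside and outside Docker, and against both pipenv versions, would make the comparison above easier to quantify.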

I pushed https://github.com/GSA/catalog.data.gov/pull/657/commits/c7ee555bbb160bdb71b60625caff6a29832b86a0, which uses pip for all local testing and pipenv only for "vendoring" dependencies for cloud.gov. It is a solution, but update-dependencies would still take at least 10 to 30 minutes, which I still think is impractical.

I believe the team's current decision is that the benefits of pipenv do not outweigh the drawbacks, so I'm closing this issue. All of the work has been documented in the PR above.

nickumia-reisys avatar Nov 17 '22 15:11 nickumia-reisys

I wonder how they're getting these benchmark results if that's the case. 34s! Their requirements.txt is just a bit smaller than ours. https://lincolnloop.github.io/python-package-manager-shootout/

btylerburton avatar Nov 17 '22 15:11 btylerburton

I'd be interested in that as well. 🦑

nickumia-reisys avatar Nov 17 '22 15:11 nickumia-reisys