Feature request: pip-compile support

Open bullfest opened this issue 6 years ago • 57 comments

Cool project!

It would be nice to have support for the pip-tools pip-compile command (which basically creates lock files with pip dependencies from specification files).

Proposed solution: if possible, one option would be to simply allow defining arbitrary lock-file commands, provided containerization is good enough that this doesn't pose a security issue (I haven't looked at any of the code, so I have no idea how the project currently works).

Otherwise something like

"pipCompile": {
  "enabled": true,
  "inFile": "requirements.in",
  "outFile": "requirements.txt"
}

would probably be a good configuration that runs the command pip-compile [--output-file <outFile>] [<inFile>].
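As a rough illustration of how such a config block might map to a command line, here is a minimal Python sketch. The keys are the ones proposed above; the function name build_command is hypothetical, and the flag is spelled --output-file as pip-tools spells it. This is not how Renovate works internally, only a sketch of the idea.

```python
from typing import Dict, List, Union


def build_command(config: Dict[str, Union[bool, str]]) -> List[str]:
    """Translate the proposed pipCompile config into a pip-compile argv list."""
    if not config.get("enabled", False):
        return []  # feature disabled: nothing to run

    argv = ["pip-compile"]
    out_file = str(config.get("outFile", ""))
    in_file = str(config.get("inFile", ""))
    if out_file:
        # Only pass the flag when an output file was configured.
        argv += ["--output-file", out_file]
    if in_file:
        argv.append(in_file)
    return argv


if __name__ == "__main__":
    print(build_command({
        "enabled": True,
        "inFile": "requirements.in",
        "outFile": "requirements.txt",
    }))
```

With both file names left empty, the sketch runs pip-compile with no arguments at all.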

bullfest avatar Aug 01 '18 09:08 bullfest

Interesting. This would probably correspond to our concept of “lock file maintenance”, but in this case we would consider requirements.in to be the “package file” while requirements.txt functions as the lock file. In such situations you’d also want to stop Renovate from updating requirements.txt directly, because bumping each entry to its latest version might not work.

rarkins avatar Aug 01 '18 15:08 rarkins

Yes, but as requirements.in essentially is a requirements.txt but with a different name one could simply change the fileMatch field in pip_requirements to something like ["^requirements.in$"], or am I missing something?

Also reasonable default values for the conf-dict would probably be

"pipCompile": {
  "enabled": false,
  "inFile": "",
  "outFile": ""
}

Enabling the tool without specifying in/out files would then result in the tool being run without any arguments/flags.

bullfest avatar Aug 01 '18 16:08 bullfest

I think I’d define this as a new manager called pip_compile that has a default match for requirements.in instead of requirements.txt. Output file can be generated by replacing .in with .txt. More advanced renaming can be deferred.

Majority of the other logic would be calling functions in pip_requirements, which you’d probably leave disabled.

rarkins avatar Aug 01 '18 19:08 rarkins

(We're using dependabot but we'd like options, and renovate has come highly recommended...) We use pip-compile, and currently our only choice in this space (I think?) is Dependabot.

Output file can be generated by replacing .in with .txt. More advanced renaming can be deferred.

It's (much) more complicated than that - .txt is a lockfile in this scenario, it contains the whole dependency tree, not just the direct dependencies.

Essentially, you change the .in file as required, then run pip-compile over it to generate the .txt file. For a usefully limited PR for package X, you'd probably want to then filter the .txt file diff to limit the PR to just the dependencies of package X, because pip-compile will otherwise update everything at once.

A common pip-compile workflow is to have multiple pairs of files:

requirements/
    base.in
    base.txt
    local.in
    local.txt
    production.in
    production.txt
    test.in
    test.txt

... which have to be compiled in a specific order. production.in lists only production dependencies, and includes a line that says -c base.txt to depend on base.txt (not an .in file!)

So when updating a dependency mentioned in base.in, you'd want to first compile base.in to produce base.txt, and then compile production.in to produce production.txt.

pip-compile has recently added direct support for dependent requirements via the -c option: https://github.com/jazzband/pip-tools#workflow-for-layered-requirements

This workflow is described in the first 1/3 of https://jamescooke.info/a-successful-pip-tools-workflow-for-managing-python-package-requirements.html

Our current addition to this for our tooling is the in.list file which just contains a list of .in files in the right compile order. We made that bit up; not sure what other orgs do.

craigds avatar Jan 19 '20 20:01 craigds

@craigds thank you for the detailed description.

For a usefully limited PR for package X, you'd probably want to then filter the .txt file diff to limit the PR to just the dependencies of package X, because pip-compile will otherwise update everything at once.

Could this be achieved using pip-compile --upgrade-package X==2.0.0?

Regarding ordering, I'm thinking that Renovate could determine the required ordering by parsing/understanding the -c lines at the time of extraction. It would then build a directed graph and update the files in the required order.
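A minimal Python sketch of that ordering idea follows. It assumes all files live in one directory and are keyed by bare file name, treats a reference to name.txt as a dependency on name.in, and handles -r the same way as -c. The function name compile_order is hypothetical, and graphlib requires Python 3.9 or later.

```python
import re
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Matches lines such as "-c base.txt", "-r base.in", "--constraint=base.txt".
INCLUDE_RE = re.compile(r"^\s*(?:-c|-r|--constraint|--requirement)[\s=]+(\S+)")


def compile_order(in_files: Dict[str, str]) -> List[str]:
    """Return the .in files so that every file comes after the files it references.

    in_files maps a file name (e.g. "base.in") to its text content.
    """
    graph: Dict[str, Set[str]] = {}
    for name, text in in_files.items():
        deps: Set[str] = set()
        for line in text.splitlines():
            match = INCLUDE_RE.match(line)
            if not match:
                continue
            # A reference to base.txt means base.in has to be compiled first.
            source = re.sub(r"\.txt$", ".in", match.group(1))
            if source in in_files and source != name:
                deps.add(source)
        graph[name] = deps
    # static_order() yields each node after all of its predecessors.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(compile_order({
        "production.in": "-c base.txt\ngunicorn\n",
        "base.in": "django\n",
    }))
```

A cycle between files makes TopologicalSorter raise graphlib.CycleError, which a real implementation would need to report to the user.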

rarkins avatar Jan 20 '20 16:01 rarkins

It would be great if you or anyone in this thread could build up an example repo along the lines described:

requirements/
    base.in
    base.txt
    local.in
    local.txt
    production.in
    production.txt
    test.in
    test.txt

rarkins avatar Jan 20 '20 16:01 rarkins

Could this be achieved using pip-compile --upgrade-package X==2.0.0?

Looks like it, yes, though I haven't used that option myself.

Regarding ordering, I'm thinking that Renovate could determine the required ordering by parsing/understanding the -c lines at the time of extraction. It would then build a directed graph and update the files in the required order.

yes, that'd be amazing :+1: note that -r is common instead of -c; I think the difference is that -c allows packages to appear multiple times in different files whereas -r doesn't. But it should probably handle both to build the directed graph.

craigds avatar Jan 20 '20 20:01 craigds

It would be great if you or anyone in this thread could build up an example repo along the lines described:

@rarkins see https://github.com/twslade/renovatebot-pip-tools-example

In the directory structure you listed above, there is a base.txt which is usually not necessary since you can create the *.txt with a command like:

pip-compile --output-file local.txt base.in local.in

twslade avatar Jan 30 '20 18:01 twslade

In the directory structure you listed above, there is a base.txt which is usually not necessary since you can create the *.txt with a command like: pip-compile --output-file local.txt base.in local.in

I suppose... The problem with that is:

  1. that doesn't allow the bot to easily figure out what the dependencies are. I guess you'd have to parse the comments at the top of the .txt files to figure out how it was invoked.
  2. It's possible to override that comment though, using CUSTOM_COMPILE_COMMAND=update-dependencies in the environment. We have our own wrapper script for dev use so we use this to make sure people do the right thing locally.
  3. Including .in files directly means you're not necessarily freezing the same versions between the various environments. If the index changed between multiple invocations of pip-compile, you'd get different versions. So we deliberately included base.txt rather than base.in as mentioned in the article I linked above
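To make point 1 concrete, here is a hedged Python sketch of reading the invocation back out of a compiled file. It assumes the leading comment block contains a line of the form "#    pip-compile ..." as pip-compile normally writes it; the function name find_compile_command is hypothetical. As point 2 notes, a file generated with CUSTOM_COMPILE_COMMAND set will not contain such a line, in which case the sketch returns None.

```python
import shlex
from typing import List, Optional


def find_compile_command(txt_content: str) -> Optional[List[str]]:
    """Return the pip-compile command recorded in the header comments, if any."""
    for line in txt_content.splitlines():
        if not line.startswith("#"):
            break  # the leading comment block has ended
        body = line[1:].strip()
        if body.startswith("pip-compile"):
            # Split like a shell would, so quoted arguments stay intact.
            return shlex.split(body)
    return None


if __name__ == "__main__":
    sample = (
        "#\n"
        "# This file is autogenerated by pip-compile\n"
        "# To update, run:\n"
        "#\n"
        "#    pip-compile --output-file local.txt base.in local.in\n"
        "#\n"
        "django==3.0.2\n"
    )
    print(find_compile_command(sample))
```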

craigds avatar Feb 03 '20 04:02 craigds

But I guess it'd be great if renovatebot could handle both styles :)

craigds avatar Feb 03 '20 04:02 craigds

Thanks, I was thinking the same thing. If the ordering of execution matters then it's essential that there is a standardized way for the bot to be able to extract and determine that. The -r and -c options within files seemed to solve that nicely. Is there any reason why people who want this plus Renovate couldn't move to that approach?

rarkins avatar Feb 03 '20 04:02 rarkins

+1

We plan to use pip-tools managing dependencies only required for development (not in deployment) or not directly required (required by packages we actually use), so we have to turn off renovatebot.

buko106 avatar Jun 03 '20 01:06 buko106

I think this issue should be labeled with python, right? (So that people can find it when coming from https://docs.renovatebot.com/python/#future-work )

karfau avatar Aug 28 '20 09:08 karfau

Hmm, we took out language labels for now to resize how many labels we had in the repo. Need to think whether to reintroduce them or remove that doc link

rarkins avatar Aug 28 '20 09:08 rarkins

Thanks, I was thinking the same thing. If the ordering of execution matters then it's essential that there is a standardized way for the bot to be able to extract and determine that. The -r and -c options within files seemed to solve that nicely. Is there any reason why people who want this plus Renovate couldn't move to that approach?

no reason, no. We used a list file just because it was trivial for us to implement, but renovate can be smarter than that.

rarkins added this to Done in Renovate on Jun 18

not sure how to interpret this; this ticket definitely seems not-done :)

craigds avatar Sep 20 '20 22:09 craigds

pip-compile is more and more popular in the current python world.

We are eager for this in renovate.

tata9001 avatar May 24 '21 00:05 tata9001

Renovate has evolved a bit since this was originally requested, so I wanted to recap.

  1. I think by default we can have fileMatch look for requirements/*.in or requirements.in. Anything wider would introduce a lot of false positives
  2. Users can easily customize this by manually configuring fileMatch
  3. We could have our existing pip_requirements manager look if a matching requirements.in file exists and if so then skip extracting the requirements.txt. This will reduce the number of "wrong" pip_requirements hits out of the box
  4. For pip_compile, we'll treat the *.txt file as an "artifact" of the corresponding *.in file
  5. Initially at least, we'd assume a 1:1 mapping between file.in and file.txt and only extract files with matching name.txt
  6. We use -c and -r to determine the order of files
  7. If files/dependencies need to be upgraded together then it's up to the user to apply grouping themselves
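Points 1, 4 and 5 of this recap could be sketched roughly as follows in Python. The two patterns are only an illustration of "requirements/*.in or requirements.in", not Renovate's actual defaults, and pair_files is a hypothetical name.

```python
import re
from typing import Dict, Iterable, Sequence

# Illustrative patterns only: requirements.in or requirements/<name>.in.
DEFAULT_FILE_MATCH = (
    r"(^|/)requirements\.in$",
    r"(^|/)requirements/[^/]+\.in$",
)


def pair_files(
    repo_files: Iterable[str],
    patterns: Sequence[str] = DEFAULT_FILE_MATCH,
) -> Dict[str, str]:
    """Map each matching .in file to its .txt artifact, assuming a 1:1 naming."""
    files = list(repo_files)
    present = set(files)
    pairs: Dict[str, str] = {}
    for path in files:
        if not path.endswith(".in"):
            continue
        if not any(re.search(pattern, path) for pattern in patterns):
            continue
        artifact = path[: -len(".in")] + ".txt"
        # Only extract inputs whose matching name.txt already exists.
        if artifact in present:
            pairs[path] = artifact
    return pairs


if __name__ == "__main__":
    print(pair_files([
        "requirements/base.in",
        "requirements/base.txt",
        "requirements/local.in",
        "setup.py",
    ]))
```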

rarkins avatar May 24 '21 07:05 rarkins

Hello @rarkins! We also want pip-compile support in Renovate so that we can finally move from Dependabot. Here are a few additional notes from our somewhat more advanced workflow, plus a few questions about the future implementation, to ensure it works great from the beginning:

For pip_compile, we'll treat the *.txt file as an "artifact" of the corresponding *.in file

What if the input files are also named .txt, will it be possible to have both input and output files to be of .txt extension, or should we change that in our project?

Initially at least, we'd assume a 1:1 mapping between file.in and file.txt and only extract files with matching name.txt

Maybe some parameter like output_directory could be added, or some regex to transform input files to output files.

And also, python dependencies differ by version, so generating output files in a different version may break the app or the dependencies, or cause other unexpected effects. I guess it should be made possible to specify which python version to use.

Also pip-compile has option --generate-hashes, maybe it should be allowed to provide custom flags to the compile command.

As a complex example, we use pip-compile to generate deterministic requirements for our build images. We have input files in the requirements folder, and the corresponding deterministic files are generated in the requirements/deterministic directory. We specify an exact python version series (3.7 for now), and change it once in a while when upgrading the deployment docker images. For now the files are generated by this script, to avoid the issues with different python versions I mentioned above.

Our workflow is pretty complex, but I wanted to give an additional example for the implementation to be better. Let me know if I can help with moving this forward, I am not so good in JS/TS but can assist with Python-related questions. Thank you

MrNaif2018 avatar Jun 09 '21 09:06 MrNaif2018

What if the input files are also named .txt, will it be possible to have both input and output files to be of .txt extension, or should we change that in our project?

How/where is the relationship between input.txt and output.txt defined within the repo? e.g. are these today completely arbitrary and you put them into proprietary build scripts, or is there a convention for how/where to define the mapping?

Update: Seems like you have a proprietary build script which is linked to.

I guess it should be made possible to specify which python version to use.

This is already done in Renovate e.g. for Poetry and Pipenv. Ideally there would be a convention within the repository for defining the required Python version, but failing that we do have the ability to configure it.

Also pip-compile has option --generate-hashes, maybe it should be allowed to provide custom flags to the compile command

Ideally this is also specified within the repository. However wouldn't it be possible to use the logic "if the existing output file has hashes, then use --generate-hashes when generating the updated output file"?
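That heuristic could look something like the following Python sketch. It assumes hashes appear in the compiled file as --hash=<algorithm>:<hex digest> entries, which is how pip-compile writes them; the function name extra_flags is hypothetical.

```python
import re
from typing import List

# A pinned entry with hashes carries lines such as "--hash=sha256:<hex digest>".
HASH_RE = re.compile(r"--hash=[a-z0-9]+:[0-9a-f]+")


def extra_flags(existing_output: str) -> List[str]:
    """Infer extra pip-compile flags from the current output file content."""
    flags: List[str] = []
    if HASH_RE.search(existing_output):
        # The file was generated with hashes, so keep generating them.
        flags.append("--generate-hashes")
    return flags


if __name__ == "__main__":
    with_hashes = "django==3.2 \\\n    --hash=sha256:0123abcd\n"
    print(extra_flags(with_hashes))
    print(extra_flags("django==3.2\n"))
```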

Our workflow is pretty complex, but I wanted to give an additional example for the implementation to be better. Let me know if I can help with moving this forward, I am not so good in JS/TS but can assist with Python-related questions.

The biggest barrier to starting this is the large number of edge cases and advanced uses listed in this issue. This discourages anyone from starting implementation because it gives the impression "don't even try this unless you've got a few weeks to think through all the cases listed here". If someone can define what a minimum viable implementation looks like then it would help overcome that barrier. I think if we had a base implementation which satisfied 80%+ of use cases then it would also make it easier to break down the remaining 10-20% of use cases into more "bite sized" chunks of functionality which can be implemented one by one.

By the way, this also illustrates the downside of package managers which are high on flexibility and/or low on convention. Everyone does things a different way, needs to have custom shell scripts, etc.

rarkins avatar Jun 09 '21 09:06 rarkins

How/where is the relationship between input.txt and output.txt defined within the repo? e.g. are these today completely arbitrary and you put them into proprietary build scripts, or is there a convention for how/where to define the mapping?

Well, for now it is in that script. I thought that with Renovate it could possibly be moved into its config, as some conversion function or similar.

Ideally this is also specified within the repository. However wouldn't it be possible to use the logic "if the existing output file has hashes, then use --generate-hashes when generating the updated output file"?

Yes, in all examples I meant that those configuration options are specified within the repository in renovate config. That logic seems to be fine

About the pretty complex cases: I agree, but just wanted to provide some examples of such.

As far as I understand this isn't yet started, right? Maybe I could try to tackle this, even though my knowledge of JS/TS is limited. Not guaranteeing anything yet. Where can I start and what should I keep in mind? Should it possibly re-use some or most of the features from the pip_requirements package manager?

MrNaif2018 avatar Jun 09 '21 10:06 MrNaif2018

@MrNaif2018 that's awesome! To save you some time, I have added very basic support in #10377, which I hope we can merge right away.

When we do that merge, we'll close this issue, so I ask everyone interested to ensure that the remaining use cases/requirements are captured in separate feature request issues so that they can be discussed and implemented one by one.

For example if the output file needs to be configurable, perhaps we can define a convention for a comment which can be put in the "input" file, e.g.

# output-file: output/requirements-dev.txt
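A minimal Python sketch of reading such a comment, assuming the convention exactly as written above and falling back to the 1:1 name mapping when the comment is absent. The function name output_file_for is hypothetical, and whether the path should be relative to the input file or to the repository root is left open here.

```python
import os
import re

# Matches a whole line such as "# output-file: output/requirements-dev.txt".
OUTPUT_FILE_RE = re.compile(r"^#\s*output-file:\s*(\S+)\s*$", re.MULTILINE)


def output_file_for(in_path: str, in_content: str) -> str:
    """Return the output path for an input file, honoring an output-file comment."""
    match = OUTPUT_FILE_RE.search(in_content)
    if match:
        return match.group(1)
    # No comment found: swap the extension, e.g. requirements.in -> requirements.txt.
    root, _ext = os.path.splitext(in_path)
    return root + ".txt"


if __name__ == "__main__":
    content = "# output-file: output/requirements-dev.txt\nflask\n"
    print(output_file_for("requirements-dev.in", content))
    print(output_file_for("requirements.in", "flask\n"))
```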

Other "missing" functionality to be implemented later:

  • A default fileMatch value. I wanted to keep it empty for now so that it's entirely opt-in
  • Ordering of file updates/compilations

rarkins avatar Jun 09 '21 20:06 rarkins

Thank you very much for your quick reply! I will set up and try Renovate on our Python repos to see how it works for now. If needed I will open tickets about missing functionality with suggestions and/or implementation.

About the configurable output file, I still think of it as a "transform function", transform(input_file) -> output_file. For example, for our workflow it would look like this in Python:

def transform(input_file):
    # Place the compiled file under requirements/deterministic/
    return input_file.replace("requirements/", "requirements/deterministic/")

That got me thinking that a sed-like regex string such as s/requirements/requirements\/deterministic/ would probably work. It could be included in config files without any problem, and its syntax is powerful enough to handle any input/output convention, I think. Maybe it could be combined with the output-file comment idea.
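For illustration, a sed-like expression of that shape could be applied in Python roughly as below. This is a simplified sketch: it only understands the s/pattern/replacement/ form with \/ as an escaped slash, ignores flags, and uses Python's regex syntax rather than sed's. The names parse_sed and transform_path are hypothetical.

```python
import re
from typing import Tuple


def parse_sed(expr: str) -> Tuple[str, str]:
    """Split 's/pattern/replacement/' on slashes that are not backslash-escaped."""
    if not expr.startswith("s/"):
        raise ValueError(f"not a substitution expression: {expr!r}")
    parts = re.split(r"(?<!\\)/", expr)
    # Expected pieces: "s", pattern, replacement, flags (flags are ignored here).
    if len(parts) != 4:
        raise ValueError(f"malformed substitution expression: {expr!r}")
    _, pattern, replacement, _flags = parts
    # Turn escaped slashes back into plain ones.
    return pattern.replace("\\/", "/"), replacement.replace("\\/", "/")


def transform_path(input_file: str, expr: str) -> str:
    """Apply the substitution once, like sed without the g flag."""
    pattern, replacement = parse_sed(expr)
    return re.sub(pattern, replacement, input_file, count=1)


if __name__ == "__main__":
    expr = r"s/requirements/requirements\/deterministic/"
    print(transform_path("requirements/web.txt", expr))
```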

Just my ideas, but those are for separate issues of course

MrNaif2018 avatar Jun 09 '21 21:06 MrNaif2018

:tada: This issue has been resolved in version 25.39.0 :tada:

The release is available on:

Your semantic-release bot :package::rocket:

renovate-release avatar Jun 11 '21 11:06 renovate-release

Super thanks for this awesome tool! Your team's support is so helpful and timely.

When you have time, please sync this feature to https://github.com/whitesource/renovate-on-prem. 🙏

tata9001 avatar Jun 15 '21 14:06 tata9001

The fact that support for pip-compile was added is awesome!!! Still, I am afraid that the current assumption is not flexible enough for wide use, mainly because it assumes one input, one output and no other args. This penalizes any PEP 517 users who keep project requirements inside setup.cfg. It would also not read dependencies from setup.py, which is used by lots of people.

Based on my experience with pip-compile, I have concluded that you almost always want to allow the user to customize the entire command line, so that they can include the right number of input files, the desired output and other args for tuning its behavior.

As an example of how pip-compile is used in practice, look at https://github.com/ansible-community/ansible-lint/blob/master/tox.ini#L95-L96 -- as you can see, the project can update all its deps using tox -e deps. pip-compile is called twice as there are two lock files to update. Note the mention of setup.py, as that is key for producing the right results. Some projects also need to mention extras via the --extra argument in order to include soft dependencies of the current project, something that is impossible to declare inside a reqs.in file.

I guess just allowing users to define the exact command line would make implementation easier and more future proof.

ssbarnea avatar Jun 16 '21 09:06 ssbarnea

Also see our further discussion in #10407, we are thinking of better ways, there are at least 2 approaches

MrNaif2018 avatar Jun 16 '21 09:06 MrNaif2018

I've decided to reopen this issue to bring conversation back here, because I suspect I/we may need to rewrite what I did anyway.

rarkins avatar Jun 17 '21 07:06 rarkins

Just to clarify, it's common for people to use pip-compile not only for requirements.in -> requirements.txt type of conversions, but also for setup.py -> requirements.txt and setup.cfg -> requirements.txt? Any other formats?

It looks like then we have at least two implementation approaches:

  1. Keep separate "managers"/parsers in Renovate for setup.py, setup.cfg and requirements.x formats, but add awareness to each to run pip-compile where necessary. This would likely mean we need users to specify the in/out file naming (maybe with patterns) because there is no convention for the exact output file names
  2. Instead of starting with the "input" files, we detect all the output files, and then learn the input files based on the pip-compile "comments" within the file. For simplicity this would be one "manager" in Renovate, e.g. maybe we just call it pip. But we'd still need to work out what to do with the people using those formats who don't use pip-compile

I prefer the 2nd approach because it could mean it "just works" for the majority of users out of the box (e.g. we match any setup.cfg, setup.py, or requirements/*.txt type of file names by default, and then use the ones we found to locate any missing input files). However it's definitely more complex and not something we've done before.

Further clarification:

  • If we find output files with pip-compile signatures and can locate the input file in the same repo, then we treat the input file as the "package file" and the output file as the artifact/lock file
  • For any setup.cfg, setup.py or requirements.txt type of files which are not autogenerated then we treat them as a package file with no artifact/lock file

rarkins avatar Jun 17 '21 07:06 rarkins

I would personally avoid adding logic to the Renovate manager; it should be as stupid as possible (KISS) and rely on pip-compile to do the right things, while allowing the user to fully control the entire CLI being used. The reason I suggest that is maintenance. Any feature that relies on how someone is using pip-compile or on its output format is prone to break with future releases.

I would go so far as to say that Renovate probably should not even know which files are the inputs, only the output(s). One major use for me is upgrading test dependency lock files with pip-compile. I often need to run it without any change made to the .in files, only to run the tool again and submit its outcome.

PS. Do not call it pip, pip-compile is not used by everyone using python packaging. I am sure there will be other alternatives in the future. Maybe a generic CLI manager approach would prove more flexible? If someone builds a new tool they could easily configure it without needing to patch renovate itself?

ssbarnea avatar Jun 17 '21 08:06 ssbarnea

Just to clarify, it's common for people to use pip-compile not only for requirements.in -> requirements.txt type of conversions, but also for setup.py -> requirements.txt and setup.cfg -> requirements.txt?

Just to have it mentioned:

We are having a requirements.in which then uses local setup.py files. Maybe this could be an initial workaround.

requirements.in:

-e package1[test]
-e package2[test]

aberres avatar Jun 17 '21 08:06 aberres