
Add compilation check mechanism

Open PeterJCLaw opened this issue 5 years ago • 18 comments

I'd like to be able to run a command in CI to ensure that my requirements.in produces my requirements.txt, mostly as a guard against accidental edits to the latter which get undone when pip-compile is next run.

I'm currently approximating this by running pip-compile --dry-run, bodging its output, and diffing the result against my requirements.txt. Predictably, this isn't exactly reliable: warning text sometimes appears in the output which wouldn't be in the requirements file.

I can see a couple of solutions:

  • make pip-compile output the requirements.txt to STDOUT, rather than STDERR, so that there's a stream which I can use which is sure to be free of other text
  • make an actual check mode

The latter feels like the much better solution, though it seems likely to be more effort. I've no idea what the former would break though. (pip-compile seems not to use STDOUT at all?)

I'm using Python 3.5 and pip-tools 4.1.0.

PeterJCLaw avatar Aug 31 '19 17:08 PeterJCLaw

Hello @PeterJCLaw,

Thanks for bringing this up. I'd prefer to handle this in the spirit of the Single Responsibility Principle:

$ pip-compile --quiet && git diff --exit-code

If the check fails, you can see exactly what changed in requirements.txt.

atugushev avatar Sep 02 '19 03:09 atugushev

That's definitely nicer than my previous approach, though it assumes that the requirements file is under version control (admittedly likely), which feels like an odd restriction, and that no other changes have been made to the working tree.

As far as I know, pip-compile can only operate on an output file which is the same as its reference/lock file (I'm not sure what term you use for that). If there were a way to have an input file that differs from the output, that might help here. (This is kinda where I was going with suggesting the change to --dry-run's output.) That said, I'm not actually sure this is a good idea either, and introducing it just for this case feels a little contrived.

I can see that it would be nice to defer the comparison part to another tool, however one could argue that the --dry-run behaviour is already headed down this path -- why would you need --dry-run when you can just run pip-compile and then reset the file from version control if you don't want the changes? (There's even an argument that that's preferable to --dry-run, because you can easily diff the file.)

In any case, git diff --exit-code works for my use-case, so I'm happy to use that for now.

PeterJCLaw avatar Sep 02 '19 19:09 PeterJCLaw

@PeterJCLaw

In any case, git diff --exit-code works for my use-case, so I'm happy to use that for now.

I'm glad to help. Anyway, I like the idea of an optional check mode, so let's leave this issue open and see whether the community is interested in this feature.

atugushev avatar Sep 02 '19 21:09 atugushev

I use tox to run my autoformatters (and pip-compile), then linters and tests. In local development that means "fix everything, then check it", but in CI it's followed by git diff --exit-code to check that whatever was committed is good.

So count me as a vote for single responsibilities!

Zac-HD avatar Oct 04 '19 03:10 Zac-HD

In any case, git diff --exit-code works for my use-case, so I'm happy to use that for now.

It seems I spoke too soon. While this works for the case I was looking at when I reported this issue (an open source repo of a small project), it doesn't work in a case I've since encountered at work.

For the build of one of our (fairly large) internal repos we have a multi-stage CI pipeline which caches the source files into the job workspace after the initial git checkout for speed, but doesn't persist the repo as part of the cache. As a result when we come to run this check, there isn't a git repo to use for diffing.

I could probably copy the requirements.txt out the way first, run a non-dry-run and then diff the copy against the new file, but that's definitely starting to feel like I'm bodging things again.

It would be great if there was a way to validate a pair of requirements files without relying on a version control repo.
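For concreteness, the copy-and-diff bodge I have in mind would look something like this in Python (just a sketch; these helpers are made up and not part of pip-tools):

```python
# Sketch: snapshot requirements.txt, re-run pip-compile, and compare
# the result, with no dependency on a git checkout being present.
# All names here are illustrative.
import filecmp
import shutil
import subprocess
import tempfile
from pathlib import Path

def files_match(a: str, b: str) -> bool:
    """Byte-for-byte comparison of two files."""
    return filecmp.cmp(a, b, shallow=False)

def check_requirements(reqs: str = "requirements.txt") -> bool:
    """Return True if re-running pip-compile leaves reqs unchanged."""
    with tempfile.TemporaryDirectory() as tmp:
        snapshot = Path(tmp) / "before.txt"
        shutil.copy(reqs, snapshot)
        subprocess.run(["pip-compile", "--quiet"], check=True)
        return files_match(reqs, str(snapshot))
```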

PeterJCLaw avatar Oct 10 '19 21:10 PeterJCLaw

There's another way — use pre-commit, see #974.

atugushev avatar Oct 30 '19 15:10 atugushev

Also related to the PR #1070 (which is currently under discussion) with the pip-check mechanism.

atugushev avatar Apr 16 '20 08:04 atugushev

While this doesn't help in the case where the needed git data isn't present, I'll point out that you can pass specific files, like

$ git diff --exit-code -- requirements.txt

so the repo doesn't need to be totally pristine.

When you can't count on git, an alternative comparison method is to create a hash beforehand and verify it after:

$ md5sum requirements.txt >requirements.txt.md5
$ pip-compile
$ md5sum -c requirements.txt.md5

AndydeCleyre avatar Feb 18 '22 03:02 AndydeCleyre

There is a verify command in pip-compile-multi (a tool based on pip-compile) which works similarly to @AndydeCleyre's suggestion with MD5, except it's based on the hash of the input file.

pip-compile-multi adds an extra top line to the lockfile containing the SHA1 of the original .in file, and pip-compile-multi verify simply checks that SHA1.

I think adding similar functionality to pip-compile could be backward compatible and useful too.

Edit: I just realized there are complications around handling requirements files that include other requirements files, because the included files might change too, so they would need to be taken into account as well.
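The basic idea could be sketched like this (the "# SHA1:" header format mirrors what pip-compile-multi writes, but treat the details as my assumptions; this also ignores the included-files complication above):

```python
# Sketch: record the SHA1 of the .in file as a comment at the top of
# the lockfile, then re-hash and compare to verify. Not a real
# pip-compile feature; names and format are illustrative.
import hashlib
from pathlib import Path

def source_sha1(in_file: str) -> str:
    """Hash the raw bytes of the input requirements file."""
    return hashlib.sha1(Path(in_file).read_bytes()).hexdigest()

def write_lock_header(in_file: str, lock_file: str) -> None:
    """Prepend the input file's SHA1 to the lockfile as a comment."""
    lock = Path(lock_file)
    lock.write_text(f"# SHA1:{source_sha1(in_file)}\n" + lock.read_text())

def verify(in_file: str, lock_file: str) -> bool:
    """Check the lockfile's recorded SHA1 against the current input."""
    first = Path(lock_file).read_text().splitlines()[0]
    return first == f"# SHA1:{source_sha1(in_file)}"
```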

georgipopovhs avatar May 05 '22 15:05 georgipopovhs

Also, if an already pinned version gets revoked from the repository, the same input would generate new output. I'm guessing we'd want the check to "fail" in this case, rather than just confirming that the input is unchanged?

AndydeCleyre avatar May 05 '22 16:05 AndydeCleyre

Also, if an already pinned version gets revoked from the repository, the same input would generate new output. I'm guessing we'd want the check to "fail" in this case, rather than just confirming that the input is unchanged?

Good point, if the version is revoked the previous output file would be unusable.

georgipopovhs avatar May 09 '22 07:05 georgipopovhs

For anyone who comes across this issue, I came up with a solution that doesn't have much to do with resolving pip dependencies. I just hash both the input and output files and store the hashes in version control. This way if either of them changes, I will be able to tell in CI by comparing the hashes to the files.

import hashlib

def get_file_hash(file: str) -> str:
    # Normalize line endings so the hash is stable across operating systems.
    with open(file) as f:
        s = '\n'.join(line.rstrip() for line in f)
    return hashlib.md5(s.encode()).hexdigest()

The rstrip is mostly to cope with different line endings on different OSes; I want to make sure the hash is the same wherever the repo is cloned.
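The surrounding workflow, storing the hashes and checking them in CI, looks roughly like this (the .md5 storage format and file names here are my assumptions, not part of the script):

```python
# Sketch: write md5sum-style "<digest>  <path>" lines for the
# requirements files, commit that file, and recompute/compare in CI.
import hashlib

def get_file_hash(file: str) -> str:
    # Normalize line endings so the hash is stable across operating systems.
    with open(file) as f:
        s = "\n".join(line.rstrip() for line in f)
    return hashlib.md5(s.encode()).hexdigest()

def store_hashes(files: list[str], out: str = "requirements.md5") -> None:
    """Record the current hash of each file, one per line."""
    with open(out, "w") as f:
        for path in files:
            f.write(f"{get_file_hash(path)}  {path}\n")

def check_hashes(stored: str = "requirements.md5") -> bool:
    """Return True if every recorded hash still matches its file."""
    with open(stored) as f:
        for line in f:
            digest, path = line.rstrip("\n").split("  ", 1)
            if get_file_hash(path) != digest:
                return False
    return True
```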

The full script can be found here.

MichaelKim0407 avatar Mar 16 '23 19:03 MichaelKim0407

I'd like the hashing idea to be supported in pip-compile natively. That's because we use pyproject.toml for our dependencies, and I don't want to hash all of pyproject.toml. It would be ideal if pip-compile could extract the dependency specifications from its inputs (subject to --extras, constraints, etc.), hash ONLY those, and write the hash into the compiled requirements.txt as a comment near the "This file is autogenerated by pip-compile" line. This would allow checking, and fast re-runs when nothing has changed.

lordmauve avatar Mar 28 '24 10:03 lordmauve