pip-audit Cannot handle ">=" in requirements.txt

The vast vast majority of Python packages use >= in their requirements.txt not ==. But pip-audit is not flagging vulnerabilities when >= used.

E.g., take this example requirements.txt:

lxml>=4.5.1

lxml 4.5.1 contains a vulnerability but is not flagged by pip-audit. Only flagged if >= replaced with ==

Nov 28 '22 15:11 ValueRaider

Thanks for the report!

Could you clarify what your expected behavior is here? pip-audit doesn't actually audit version ranges like lxml>=4.5.1 -- what we do is resolve the range to the "most compatible" version, and then audit the result of that resolution.

This is very similar to what pip does. For example, with lxml>=4.5.1, we actually end up with lxml==4.9.1 installed:

$ python -m pip install 'lxml>=4.5.1'
Collecting lxml>=4.5.1
  Downloading lxml-4.9.1.tar.gz (3.4 MB)
     |████████████████████████████████| 3.4 MB 2.5 MB/s
Using legacy 'setup.py install' for lxml, since package 'wheel' is not installed.
Installing collected packages: lxml
    Running setup.py install for lxml ... done
Successfully installed lxml-4.9.1

As such, pip-audit doesn't report vulnerabilities in cases like these: we only report vulnerabilities for concrete versions, since open-ended versions are "abstract" and may or may not actually be vulnerable.

Nov 28 '22 17:11 woodruffw

Expected behaviour

List the versions in that range that contain a known vuln, or print the most recent insecure/secure version. So that I, as a package maintainer, can easily specify minimum secure versions not just minimum functional versions.

I appreciate that pip-audit cannot verify that a particular range is forever secure for all future versions. But this limitation also applies to concrete versions - lxml==4.5.1 was once considered secure in past, but future vuln discoveries invalidate that past decision. Equally then, a past audit of a version range can be later invalidated.

Implementation

It appears simple to retrieve versions via PYPI API, then just loop over them.

Nov 28 '22 18:11 ValueRaider

@di pointed out an important qualification to me: if lxml>=4.5.1 does happen to select lxml==4.5.1, then we do flag a result for it. You can confirm this for yourself by constraining the open end:

$ pip-audit -r <(echo 'lxml>=4.5.1,<4.5.2')
WARNING:cachecontrol.controller:Cache entry deserialization failed, entry ignored
Found 4 known vulnerabilities in 1 package
Name Version ID             Fix Versions
---- ------- -------------- ------------
lxml 4.5.1   PYSEC-2021-19  4.6.3
lxml 4.5.1   PYSEC-2020-62  4.6.2
lxml 4.5.1   PYSEC-2021-852 4.6.5
lxml 4.5.1   PYSEC-2022-230 4.9.1

...but I can see a case where pip-audit is run separately, while pip install is run on an already partially constructed environment. If that environment already has lxml==4.5.1 then pip-audit -r ... won't produce an error, since it'll do its own dependency resolution and select a non-vulnerable version. However pip-audit on the environment itself will produce an appropriate error, since the environment contains the vulnerable version.

So this is definitely a source of user confusion that we should improve. Some options:

When presented with a non-pinned specifier (like lxml>=4.5.1), we could audit all candidates for that specifier (4.5.1, 4.5.2, etc.). We could then issue a warning (but not necessarily a finding) for each candidate that contains a vulnerability, letting a user know that their constraints might allow a vulnerable version even if non-vulnerable versions are the only ones that actually get selected.
- Pro: More closely aligns with user expectations, as indicated by this issue.
- Pro: May help users of pip-audit discover more vulnerable dependencies in their environments, if we have a base of users who prefer to audit requirement files being installed into pre-existing environments that haven't already been audited. I'm not sure how common that is, or whether we want to encourage that (we should probably encourage people to audit everything instead?)
- Con: Complicates our internal data model a little bit: we would have to keep additional track of each resolution candidate, even the unselected ones, and produce results for each.
- Con: Potentially confusing/unintuitive implications for transitive deps: if foo>=1.0.0 has a candidate foo==1.0.1 with transitive dep bar==1.0.2, should we warn on a vulnerability in bar==1.0.2 even if the selected candidate for foo doesn't contain bar? The argument in favor is that it correctly reflects what might happen if foo>=1.0.0 is installed into a pre-existing environment; the argument against is that it might produce a lot of vulnerability reports that aren't actually relevant to how the spec actually gets installed.
Elaborate on our documentation of pip-audit's security model and expected use: we could emphasize that pip-audit only works on concrete dependencies, meaning that it doesn't prevent people from specifying vulnerable ranges that don't happen to take effect. Similarly, we could emphasize that using pip-audit in a "mixed" manner (installing into a pre-existing environment that hasn't been audited) is potentially dangerous, as pip-audit lacks the context to say whether its candidate selection matches the selection done by pip install into an extant environment.
- Pro: Avoids any significant changes to the codebase, and retains our current "concrete-only" behavior.
- Con: Effectively defines away the problem, rather than actually solving it. But maybe we can ameliorate this by issuing a warning when we see someone do pip audit -r ... in the context of an active virtual environment?
Figure out a way to feed the active environment's dependencies into the candidate selection process for pip-audit. For example, if a user is installing into an environment with lxml==4.5.1 already present, we should feed that in and then report that that's the selected candidate (and subsequently report the vulns).
- Pro: Keeps things "simple": users don't have to be aware of the difference between auditing environments versus requirements files, since pip-audit deconflicts everything internally.
- Con: Violates the (weakly) isolated nature of pip-audit -r ... -- we'd have to either copy the current environment into the isolated venv we create, or otherwise somehow feed its constraints into the candidate resolver. This shouldn't be too hard, but it might surprise some users.
- Con: Similar to the previous: this will make pip-audit -r ... non-hermetic, meaning that it may flag vulnerabilities that aren't actually present: lxml==4.5.1 might be present in some old CI environment, for example, but every actual deployment would be resolved against a patched version.

cc @di for thoughts as well 🙂

Nov 28 '22 18:11 woodruffw

@ValueRaider, what would you expect to happen with transitive dependencies? For example, if lxml was an unpinned sub-dependency of some top-level dependency you had, but wasn't in the requirements file?

Nov 28 '22 18:11 di

I suspect we are discussing 2 slightly different use cases here, which would expect slightly different behaviour:

User deploying software

Use case: can I install this package and its set of requirements without introducing vulnerabilities into my environment? I suspect this is the currently implemented use case.

Package maintainer

Use case: have I configured requirements to ensure no end-user ends up installing insecure dependencies, or any insecure dependencies they already have are updated?

I think use case 2 is important because I suspect most Python end-users don't check for module vulnerabilities.

Transitive dependencies

I think scanning sub-dependencies is useful, but I appreciate the additional complexity of implementation and presentation. "Perfection is the enemy of progress" comes to mind - simply just scanning the top-level is a great improvement over current behaviour, and if most maintainers adopted this then wouldn't need to scan entire tree.

Nov 28 '22 18:11 ValueRaider

Yeah, we sort of conflate these use cases: we expect users to understand that pip-audit -r ...'s results are correct only insofar as pip install -r ... is used on an environment that doesn't introduce further constraints. That's why we generally encourage people to audit entire environments rather than abstract dependency sources 🙂

Maybe another option here is to add an optional flag, something like --audit-candidates, which would allow a user to explicitly opt into this kind of behavior?

Nov 28 '22 19:11 woodruffw

Some extra flag makes sense.

Nov 28 '22 22:11 ValueRaider