Cannot handle ">=" in requirements.txt
The vast vast majority of Python packages use >= in their requirements.txt not ==. But pip-audit is not flagging vulnerabilities when >= used.
E.g., take this example requirements.txt:
lxml>=4.5.1
lxml 4.5.1 contains a vulnerability but is not flagged by pip-audit. Only flagged if >= replaced with ==
Thanks for the report!
Could you clarify what your expected behavior is here? pip-audit doesn't actually audit version ranges like lxml>=4.5.1 -- what we do is resolve the range to the "most compatible" version, and then audit the result of that resolution.
This is very similar to what pip does. For example, with lxml>=4.5.1, we actually end up with lxml==4.9.1 installed:
$ python -m pip install 'lxml>=4.5.1'
Collecting lxml>=4.5.1
Downloading lxml-4.9.1.tar.gz (3.4 MB)
|████████████████████████████████| 3.4 MB 2.5 MB/s
Using legacy 'setup.py install' for lxml, since package 'wheel' is not installed.
Installing collected packages: lxml
Running setup.py install for lxml ... done
Successfully installed lxml-4.9.1
As such, pip-audit doesn't report vulnerabilities in cases like these: we only report vulnerabilities for concrete versions, since open-ended versions are "abstract" and may or may not actually be vulnerable.
Expected behaviour
List the versions in that range that contain a known vuln, or print the most recent insecure/secure version. So that I, as a package maintainer, can easily specify minimum secure versions not just minimum functional versions.
I appreciate that pip-audit cannot verify that a particular range is forever secure for all future versions. But this limitation also applies to concrete versions - lxml==4.5.1 was once considered secure in past, but future vuln discoveries invalidate that past decision. Equally then, a past audit of a version range can be later invalidated.
Implementation
It appears simple to retrieve versions via PYPI API, then just loop over them.
@di pointed out an important qualification to me: if lxml>=4.5.1 does happen to select lxml==4.5.1, then we do flag a result for it. You can confirm this for yourself by constraining the open end:
$ pip-audit -r <(echo 'lxml>=4.5.1,<4.5.2')
WARNING:cachecontrol.controller:Cache entry deserialization failed, entry ignored
Found 4 known vulnerabilities in 1 package
Name Version ID Fix Versions
---- ------- -------------- ------------
lxml 4.5.1 PYSEC-2021-19 4.6.3
lxml 4.5.1 PYSEC-2020-62 4.6.2
lxml 4.5.1 PYSEC-2021-852 4.6.5
lxml 4.5.1 PYSEC-2022-230 4.9.1
...but I can see a case where pip-audit is run separately, while pip install is run on an already partially constructed environment. If that environment already has lxml==4.5.1 then pip-audit -r ... won't produce an error, since it'll do its own dependency resolution and select a non-vulnerable version. However pip-audit on the environment itself will produce an appropriate error, since the environment contains the vulnerable version.
So this is definitely a source of user confusion that we should improve. Some options:
- When presented with a non-pinned specifier (like
lxml>=4.5.1), we could audit all candidates for that specifier (4.5.1,4.5.2, etc.). We could then issue a warning (but not necessarily a finding) for each candidate that contains a vulnerability, letting a user know that their constraints might allow a vulnerable version even if non-vulnerable versions are the only ones that actually get selected.- Pro: More closely aligns with user expectations, as indicated by this issue.
-
Pro: May help users of
pip-auditdiscover more vulnerable dependencies in their environments, if we have a base of users who prefer to audit requirement files being installed into pre-existing environments that haven't already been audited. I'm not sure how common that is, or whether we want to encourage that (we should probably encourage people to audit everything instead?) - Con: Complicates our internal data model a little bit: we would have to keep additional track of each resolution candidate, even the unselected ones, and produce results for each.
-
Con: Potentially confusing/unintuitive implications for transitive deps: if
foo>=1.0.0has a candidatefoo==1.0.1with transitive depbar==1.0.2, should we warn on a vulnerability inbar==1.0.2even if the selected candidate forfoodoesn't containbar? The argument in favor is that it correctly reflects what might happen iffoo>=1.0.0is installed into a pre-existing environment; the argument against is that it might produce a lot of vulnerability reports that aren't actually relevant to how the spec actually gets installed.
- Elaborate on our documentation of
pip-audit's security model and expected use: we could emphasize thatpip-auditonly works on concrete dependencies, meaning that it doesn't prevent people from specifying vulnerable ranges that don't happen to take effect. Similarly, we could emphasize that usingpip-auditin a "mixed" manner (installing into a pre-existing environment that hasn't been audited) is potentially dangerous, aspip-auditlacks the context to say whether its candidate selection matches the selection done bypip installinto an extant environment.- Pro: Avoids any significant changes to the codebase, and retains our current "concrete-only" behavior.
-
Con: Effectively defines away the problem, rather than actually solving it. But maybe we can ameliorate this by issuing a warning when we see someone do
pip audit -r ...in the context of an active virtual environment?
- Figure out a way to feed the active environment's dependencies into the candidate selection process for
pip-audit. For example, if a user is installing into an environment withlxml==4.5.1already present, we should feed that in and then report that that's the selected candidate (and subsequently report the vulns).-
Pro: Keeps things "simple": users don't have to be aware of the difference between auditing environments versus requirements files, since
pip-auditdeconflicts everything internally. -
Con: Violates the (weakly) isolated nature of
pip-audit -r ...-- we'd have to either copy the current environment into the isolated venv we create, or otherwise somehow feed its constraints into the candidate resolver. This shouldn't be too hard, but it might surprise some users. -
Con: Similar to the previous: this will make
pip-audit -r ...non-hermetic, meaning that it may flag vulnerabilities that aren't actually present:lxml==4.5.1might be present in some old CI environment, for example, but every actual deployment would be resolved against a patched version.
-
Pro: Keeps things "simple": users don't have to be aware of the difference between auditing environments versus requirements files, since
cc @di for thoughts as well 🙂
@ValueRaider, what would you expect to happen with transitive dependencies? For example, if lxml was an unpinned sub-dependency of some top-level dependency you had, but wasn't in the requirements file?
I suspect we are discussing 2 slightly different use cases here, which would expect slightly different behaviour:
- User deploying software
- Use case: can I install this package and its set of requirements without introducing vulnerabilities into my environment? I suspect this is the currently implemented use case.
- Package maintainer
- Use case: have I configured
requirementsto ensure no end-user ends up installing insecure dependencies, or any insecure dependencies they already have are updated?
I think use case 2 is important because I suspect most Python end-users don't check for module vulnerabilities.
Transitive dependencies
I think scanning sub-dependencies is useful, but I appreciate the additional complexity of implementation and presentation. "Perfection is the enemy of progress" comes to mind - simply just scanning the top-level is a great improvement over current behaviour, and if most maintainers adopted this then wouldn't need to scan entire tree.
Yeah, we sort of conflate these use cases: we expect users to understand that pip-audit -r ...'s results are correct only insofar as pip install -r ... is used on an environment that doesn't introduce further constraints. That's why we generally encourage people to audit entire environments rather than abstract dependency sources 🙂
Maybe another option here is to add an optional flag, something like --audit-candidates, which would allow a user to explicitly opt into this kind of behavior?
Some extra flag makes sense.