pip-api
pip-api copied to clipboard
Preserve comments and pre-evaluated markers when parsing requirements
pip-api currently (and very reasonably) strips these as part of its parse_requirements API, removing all comments and any requirement lines whose environment markers are not satisfied.
This is the correct and ideal behavior for nearly all usecases, but https://github.com/trailofbits/pip-audit/pull/225 requires slightly different behavior from the parser: in particular, pip-audit -r ... --fix should preserve all comments in the input requirements, as well as any requirement lines (and environment markers) that weren't evaluated.
cc @tetsuo-cpp for thoughts on implementation here.
I started to write a comment here about the long-term future of this function but turned it into https://github.com/di/pip-api/issues/121 instead.
Short term, I think we should add this to the existing parsing logic here.
FWIW, I am preserving comments in this experiment at https://github.com/nexB/pip-requirements Also this does not evaluate markers during parsing so all requirements are kept (including invalid ones track as such)
@woodruffw @di
This has become more important for pip-audit since we're now inserting comments as part of our --fix logic. I'll take a look at this next.
I've been revisiting this today. I think there's a bit of complexity here and if we want to implement this, we're going to need the following changes:
- Refactor the
parse_requirementsAPI away from aDict[str, Union[Requirement, UnparsedRequirement]]style mapping and into a sequential one like:
Iterator[
Union[
Requirement,
UnparsedRequirement,
UnevaluatedRequirement,
Comment
]
]
- When a marker doesn't evaluate to true with the current environment, yield it as an
UnevaluatedRequirement. - When we spot a comment in the requirement file, yield it as a
Comment. - Find some way to represent trailing comments. For example, we want to be able to separately represent requirements like:
pip-audit==1.0 # Audits Python environments and dependency trees for known vulnerabilities
And ensure that it doesn't turn into:
pip-audit==1.0
# Audits Python environments and dependency trees for known vulnerabilities
Perhaps the solution is to add an optional trailing comment field to Requirement?
I've been looking at @pombredanne's pip-requirements-parser library for reference and I'm starting to think it may make more sense to just port pip-audit over to it until we figure out #121. It looks very well written and complete and I suspect that if we keep patching the parse_requirements API, we'll end up slowly reinventing what already exists in pip-requirements-parser. What do you think about this idea @di @woodruffw?
Given #121, I'd say that any downstream library that wants to parse requirements files and preserve comments should probably take a dependency on pip-requirements-parser instead, at least until this gets upstreamed into pypa/packaging. I agree we shouldn't keep patching parse_requirements but I also don't think we should take on the dependency for a function this library isn't planning to support long-term.
@di @tetsuo-cpp I would be honored and I look forward to help with pip-requirements-parser in any I can if you fancy using it. I shall say that your pip-audit looks positively awesome!
Also FWIW pip-requirements-parser is now used extensively for scans in scancode-toolkit and also recently in a new python-inspector library and tool for dependency resolution.
@di
Sure! I wasn't suggesting that pip-api depend on pip-requirements-parser in order to implement parse_requirements, I meant that pip-audit should take the dependency and call into pip-requirements-parser for this logic instead of pip-api. I think we're in agreement.
@pombredanne Thanks so much! I'll let you know if I run into any trouble. Great to hear that it's getting some real world usage.