deptry
feat: use `packaging` to parse requirements
PR Checklist
- [x] A description of the changes is added to the description of this PR.
- [ ] If there is a related issue, make sure it is linked to this PR.
- [x] If you've fixed a bug or added code that should be tested, add tests!
- [ ] If you've added or modified a feature, documentation in `docs` is updated
Description of changes
This is something that has been on my mind for quite some time now.
We currently rely on several regexes to parse dependencies in requirements files. Although this allows us to parse formats that pip handles, many of those formats are not covered by PEP 508, which requires both remote and local dependencies to follow the `<package> @ <path>` form. Even the pip documentation suggests using the PEP 508 format.
The usage of regexes in itself makes the parsing best-effort, but it can also create false positives. For instance, for what looks like a git URL, we try to guess the package name from the git project name in the URL, which can depend on the git server used; worse, the git project name can differ from the real Python package name.
This PR suggests using `packaging`, maintained by PyPA, to parse dependencies wherever we expect the PEP 508 format (requirements files, PEP 621 metadata). This would drop support for URLs that do not follow the PEP 508 format, so it is a breaking change we would have to mention in the changelog, if we effectively want to go this way.
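To illustrate the proposal, here is a minimal sketch (not deptry's actual code) of parsing a PEP 508 dependency string with `packaging`. Note that the package name is stated explicitly in the string, so no guessing from the URL is needed:

```python
# Sketch: parsing a PEP 508 direct reference with the `packaging` library.
from packaging.requirements import Requirement

req = Requirement("foo-bar @ git+https://github.com/baz/foo-bar.git@asd")
print(req.name)  # the name comes from the string itself, not from the URL
print(req.url)   # the direct-reference URL is kept verbatim
```

Because `Requirement` exposes the name directly, the dependency name no longer depends on conventions of a particular git server.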
Codecov Report
Attention: Patch coverage is 95.00000% with 1 line in your changes missing coverage. Please review.
Project coverage is 93.1%. Comparing base (0f0a1c6) to head (9f0bb47). Report is 173 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| python/deptry/dependency_getter/pep_621.py | 87.5% | 0 Missing and 1 partial :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #735 +/- ##
=======================================
+ Coverage 92.8% 93.1% +0.3%
=======================================
Files 35 35
Lines 920 888 -32
Branches 165 154 -11
=======================================
- Hits 854 827 -27
+ Misses 52 49 -3
+ Partials 14 12 -2
I do like the idea of using `packaging` to extract the dependencies instead of our own regexes; I think that is an improvement. As I understand it, the only breaking change we are aware of is for parsing requirements in `requirements.txt` in one of the following forms, right?
https://github.com/urllib3/urllib3/archive/refs/tags/1.26.8.zip
git+https://github.com/baz/foo-bar.git@asd#egg=foo-bar
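The two forms above can be checked against `packaging` directly; this quick sketch (not part of the PR) shows that both are rejected, since neither follows the PEP 508 `<name> @ <url>` form:

```python
# Sketch: `packaging` rejects bare URLs that lack the `<name> @ <url>` form.
from packaging.requirements import InvalidRequirement, Requirement

results = {}
for line in (
    "https://github.com/urllib3/urllib3/archive/refs/tags/1.26.8.zip",
    "git+https://github.com/baz/foo-bar.git@asd#egg=foo-bar",
):
    try:
        Requirement(line)
        results[line] = "parsed"
    except InvalidRequirement:
        results[line] = "rejected"

print(results)  # both lines are rejected
```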
In the link you shared to the pip documentation, they suggest using the PEP 508 format for installing from a package index, but they also show that other formats are supported for packages that do not come from a package index. So I do think it would be good to keep supporting the `requirements.txt` formats we currently support, to reduce the risk of a breaking change.
Can we maybe do both for `requirements.txt` files? First try to extract the dependency with `packaging`, and if that fails, use a regex to extract the URL? Or maybe we could use something completely different? e.g. https://pypi.org/project/requirements-parser/
> Can we maybe do both for `requirements.txt` files? First try to extract the dependency with `packaging`, and if that fails, use a regex to extract the URL? Or maybe we can use something different completely? e.g. https://pypi.org/project/requirements-parser/
Between the two options, I'd personally prefer the first one, as `packaging` will not only be used to parse dependencies in `requirements.txt` files, but also in other formats that support PEP 508 (for instance `[project.dependencies]` in `pyproject.toml`).
I still think, though, that trying to guess the package name from a random URL over which we have no real control feels quite hacky, even if most of the time it gives the user the expected result.
I'll put the PR back to draft for now, until I find the time to get back to this.