warehouse
Support PEP 625 (Filename of a Source Distribution)
What's the problem this feature will solve?
PEP 625 has been accepted, and PyPI should be updated to support it.
Describe the solution you'd like
PyPI needs to implement some changes to support the PEP:
- a restriction on the filenames considered valid for a source distribution (and a corresponding deprecation/notification)
- validation of the distribution and version sections of the filename, including normalization.
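For illustration, here is a rough sketch of what such a filename check could look like. This is not Warehouse's actual implementation: `check_sdist_filename` and `normalize_name` are hypothetical names, only the standard library is used, and the version is not checked against PEP 440 normalization (a fuller check would presumably lean on the `packaging` library for that).

```python
import re

SDIST_SUFFIX = ".tar.gz"


def normalize_name(name: str) -> str:
    """Collapse runs of '-', '_' and '.' into a single '_' and lowercase,
    which is the name form PEP 625 expects in an sdist filename."""
    return re.sub(r"[-_.]+", "_", name).lower()


def check_sdist_filename(filename: str, project_name: str) -> list[str]:
    """Return a list of problems; an empty list means the name looks compliant.

    Hypothetical helper: the version part is only checked for being
    non-empty, not for PEP 440 normalization.
    """
    if not filename.endswith(SDIST_SUFFIX):
        return ["extension is not .tar.gz"]

    stem = filename[: -len(SDIST_SUFFIX)]
    if stem.count("-") != 1:
        return ["expected exactly one '-' separating name and version"]

    problems = []
    name_part, version_part = stem.split("-")
    expected_name = normalize_name(project_name)
    if name_part != expected_name:
        problems.append(f"name part should be {expected_name!r}")
    if not version_part:
        problems.append("version part is empty")
    return problems
```

With this sketch, `check_sdist_filename("flit_core-3.9.0.tar.gz", "flit-core")` reports no problems, while `flit-core-3.9.0.tar.gz` is rejected because of the extra hyphen.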
This is likely blocked on https://github.com/pypa/packaging/issues/527.
This is probably also blocked on finding a good migration path that isn't too obtrusive for users. Right now, there are a lot of files being uploaded that would fail to upload once this change is implemented, and IMO this would break too many users for us to enable this right now:
warehouse=> select DATE_TRUNC('day', upload_time) as day, count(filename)
from release_files
where packagetype = 'sdist'
and filename ilike '%-%-%'
group by DATE_TRUNC('day', upload_time)
order by day desc
limit 30;
day | count
---------------------+-------
2023-06-26 00:00:00 | 1028
2023-06-25 00:00:00 | 381
2023-06-24 00:00:00 | 687
2023-06-23 00:00:00 | 1093
2023-06-22 00:00:00 | 1453
2023-06-21 00:00:00 | 1486
2023-06-20 00:00:00 | 1606
2023-06-19 00:00:00 | 1200
2023-06-18 00:00:00 | 354
2023-06-17 00:00:00 | 723
2023-06-16 00:00:00 | 1455
2023-06-15 00:00:00 | 1161
2023-06-14 00:00:00 | 1567
2023-06-13 00:00:00 | 1557
2023-06-12 00:00:00 | 1157
2023-06-11 00:00:00 | 358
2023-06-10 00:00:00 | 693
2023-06-09 00:00:00 | 1327
2023-06-08 00:00:00 | 1958
2023-06-07 00:00:00 | 1631
2023-06-06 00:00:00 | 1430
2023-06-05 00:00:00 | 1116
2023-06-04 00:00:00 | 325
2023-06-03 00:00:00 | 783
2023-06-02 00:00:00 | 1327
2023-06-01 00:00:00 | 1710
2023-05-31 00:00:00 | 1693
2023-05-30 00:00:00 | 1109
2023-05-29 00:00:00 | 959
2023-05-28 00:00:00 | 387
(30 rows)
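To spell out what the query counts: `'%-%-%'` matches any filename containing at least two hyphens, which can never be PEP 625-compliant since a compliant name has exactly one. It does not catch other problems (uppercase names, `.zip` archives, and so on), so the numbers are more of a lower bound. A small Python equivalent of the heuristic, with hypothetical function names and in-memory data standing in for the `release_files` table:

```python
from collections import Counter
from datetime import datetime


def looks_non_normalized(filename: str) -> bool:
    # Same test as ILIKE '%-%-%': two or more hyphens anywhere in the name.
    return filename.count("-") >= 2


def count_per_day(uploads):
    """Count flagged filenames per upload day.

    `uploads` is an iterable of (upload_time, filename) pairs; this is an
    assumed stand-in for rows of the real table.
    """
    counts = Counter()
    for upload_time, filename in uploads:
        if looks_non_normalized(filename):
            counts[upload_time.date()] += 1
    return counts


# Made-up sample rows, for illustration only.
sample = [
    (datetime(2023, 6, 26, 10, 0), "my-pkg-1.0.tar.gz"),
    (datetime(2023, 6, 26, 12, 0), "my_pkg-1.0.tar.gz"),
    (datetime(2023, 6, 25, 1, 0), "a-b-c-2.tar.gz"),
]
per_day = count_per_day(sample)
```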
I think a good migration path would be:
- ensuring the most popular build tools have supported outputting PEP 625-compliant filenames for some sufficiently long period of time
- perhaps making upload tools like twine silently normalize this at upload time, possibly with a warning?
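A minimal sketch of the upload-time normalization idea, to make it concrete. This is not twine's API: `pep625_filename` and `filename_for_upload` are hypothetical, and the sketch assumes the name and an already PEP 440-normalized version come from the package metadata rather than being parsed out of the filename.

```python
import re
import warnings


def pep625_filename(name: str, version: str) -> str:
    """Build the PEP 625 filename from metadata (version assumed normalized)."""
    normalized = re.sub(r"[-_.]+", "_", name).lower()
    return f"{normalized}-{version}.tar.gz"


def filename_for_upload(filename: str, name: str, version: str) -> str:
    """Return the filename to upload under, warning if it was changed.

    Hypothetical helper: only .tar.gz files are renamed, because renaming
    cannot change the archive format of e.g. a .zip sdist.
    """
    if not filename.endswith(".tar.gz"):
        return filename

    expected = pep625_filename(name, version)
    if filename != expected:
        warnings.warn(
            f"{filename!r} is not a PEP 625 filename; uploading as {expected!r}",
            UserWarning,
            stacklevel=2,
        )
    return expected
```

Taking the name and version from metadata sidesteps the ambiguity of splitting a legacy filename such as `my-pkg-1.0.tar.gz` on hyphens.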
The other thing we could do is forcibly normalize the filenames ourselves, though we haven't done that in the past, and I know it would break at least twine's checks for whether a file has already been uploaded.
Blocked on #14156 as well.
Until the version in the sdist filename is verified as described in this issue, it's possible to create multiple sdists per release (as seen from the filename point of view).
Example: upload foo-1.tar.gz with release 1 and foo-1.zip with release 1.1. From the metadata point of view it's still just one sdist per release, but from the Simple API point of view (which the package managers use) there are two sdists for release 1.
This does not seem to be in the spirit of PEP 527:
[T]his PEP proposes to allow one, and only one, sdist per release of a project.
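A small sketch of the mismatch described in the example, using the same two files. The parsing helper is hypothetical and assumes the version is everything after the last hyphen in the filename, which is how a consumer reading only filenames would likely see it:

```python
from collections import defaultdict

SDIST_EXTENSIONS = (".tar.gz", ".zip")


def version_from_filename(filename: str) -> str:
    """Guess the version from an sdist filename (text after the last '-')."""
    for ext in SDIST_EXTENSIONS:
        if filename.endswith(ext):
            stem = filename[: -len(ext)]
            _name, _sep, version = stem.rpartition("-")
            return version
    raise ValueError(f"not an sdist filename: {filename!r}")


# (release version from metadata, uploaded filename), as in the example above.
files = [("1", "foo-1.tar.gz"), ("1.1", "foo-1.zip")]

by_metadata = defaultdict(list)   # one sdist per release
by_filename = defaultdict(list)   # what a filename-based view sees
for release_version, filename in files:
    by_metadata[release_version].append(filename)
    by_filename[version_from_filename(filename)].append(filename)

# Files whose filename version disagrees with the release they belong to.
mismatched = [
    filename
    for release_version, filename in files
    if version_from_filename(filename) != release_version
]
```

Here `by_metadata` has one file under each of `"1"` and `"1.1"`, while `by_filename` has both files under `"1"`; rejecting anything that ends up in `mismatched` at upload time would close the gap.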
Probably also blocked on https://github.com/pypa/setuptools/issues/3593 as the predominant builder of source distributions.
There doesn't seem to be any real progress here towards builders producing normalized source distribution filenames:
warehouse=> SELECT DATE_TRUNC('month', upload_time) AS month, COUNT(filename)
FROM release_files
WHERE packagetype = 'sdist'
AND filename ILIKE '%-%-%'
AND upload_time >= DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '30 months'
GROUP BY DATE_TRUNC('month', upload_time)
ORDER BY month DESC;
month | count
---------------------+-------
2024-03-01 00:00:00 | 23683
2024-02-01 00:00:00 | 31742
2024-01-01 00:00:00 | 33589
2023-12-01 00:00:00 | 33818
2023-11-01 00:00:00 | 35584
2023-10-01 00:00:00 | 32092
2023-09-01 00:00:00 | 33117
2023-08-01 00:00:00 | 38100
2023-07-01 00:00:00 | 34178
2023-06-01 00:00:00 | 35241
2023-05-01 00:00:00 | 35136
2023-04-01 00:00:00 | 32816
2023-03-01 00:00:00 | 39726
2023-02-01 00:00:00 | 34714
2023-01-01 00:00:00 | 32340
2022-12-01 00:00:00 | 26588
2022-11-01 00:00:00 | 29160
2022-10-01 00:00:00 | 27748
2022-09-01 00:00:00 | 30693
2022-08-01 00:00:00 | 35739
2022-07-01 00:00:00 | 30297
2022-06-01 00:00:00 | 31412
2022-05-01 00:00:00 | 35092
2022-04-01 00:00:00 | 29901
2022-03-01 00:00:00 | 33199
2022-02-01 00:00:00 | 27257
2022-01-01 00:00:00 | 28129
2021-12-01 00:00:00 | 27028
2021-11-01 00:00:00 | 30112
2021-10-01 00:00:00 | 30402
2021-09-01 00:00:00 | 28612
(31 rows)
https://github.com/pypa/setuptools/issues/3593 has been closed (implemented) for a little while now. It would be interesting to see whether it is making a dent in that graph yet.
Indeed, quite a nice drop:
At this rate, we should be low enough in another month or two to start emitting warnings about a deprecation, and probably by EOY we could fully support PEP 625.