Support PEP 625 (Filename of a Source Distribution)

Open di opened this issue 3 years ago • 29 comments

What's the problem this feature will solve? PEP 625 has been accepted, so PyPI should be updated to support it.

Describe the solution you'd like PyPI needs to implement some changes to support the PEP:

  • a restriction on the filenames considered valid for a source distribution (and a corresponding deprecation/notification)
  • validation of the distribution and version sections of the filename, including normalization.
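
As a rough illustration of the two checks above, here is a minimal standard-library sketch. It is not Warehouse's implementation; the function names are hypothetical. It assumes the PEP 625 form `{name}-{version}.tar.gz`, normalizes the name the same way wheel filenames do, and only checks that the version part is non-empty and hyphen-free rather than fully validating it as a normalized version.

```python
import re

SDIST_SUFFIX = ".tar.gz"


def normalize_name(name: str) -> str:
    """Collapse runs of '-', '_' and '.' into a single '_' and lowercase."""
    return re.sub(r"[-_.]+", "_", name).lower()


def check_sdist_filename(filename: str, project_name: str) -> bool:
    """Return True if filename looks like '{name}-{version}.tar.gz'
    with the name part normalized (hypothetical helper, sketch only)."""
    if not filename.endswith(SDIST_SUFFIX):
        return False
    stem = filename[: -len(SDIST_SUFFIX)]
    parts = stem.split("-")
    # A normalized name and a normalized version contain no hyphens,
    # so a compliant stem splits into exactly two parts.
    if len(parts) != 2:
        return False
    name_part, version_part = parts
    # Assumption: the version is not parsed here, only required to be non-empty.
    return name_part == normalize_name(project_name) and version_part != ""
```

For example, `check_sdist_filename("my_package-1.0.tar.gz", "My.Package")` passes, while `my-package-1.0.tar.gz` is rejected because the stem contains two hyphens.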

di avatar Sep 21 '22 21:09 di

This is likely blocked on https://github.com/pypa/packaging/issues/527.

di avatar Sep 21 '22 21:09 di

This is probably also blocked on finding a good migration path that isn't too obtrusive for users. Right now, there are a lot of files being uploaded that would fail to upload once this change is implemented, and IMO this would break too many users for us to enable this right now:

warehouse=> select DATE_TRUNC('day', upload_time) as day, count(filename) from release_files where packagetype = 'sdist' and filename ilike '%-%-%' group by DATE_TRUNC('day', upload_time) order by day desc limit 30;
         day         | count
---------------------+-------
 2023-06-26 00:00:00 |  1028
 2023-06-25 00:00:00 |   381
 2023-06-24 00:00:00 |   687
 2023-06-23 00:00:00 |  1093
 2023-06-22 00:00:00 |  1453
 2023-06-21 00:00:00 |  1486
 2023-06-20 00:00:00 |  1606
 2023-06-19 00:00:00 |  1200
 2023-06-18 00:00:00 |   354
 2023-06-17 00:00:00 |   723
 2023-06-16 00:00:00 |  1455
 2023-06-15 00:00:00 |  1161
 2023-06-14 00:00:00 |  1567
 2023-06-13 00:00:00 |  1557
 2023-06-12 00:00:00 |  1157
 2023-06-11 00:00:00 |   358
 2023-06-10 00:00:00 |   693
 2023-06-09 00:00:00 |  1327
 2023-06-08 00:00:00 |  1958
 2023-06-07 00:00:00 |  1631
 2023-06-06 00:00:00 |  1430
 2023-06-05 00:00:00 |  1116
 2023-06-04 00:00:00 |   325
 2023-06-03 00:00:00 |   783
 2023-06-02 00:00:00 |  1327
 2023-06-01 00:00:00 |  1710
 2023-05-31 00:00:00 |  1693
 2023-05-30 00:00:00 |  1109
 2023-05-29 00:00:00 |   959
 2023-05-28 00:00:00 |   387
(30 rows)

I think a good migration path would be:

  • ensuring the most popular build tools have supported outputting PEP 625-compliant filenames for some sufficiently long period of time
  • perhaps making upload tools like twine silently normalize this at upload time, possibly with a warning?
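
To make the second idea concrete, here is a hypothetical sketch of what client-side normalization with a warning could look like. This is not twine's API; the function names are made up for illustration, and the sketch assumes the version string passed in is already in normalized form and that only `.tar.gz` files can be fixed by renaming.

```python
import re
import warnings


def pep625_filename(project_name: str, version: str) -> str:
    """Build '{name}-{version}.tar.gz' with a normalized name (sketch only)."""
    name = re.sub(r"[-_.]+", "_", project_name).lower()
    return f"{name}-{version}.tar.gz"


def filename_for_upload(filename: str, project_name: str, version: str) -> str:
    """Return the filename to upload under, warning if it had to change."""
    if not filename.endswith(".tar.gz"):
        # Assumption: other archive formats cannot be fixed by renaming,
        # so they are passed through untouched in this sketch.
        return filename
    expected = pep625_filename(project_name, version)
    if filename != expected:
        warnings.warn(
            f"{filename!r} is not a PEP 625 filename; uploading as {expected!r}",
            UserWarning,
            stacklevel=2,
        )
        return expected
    return filename
```

A warning rather than an error keeps existing workflows running while still telling users that their build tool produces non-compliant names.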

di avatar Jun 26 '23 21:06 di

The other thing we could do is forcibly normalize filenames ourselves, though we haven't done that in the past, and I know it would break at least twine's check for whether a file has already been uploaded.

dstufft avatar Jun 26 '23 21:06 dstufft

Blocked on #14156 as well.

di avatar Jul 18 '23 17:07 di

Until the version in the sdist filename is verified as described in this issue, it's possible to create multiple sdists per release (as seen from the filename point of view).

Example: upload foo-1.tar.gz with release 1 and foo-1.zip with release 1.1. From the metadata point of view it's still just one sdist per release, but from the Simple API point of view (which the package managers use) there are two sdists for release 1.

This does not seem to be in the spirit of PEP 527:

[T]his PEP proposes to allow one, and only one, sdist per release of a project.

Which is currently verified on upload.
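
A small sketch of the filename point of view, to show how the example above ends up with two sdists for one version. The helper names are hypothetical, and it assumes the version is whatever follows the last hyphen in the filename stem, which is only reliable for simple names like the ones in the example.

```python
from collections import defaultdict

SDIST_EXTENSIONS = (".tar.gz", ".zip")


def version_from_filename(filename: str) -> str:
    """Extract the version as a filename-based consumer might see it."""
    for ext in SDIST_EXTENSIONS:
        if filename.endswith(ext):
            stem = filename[: -len(ext)]
            break
    else:
        raise ValueError(f"not a recognized sdist filename: {filename!r}")
    # Assumption: the version is everything after the last hyphen.
    return stem.rpartition("-")[2]


def duplicate_sdists(filenames):
    """Group filenames by filename-derived version; keep groups with more than one file."""
    grouped = defaultdict(list)
    for filename in filenames:
        grouped[version_from_filename(filename)].append(filename)
    return {version: names for version, names in grouped.items() if len(names) > 1}
```

With the two files from the example, `duplicate_sdists(["foo-1.tar.gz", "foo-1.zip"])` groups both under version `1`, even though the second file was uploaded to release 1.1.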

stiankri avatar Sep 20 '23 17:09 stiankri

Probably also blocked on https://github.com/pypa/setuptools/issues/3593 as the predominant builder of source distributions.

di avatar Mar 20 '24 18:03 di

There doesn't seem to be any real progress here towards builders producing normalized source distribution filenames:

warehouse=> SELECT DATE_TRUNC('month', upload_time) AS month, COUNT(filename)
FROM release_files
WHERE packagetype = 'sdist'
    AND filename ILIKE '%-%-%'
    AND upload_time >= DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '30 months'
GROUP BY DATE_TRUNC('month', upload_time)
ORDER BY month DESC;
        month        | count
---------------------+-------
 2024-03-01 00:00:00 | 23683
 2024-02-01 00:00:00 | 31742
 2024-01-01 00:00:00 | 33589
 2023-12-01 00:00:00 | 33818
 2023-11-01 00:00:00 | 35584
 2023-10-01 00:00:00 | 32092
 2023-09-01 00:00:00 | 33117
 2023-08-01 00:00:00 | 38100
 2023-07-01 00:00:00 | 34178
 2023-06-01 00:00:00 | 35241
 2023-05-01 00:00:00 | 35136
 2023-04-01 00:00:00 | 32816
 2023-03-01 00:00:00 | 39726
 2023-02-01 00:00:00 | 34714
 2023-01-01 00:00:00 | 32340
 2022-12-01 00:00:00 | 26588
 2022-11-01 00:00:00 | 29160
 2022-10-01 00:00:00 | 27748
 2022-09-01 00:00:00 | 30693
 2022-08-01 00:00:00 | 35739
 2022-07-01 00:00:00 | 30297
 2022-06-01 00:00:00 | 31412
 2022-05-01 00:00:00 | 35092
 2022-04-01 00:00:00 | 29901
 2022-03-01 00:00:00 | 33199
 2022-02-01 00:00:00 | 27257
 2022-01-01 00:00:00 | 28129
 2021-12-01 00:00:00 | 27028
 2021-11-01 00:00:00 | 30112
 2021-10-01 00:00:00 | 30402
 2021-09-01 00:00:00 | 28612
(31 rows)

chart

di avatar Mar 20 '24 18:03 di

https://github.com/pypa/setuptools/issues/3593 has been closed (implemented) for a little while now. It would be interesting to see whether it is making a dent in that graph yet.

dimbleby avatar Jul 27 '24 16:07 dimbleby

Indeed, quite a nice drop:

chart (3)

At this rate, we should be low enough in another month or two to start emitting warnings about a deprecation, and probably by EOY we could fully support PEP 625.

di avatar Aug 07 '24 00:08 di