Prohibit uploading malformed wheels to PyPI
Problem description
There is a widespread problem with Python packages: people fail to configure the setuptools build backend properly, and the resulting wheels include files and directories that must not be in them, such as a tests directory right in the root of the wheel, or legacy scripts that are not cross-platform. As long as a package somehow works for them, they stop caring about packaging it properly; even though some build backends loudly warn about these issues, those people just don't listen.
It seems we need some enforcement to fix the situation.
- Require uploading wheels before uploading sdists. Then prohibit uploading sdists for pure-Python packages. For binary packages, wheels for the major platforms must be uploaded first (to ensure that most users will download wheels, not sdists, so there is no way to cheat by uploading a binary wheel no one will download). If people need the source code, they must use version control hosting, like GitHub, sr.ht or radicle.xyz. Rationale:
a. we cannot reliably detect misconfiguration in sdists without actually building wheels from them.
b. if we detected misconfiguration by assuming that the wheels correspond to the sdists, authors would simply work around the restriction by not uploading wheels at all, making the situation even worse instead of fixing their packages.
c. uploading only sdists, without any wheels, is an issue in itself. Sdists of pure-Python packages often contain a setup.py, which means executing code: yet another place to put a backdoor. Packages containing native code have to be compiled when installed from an sdist. This requires compilers, which on some platforms are proprietary and contain telemetry (well, MinGW usually works fine, but by default the build uses VC++), or which are simply heavy, and keeping their latest versions installed contradicts usual security practices.
- Prohibit uploading wheels that match any of the following criteria:
a. containing a tests directory in the root of the wheel. It is usually there as a result of misconfiguration. This directory lands in site-packages, and different packages can install conflicting sets of files into it. It was most likely never intended to end up in wheels, but packages where it does are extremely widespread, and even though setuptools lists every file that goes into the wheel, the people who build wheels and upload them to PyPI just don't care.
b. lacking a description. Rationale: PyPI is a public repository, and the packages there are intended to be used by the public. Packages without a proper description are not intended for public use and just clutter the search results.
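c. containing *.pth files. Rationale: the site module processes them at interpreter startup, and any line in them starting with import is executed without the package ever being explicitly imported.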
d. lacking a clearly stated license. The license can be stated in a license file, in the wheel metadata, or both. Rationale: without a license, it is illegal for the public to use the package, and PyPI is not a personal host for proprietary components meant for one's own consumption.
https://github.com/jwodder/check-wheel-contents may be useful for this.
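A minimal sketch of what a server-side check for criteria a to d might look like, using only the standard library (zipfile and email.parser). Everything here is an assumption for illustration, not how PyPI or check-wheel-contents work: the function name check_wheel, the message strings, and what counts as a description or a license (a non-empty long description; a License or License-Expression field, a "License ::" classifier, or a LICENSE/COPYING file under .dist-info).

```python
"""Hypothetical wheel linter for the criteria proposed above (a sketch)."""
import zipfile
from email.parser import BytesParser


def _metadata_member(names):
    """Return the path of <dist>.dist-info/METADATA in the wheel root, if any."""
    for name in names:
        parts = name.split("/")
        if len(parts) == 2 and parts[0].endswith(".dist-info") and parts[1] == "METADATA":
            return name
    return None


def check_wheel(wheel) -> list[str]:
    """Return a list of problems; empty means the wheel passes.

    `wheel` may be a path or a binary file object (anything zipfile accepts).
    """
    problems = []
    with zipfile.ZipFile(wheel) as zf:
        names = zf.namelist()

        # (a) a top-level tests/ directory would be installed as site-packages/tests
        if any(n.startswith("tests/") for n in names):
            problems.append("top-level 'tests' directory")

        # (c) the site module processes .pth files at startup and executes
        # lines that start with "import"
        pth = sorted(n for n in names if n.endswith(".pth"))
        if pth:
            problems.append("contains .pth files: " + ", ".join(pth))

        meta_name = _metadata_member(names)
        if meta_name is None:
            problems.append("missing .dist-info/METADATA")
            return problems
        meta = BytesParser().parsebytes(zf.read(meta_name))
        dist_info = meta_name.split("/")[0] + "/"

        # (b) long description: the message body, or a legacy Description header
        body = meta.get_payload()
        description = body if isinstance(body, str) else ""
        if not description.strip():
            description = str(meta.get("Description", ""))
        if not description.strip():
            problems.append("no description")

        # (d) license: metadata fields, a license classifier, or a bundled file
        license_files = [
            n for n in names
            if n.startswith(dist_info)
            and n.rsplit("/", 1)[-1].upper().startswith(("LICENSE", "LICENCE", "COPYING"))
        ]
        has_license = (
            str(meta.get("License", "")).strip() != ""
            or str(meta.get("License-Expression", "")).strip() != ""
            or any(str(c).startswith("License ::") for c in meta.get_all("Classifier", []))
            or bool(license_files)
        )
        if not has_license:
            problems.append("no license information")
    return problems
```

An upload endpoint could call check_wheel on the received file and reject it when the returned list is non-empty; the same function could run client-side first to save a round trip.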
There are packages that can't build wheels (I maintain one of them). And there are also packages in the top 360 that are not currently building wheels - I'm sure trying to be heavy-handed now will not go over well. And there are packages that only support Windows, and packages that only support Linux, etc.
FYI, check-wheel-contents errors out if you have two files with the same contents. This includes __init__.py. And there's still quite a bit it doesn't catch (but it's great that it exists!)
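Duplicate detection of this kind is typically a content-hash comparison. The sketch below is not check-wheel-contents' actual code; it groups wheel members by SHA-256 digest, and its ignore_empty parameter is a hypothetical switch illustrating why empty __init__.py files trip such a check and one way to exempt them.

```python
import hashlib
import zipfile
from collections import defaultdict


def duplicate_files(wheel, ignore_empty=True):
    """Return sorted groups of member names whose contents are byte-identical.

    `ignore_empty` (hypothetical) skips zero-length files, since every empty
    __init__.py hashes to the same digest and would otherwise be reported.
    """
    by_digest = defaultdict(list)
    with zipfile.ZipFile(wheel) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            data = zf.read(info)
            if ignore_empty and not data:
                continue
            by_digest[hashlib.sha256(data).hexdigest()].append(info.filename)
    # Only groups with two or more members are duplicates.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]
```

Identical non-empty files, such as two __init__.py files carrying the same license header, are still reported even with ignore_empty enabled.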
Eliminating SDists will break almost all third-party distributors, like conda-forge, homebrew, and the Linux distros. They already hate us for pushing them to stop using setup.py install. I'd like to see installer used more, but it's not going to be fast. And the SDist usually contains the tests while the wheel doesn't, so most of them will likely still want the SDist.
For the actual problem, I'd say a first step would be to improve twine's ability to detect misconfiguration; maybe the most important things could even trigger an error without --strict! Give people an easy-to-use, easy-to-find method for validating wheels, and some of them will use it.
Doing the checks client-side is a nice idea, but the checks have to be mandatory to be effective. When building a wheel, setuptools already complains about a tests directory ending up in the wheel, listing all its files in a very visible wall of text, yet many package authors prefer to just ignore it. I guess the plague of packages with a tests directory in the root of their wheels is evidence that package authors are lazy and, given the choice, will not fix their packages.
The problem is that client-side checks can be bypassed, for example by not updating the tools (updates of Python packages are not automatic, so people tend to run old versions). Requiring a new version of the software in order to upload a wheel to PyPI is not a good solution either, especially since anyone can take twine, patch it to bypass the checks (and maybe add code that inserts a backdoor), and post a few StackOverflow answers like "How can I get rid of the error? Use super-ultimate-twine-with-bells-and-whistles instead: it fools PyPI into thinking the package was uploaded with the latest version of twine, supports legacy Python versions and has no bullshit checks."
So client-side checks are good for avoiding the network traffic (especially useful on traffic-limited connections) and the server-side resource consumption when the server is going to reject the package anyway.
> FYI, check-wheel-contents errors out if you have two files with the same contents. This includes __init__.py
To be honest, I haven't used it yet, but I never meant to enforce all of its warnings, only the ones that are serious enough.
> Eliminating SDists will break almost all third-party distributors, like conda-forge, homebrew, and the linux distros. They already hate us for pushing them to stop using setup.py install.
It's their problem, I guess. Of course they will hate us, for their own fault of using incorrect approaches to packaging. If they need source code to build wheels themselves, they should use SCMs, not PyPI: at least an SCM contains the code seen by "enough eyeballs" (well, usually not enough, but at least more than the ones looking at sdists on PyPI). If they just need to package a binary, they should repackage a wheel with installer. Unfortunately, installer currently has issues (one needs to discover the right installation paths); I currently work around them by reusing a few functions from pip, but I hope it won't take an eternity to migrate that code into installer. So, about the haters: haters gonna hate.
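The path-discovery part can be sketched with only the standard library: sysconfig.get_paths() accepts overrides for its base variables, so the interpreter's default scheme can be expanded against an arbitrary prefix (for example a distro staging directory). The function install_scheme and its headers layout are assumptions for illustration; pip's real logic covers more cases (user installs, venvs, distro-patched schemes).

```python
import os
import sysconfig


def install_scheme(prefix: str, dist_name: str) -> dict[str, str]:
    """Compute install directories under `prefix` from the default scheme.

    A sketch: `headers` goes into a per-distribution subdirectory of the
    scheme's include dir, a simplification of what pip does.
    """
    # Override every base variable; include dirs are expanded from
    # installed_base, so it must be overridden too.
    overrides = {
        "base": prefix,
        "platbase": prefix,
        "installed_base": prefix,
        "installed_platbase": prefix,
    }
    paths = sysconfig.get_paths(vars=overrides)
    return {
        "purelib": paths["purelib"],
        "platlib": paths["platlib"],
        "headers": os.path.join(paths["include"], dist_name),
        "scripts": paths["scripts"],
        "data": paths["data"],
    }
```

The resulting dict should have the keys installer's SchemeDictionaryDestination expects in its scheme dictionary (it also takes interpreter and script_kind arguments); check the installer documentation for the exact signature of the version you use.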
> And there are packages that only support Windows, and packages that only support Linux, etc.
Yeah, the proposal is to require uploading a wheel not for all the major platforms, but only for one of them. The goal of mandatory wheel uploads is to ensure that, if a version is accepted into PyPI, a significant share of the package's audience will use the uploaded wheel; so if an author uploads a dummy wheel to cheat PyPI, they just anger the users of that platform by sending them a dummy wheel instead of a working one.
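The acceptance rule described here could be expressed as a small predicate over the filenames already uploaded for a release. A minimal sketch under stated assumptions: may_upload_sdist and wheel_tags are hypothetical names, and the set of "major platform" tag prefixes (win, macosx, manylinux, musllinux) is an illustrative choice, not an existing PyPI policy.

```python
# Hypothetical: platform tag prefixes treated as "major platforms".
MAJOR_PLATFORM_PREFIXES = ("win", "macosx", "manylinux", "musllinux")


def wheel_tags(filename: str) -> tuple[str, str, str]:
    """Return the (python, abi, platform) tags of a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    # Layout: name-version[-build]-python-abi-platform.whl
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"malformed wheel filename: {filename}")
    return parts[-3], parts[-2], parts[-1]


def may_upload_sdist(existing_files: list[str]) -> tuple[bool, str]:
    """Decide whether an sdist may be added to a release with these files."""
    wheels = [f for f in existing_files if f.endswith(".whl")]
    if not wheels:
        return False, "upload at least one wheel before the sdist"

    # Platform tags may be compressed sets such as "a.b"; split them.
    binary_platforms = [
        plat
        for w in wheels
        for plat in wheel_tags(w)[2].split(".")
        if plat != "any"
    ]
    if not binary_platforms:
        return False, "pure-Python release: sdist not accepted"
    if not any(p.startswith(MAJOR_PLATFORM_PREFIXES) for p in binary_platforms):
        return False, "upload a wheel for at least one major platform first"
    return True, "ok"
```

With this rule, one wheel for a single major platform is enough to unlock the sdist, while a pure-Python release never gets one.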