Package names and module names aren't always consistent
Generally, the package name and module name are consistent, e.g. pip install requests, then import requests.
However, sometimes they are not, e.g. pip install beautifulsoup4 but import bs4. Likewise, pip install pycrypto but import Crypto. When this happens, it's confusing, and it takes guesswork or time spent searching the documentation to find the right name.
Obviously, these projects are to blame for using different names. But taken across the Python package index, it's frustrating that the rule "pip install xxx then import xxx" works 90% of the time, but not all the time.
PyPI could help users by clearly stating both the pip install command and the import statement on each package page. For example, https://pypi.python.org/pypi/beautifulsoup4/4.3.2 would show pip install beautifulsoup4 and import bs4.
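Until PyPI shows this, the mapping can at least be looked up locally for packages that are already installed. The following is a minimal sketch, assuming Python 3.10+ (where importlib.metadata.packages_distributions() is available); the helper name import_names_for is made up for illustration and is not part of pip or PyPI.

```python
import re


def _normalize(name):
    # PEP 503 style normalization: runs of "-", "_" and "." compare equal.
    return re.sub(r"[-_.]+", "-", name).lower()


def import_names_for(dist_name, mapping=None):
    """Return the top-level import names provided by an installed distribution.

    `mapping` maps import name -> list of distribution names. When omitted,
    it is read from the current environment (requires Python 3.10+).
    """
    if mapping is None:
        from importlib.metadata import packages_distributions
        mapping = packages_distributions()
    wanted = _normalize(dist_name)
    return sorted(
        import_name
        for import_name, dists in mapping.items()
        if any(_normalize(d) == wanted for d in dists)
    )
```

In an environment where beautifulsoup4 is installed, import_names_for("beautifulsoup4") would be expected to include bs4. This only answers the question after installation, which is exactly the gap the request above is about.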
Until we have metadata 2.0, PyPI doesn't actually know the import names, only the installation names.
So now we have the metadata 2.1 PEP accepted - if you could talk a little about what would need to happen next here @ncoghlan I would appreciate it.
Metadata 2.1 doesn't help with this problem - we descoped 2.1 significantly relative to the never-released 2.0 to only cover fields that had already been defined and had folks looking to implement them in the various publishing tools.
Given the changes in metadata 2.1, adding a field along these lines to a future metadata version would likely take the form of proposing a Provides-Import field (distinct from the existing Provides-Dist field), which would allow publishing tools to inspect the added files, and work out the expected resulting importable modules.
the rule "pip install xxx then import xxx" works 90% of the time
Did someone analyse the actual data to see if this number holds?
It needs to be said that this is a feature of the packaging toolkit: a given distribution can install 0 or more modules, 0 or more packages, 0 or more scripts (and we won’t mention data files, which have a fundamental non-portability problem). A project name doesn’t have to match module or package names.
It can be useful, but it's also confusing and can cause problems, and not just for discoverability.
If two packages have different names but provide the same import name, then they can't be installed at the same time, but there's no way for the packaging tools to detect this. An example of this situation is iris -- it looks like this has been largely sorted out now, but there was a problem where multiple libraries were claiming import iris. They ended up using different PyPI names while continuing to use import iris, so the different PyPI names didn't really solve the problem fully.
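If the tools did know which import names each distribution provides, detecting this kind of clash would be straightforward. A rough sketch, assuming that mapping is available as a plain dict; the function name and the distribution names in the usage note are hypothetical:

```python
from collections import defaultdict


def find_import_conflicts(provided):
    """Find import names claimed by more than one distribution.

    `provided` maps distribution name -> iterable of top-level import names.
    Returns a dict of import name -> sorted list of claiming distributions.
    """
    owners = defaultdict(set)
    for dist, import_names in provided.items():
        for import_name in import_names:
            owners[import_name].add(dist)
    return {
        import_name: sorted(dists)
        for import_name, dists in owners.items()
        if len(dists) > 1
    }
```

For example, find_import_conflicts({"iris-a": ["iris"], "iris-b": ["iris"]}) reports that both (made-up) distributions claim iris. The hard part is not this check but getting reliable data to feed into it.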
If a popular library ships an import name that doesn't match the distribution name, then this is a very easy target for package take-over attacks: e.g. the pylab package was, at least at one point, widely recommended in beginner data science tutorials, and it comes from pip install matplotlib. Currently pip install pylab fails, but if someone claimed that name on PyPI it could start succeeding. Currently pip install crypto succeeds, and gives you an import crypto package, but it's not pycrypto.
OTOH sometimes this is what you want, e.g. pip install pillow taking over import PIL.
I don't think we can just blanket disallow this or anything, it would be too disruptive; but it seems likely that there are things we could do to handle it better.
Right, and I think a metadata 2.2 proposal focused on improved package migration management tools would be a good thing:
- adding Provides-Import (and updating setuptools to set it automatically based on the package contents)
- adding Obsoleted-By-Dist (as proposed in earlier drafts of PEP 426) to allow project maintainers to indicate when a particular package is no longer receiving updates, and suggest potential alternatives
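To make the first idea concrete: core metadata files use an email-header style format, so a multiple-use field could be read with the standard library. This is only a sketch. Provides-Import is the field proposed in this thread, not an existing metadata field, and the sample text below is a made-up, trimmed fragment rather than real package metadata.

```python
from email.parser import HeaderParser

# Hypothetical, trimmed metadata fragment. "Provides-Import" is the field
# proposed in this thread and is not defined by any released metadata version.
SAMPLE_METADATA = """\
Name: beautifulsoup4
Version: 4.3.2
Provides-Import: bs4
"""


def declared_imports(metadata_text):
    """Return every Provides-Import value found in the metadata text."""
    message = HeaderParser().parsestr(metadata_text)
    # get_all() returns None when the header is absent, hence the fallback.
    return message.get_all("Provides-Import") or []
```

A tool such as PyPI could then show the declared names next to the install command, provided publishing tools actually populate the field.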
(I don't see any problem with either of those ideas conceptually - the challenge will just be the usual one of finding folks with the time and inclination to properly specify them, and to implement support for them in at least setuptools, and potentially in other publication tools)
Isn't part of that information already provided by the top_level.txt metadata? And how would that be handled anyway? For example, you can upgrade a system package by installing a newer version in your user site-packages.
top_level.txt isn't a standardised file, so PyPI and installers don't include any code to read it.
And the only impact it would have is to allow installers to warn about problems at install time (which would actually be an argument for having installers handle the inference from the included file paths back to importable package names, rather than trusting package uploaders to do it reliably).
Just to be clear, I think the real problem is how pip/setuptools/easy_install happily overwrite existing files during installation. If package foo provides the import bar, it's IMHO perfectly valid to be able to install an alternative implementation provided by the package my_foo in my user site-packages. The problem is when both foo and my_foo are installed in the same location.
I suppose that standardizing top_level.txt could be the mechanism for adding that information to the standard metadata.
That's essentially what we did for entry points: https://packaging.python.org/specifications/entry-points/
The main alternative would be for PyPI and installers to infer the included modules from the file listing in wheel files. The big advantage of doing that over waiting for a metadata format update is that it will work retroactively for already published packages, regardless of which publishing toolchain they use.
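A simplified sketch of what that inference could look like, given the list of paths inside a wheel (for instance from zipfile.ZipFile(...).namelist()). The function name is made up, and the rules are deliberately rough: it only looks at .py, .so and .pyd files, and it skips anything under .dist-info or .data directories.

```python
from pathlib import PurePosixPath

# File suffixes treated as importable code in this sketch.
MODULE_SUFFIXES = (".py", ".so", ".pyd")


def infer_top_level_names(wheel_paths):
    """Guess top-level import names from the file paths inside a wheel."""
    names = set()
    for path in wheel_paths:
        parts = PurePosixPath(path).parts
        if not parts or not parts[-1].endswith(MODULE_SUFFIXES):
            continue  # not a file this sketch knows how to import
        first = parts[0]
        if first.endswith((".dist-info", ".data")):
            continue  # metadata or data directory, not a package
        if len(parts) == 1:
            # Top-level module file: drop ".py" or an extension-module
            # suffix such as ".cpython-311-x86_64-linux-gnu.so".
            names.add(first.split(".", 1)[0])
        else:
            # File inside a directory: the directory is the import name.
            names.add(first)
    return sorted(names)
```

Because this works from the archive contents alone, it could be run over wheels that are already on PyPI without asking anyone to re-upload.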
It also would avoid having to answer the question "what if the metadata and the data disagree?"
Yeah. That way, we'd only need to consider offering a metadata field if we wanted to help handle cases where file based module name inference was likely to miss things (e.g. a module that aliases itself to a different name on initial import, or uses a metapath hook to make otherwise unimportable file extensions importable).
Oh, fair enough. Maybe "the union of what we see in the wheel + what's listed in top_level.txt".
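Roughly, that union could look like the sketch below. It assumes the names inferred from the wheel are already available as an iterable, and that top_level.txt holds one import name per line, which is how setuptools writes it. The function name is hypothetical.

```python
def combined_import_names(inferred, top_level_text):
    """Union of names inferred from the wheel and names listed in top_level.txt."""
    declared = {
        line.strip()
        for line in top_level_text.splitlines()
        if line.strip()  # ignore blank lines
    }
    return sorted(set(inferred) | declared)
```

Taking the union means a name declared by the uploader but missed by file-based inference (such as an alias set up at import time) still shows up, at the cost of trusting whatever the uploader listed.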