Publishing a package is hard. New command 'pip publish'?

Open • hickford opened this issue 11 years ago • 105 comments

Even after you've written a setup.py, publishing a package to PyPI is hard. Certainly I found it confusing the first time.

The sequence of steps is complex and made stressful by all the decisions left to the user:

  • Register on PyPI (on the website or with setup.py register?)
  • Log in to PyPI (with setup.py register or by writing a .pypirc?)
  • Build distributions (source? egg? windows installers? wheel?)
  • Upload distributions (with setup.py upload or with twine?)

It would be neat to have a single command pip publish analogous to npm publish that did all of this, correctly.

It would build whichever distributions are deemed fashionable (source + wheel). If you weren't logged in, it would automatically run a pip register wizard.

hickford, Jan 01 '15
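For comparison with the steps listed above, here is a rough sketch of the two commands a single publish command would have to chain today. It assumes the PyPA `build` and `twine` packages are installed (`build` is not mentioned in this thread, and `pip publish` does not exist); `publish_plan` is a hypothetical helper that only assembles argument lists and runs nothing.

```python
from __future__ import annotations

import sys
from pathlib import Path


def publish_plan(project_dir: str = ".", dist_dir: str = "dist") -> list[list[str]]:
    """Return, in order, the argv lists a hypothetical one-shot publish would run.

    Assumes the PyPA `build` and `twine` packages are installed.
    Nothing is executed here.
    """
    python = sys.executable
    build_cmd = [
        python, "-m", "build",
        "--sdist", "--wheel",      # build a source distribution and a wheel
        "--outdir", dist_dir,
        project_dir,
    ]
    # twine takes credentials from ~/.pypirc, a keyring, or environment variables
    # such as TWINE_USERNAME / TWINE_PASSWORD, so there is no separate login step.
    # The "*" pattern is passed through as-is; twine expands glob patterns itself
    # (worth confirming for the twine version you use).
    upload_cmd = [python, "-m", "twine", "upload", str(Path(dist_dir) / "*")]
    return [build_cmd, upload_cmd]


if __name__ == "__main__":
    for argv in publish_plan():
        print(" ".join(argv))
```

Running each list with `subprocess.run(argv, check=True)` would perform the build and then the upload; keeping the plan as plain data makes a dry run trivial.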

Huge :+1: for this.

Related, there's a pip-init command (analogous to npm init). Would be great to have a similarly easy experience with publishing!

(See this thread for details of why I came looking for this)

beaugunderson, Apr 15 '15

One of the super unobvious things I ran into wrt passwords and .pypirc is that if you have characters like { and } in there, you need to duplicate them; otherwise it tends to treat them as string format interpolation tokens, and it blows up in interesting ways.

daenney, Apr 17 '15
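For reference, a small standard-library sketch of this failure mode. With Python 3's `configparser`, the default `ConfigParser` uses `BasicInterpolation`, whose special character is `%` (as in `%(name)s`), so with that parser it is `%` that has to be doubled. Whether a given upload tool reads `.pypirc` with interpolation enabled depends on the tool and its version, so treat this as an illustration rather than a description of any particular uploader; the section name, username, and password are made up.

```python
import configparser

# A made-up .pypirc fragment whose password contains a literal '%'.
PYPIRC = """
[pypi]
username = example-user
password = s3cr%t
"""


def read_password(text: str, interpolate: bool = True) -> str:
    """Read the [pypi] password, with or without %-style interpolation."""
    if interpolate:
        parser = configparser.ConfigParser()  # BasicInterpolation by default
    else:
        parser = configparser.ConfigParser(interpolation=None)  # values read verbatim
    parser.read_string(text)
    return parser.get("pypi", "password")


if __name__ == "__main__":
    try:
        read_password(PYPIRC)
    except configparser.InterpolationSyntaxError as exc:
        # A lone '%' not followed by '%' or '(' is rejected when the value is read.
        print("interpolation error:", exc)

    # Doubling the '%' escapes it under BasicInterpolation...
    print(read_password(PYPIRC.replace("%", "%%")))  # s3cr%t
    # ...or interpolation can be switched off entirely.
    print(read_password(PYPIRC, interpolate=False))  # s3cr%t
```

Disabling interpolation (`interpolation=None`) is the approach the next comment describes for its ConfigParser wrapper; doubling is the workaround when you don't control the parser.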

Oh, that is very non-obvious! The thing I was publishing when I started thinking about how hard all of this is was actually a tiny wrapper around ConfigParser that disables interpolation and implements some other niceties. I didn't have a good example of how interpolation can blow up unexpectedly, so thank you for that, @daenney!

beaugunderson, Apr 17 '15

Since nobody's said it yet - always, always publish with twine. setup.py upload uploads via plain text, exposing all your users to unnecessary risk.

glyph, May 28 '15

That's great advice which I didn't know. For the lazy people, https://pypi.python.org/pypi/twine

ekohl, May 28 '15

FTR Distutils in Python 2.7.7+ uploads using HTTPS.

merwok, May 28 '15

Not verified HTTPS, that requires 2.7.9+ (I think it'll use verified then, though I haven't made sure of it).

dstufft, May 28 '15

Also 3.4.3+ for Verified TLS on 3.x.

But that doesn't matter a whole lot, because the design of distutils means that if you want to, say, upload wheel files for 2.6, 3.2, or 3.3, then you have to upload with a version of Python that doesn't verify HTTPS.

dstufft, May 28 '15
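For context, the releases mentioned above (2.7.9+ and 3.4.3+) are where PEP 476 turned on certificate verification by default for the standard library's HTTPS clients, and `ssl.create_default_context()` is the stdlib helper that builds a verifying TLS configuration. A minimal sketch, assuming Python 3, of checking that an upload target is HTTPS and that a given context really verifies; `is_verified_https` is a hypothetical helper, the URL is only an example, and nothing is fetched.

```python
import ssl
from urllib.parse import urlsplit


def is_verified_https(url: str, context: ssl.SSLContext) -> bool:
    """True only if `url` uses HTTPS and `context` checks certificate and hostname."""
    return (
        urlsplit(url).scheme == "https"
        and context.verify_mode == ssl.CERT_REQUIRED
        and context.check_hostname
    )


if __name__ == "__main__":
    ctx = ssl.create_default_context()  # verifies certificates and hostnames by default
    print(is_verified_https("https://upload.pypi.org/legacy/", ctx))  # True
    print(is_verified_https("http://upload.pypi.org/legacy/", ctx))   # False: plain HTTP
```

An "HTTPS" upload made with a context that has `verify_mode = ssl.CERT_NONE` would fail this check, which is the unverified case described for older interpreters.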

@dstufft Thanks, Donald, for explaining. I hadn't appreciated why part of the solution was to create a new tool as well as fix the bug. Cool. I think this is all the more reason for a friendly and reliable pip publish command that would be useful to (and receive security updates on) all Python versions.

hickford, May 30 '15

I'm 100% all for this. Python is such a beautiful language, and its use cases range from simple "proof of concept" scripts to proprietary codebases. Yet it seems to be fading in popularity when compared with the Node community. I believe npm is a primary reason Node is so popular. The numbing simplicity of creating and publishing a Node package to the internet drastically lowers the bar for innovation, allowing more people to express their ideas and contributions.

We need something like this for Python. What can I do to help?

jacobbridges, Jun 19 '15

I just realized that the only thing I've said so far on this issue is "use twine". What I should really say is: Yes. pip publish is the right solution to this problem.

glyph, Jul 13 '15

It's been a while. As someone coming to pip from the Node.js + npm world, where npm publish is pretty much all you need (on first run, it asks you to set up credentials if there aren't any), is there any chance of reviving this effort?

Pomax, Apr 05 '17

+1

axsaucedo, Jun 23 '17

Also note that coming from the Node world, there is npm init, which generates the metadata file without needing to write it yourself. A pip revision that takes its inspiration from npm and yarn in terms of ease of use would be super great.

Pomax, Jun 23 '17

I have been discovering the "python packaging world" the hard way in the last two months.

Frankly, as a Python teacher, I am quite disappointed by how confusing releasing your code is, and by how it isn't as straightforward as everything else in Python.

So a huge +1 for a pip publish and please tell us how to help!

pdonorio, Jul 05 '17

I am looking forward to this improvement.

ihavenonickname, Jan 12 '18

I still think tightly coupling the preferred publishing tool with the preferred package consumption tool again would be a major mistake, as the entire point of much of our work in PyPA has been to break the previous tight coupling of publishing tools and compatible download tools established by distutils and setuptools/easy_install. (Keep in mind that the tight coupling favoured by commercial publishing platform operators doesn't apply to the PSF or PyPA.)

While twine itself is mainly a workaround for the limitations of the upload command in setuptools/distutils, there are also projects like @takluyver's flit or @ofek's hatch, which have their own flit publish and hatch release commands, such that introducing a new pip publish command would potentially create confusion around "What's the difference between publishing with pip and publishing with my publishing tool's publishing command?".

pip is an installer, let's keep it as an installer, and not try to morph it into an ever growing "all that and the kitchen sink" project manager. We've already seen in our attempts to incorporate virtual environment management into the packaging.python.org tutorials that doing so can significantly increase the learning curve for folks using Python for ad hoc personal scripting, and publishing capabilities fall into a similar category where they're only useful to folks writing software for publication, and completely irrelevant to folks that are only interested in local personal automation.

ncoghlan, Feb 28 '18

I think there's a big difference between putting virtualenv into tutorials, and putting uploading into pip. With virtualenv, you're adding a new tool and set of cognitive complexity to the first workflow people read about. With pip publish, we wouldn't mention it in the intro tutorial at all, and when people do get around to reading the tutorial on publishing their own packages, it would let us remove a tool and its associated cognitive complexity. "Oh, I just use pip, I already know pip."

It's really important that we've separated installing+uploading from building. It's not clear to me what the value is in separating installing from uploading. (Besides "we've always done it that way.") What's the value add in having an ecosystem of competing upload tools? Can't we have one obvious way to upload a wheel? Is there any reason flit publish exists, except to let people skip having to learn about and install yet another tool just for this? (These are genuine questions. I guess @takluyver is the one who can answer the last.)

njsmith, Feb 28 '18

> Is there any reason flit publish exists, except to let people skip having to learn about and install yet another tool just for this?

That's definitely part of it. If someone has prepared a package using flit, I don't want to make them learn about twine to get it published.

There's also a difference in approach, though. I think that integrating build+upload into one command reduces the risk of mistakes where you upload the wrong files - for instance, if you make a last-minute change to the code and forget to rebuild the distributions. Other people would rather separate the steps so they can test the built distributions before uploading those precise files.

> pip is an installer, let's keep it as an installer, and not try to morph it into an ever growing "all that and the kitchen sink" project manager.

I guess that the push for features like this and pip init are inspired by tools like cargo, which is one tool that can do ~everything you need to manage a typical rust project - from starting a new project to running tests to publishing a crate.

I admire how this helps make rust approachable, and I think we should keep 'tool overload' in mind when designing packaging tools and documentation for Python (*). But standardising and unifying a collection of different tools which people already use is a much bigger task than designing a unified tool on a blank canvas. I don't want to say it's impossible, and give up on a potentially valuable endeavour before it is begun, but I would expect it to take many years of wrangling with people who, if they aren't happy, can easily walk away and keep using existing tools.

(* It is of course a bit hypocritical for me to talk about tool overload after adding a new tool which serves the same purpose as existing tools.)

takluyver, Feb 28 '18
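The second approach described above (test the built distributions, then upload exactly those files) can be made mechanical by recording a checksum of each artifact after testing and re-checking it just before upload. A minimal standard-library sketch; `record_digests` and `verify_digests` are hypothetical helpers, not part of twine or flit.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 of a file, read in chunks so large wheels need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def record_digests(dist_dir: Path) -> dict[str, str]:
    """Snapshot file name -> digest for every file in dist_dir (run after testing)."""
    return {p.name: sha256_of(p) for p in sorted(dist_dir.iterdir()) if p.is_file()}


def verify_digests(dist_dir: Path, expected: dict[str, str]) -> list[str]:
    """List the differences from the tested snapshot; an empty list means safe to upload."""
    current = record_digests(dist_dir)
    problems = [f"missing: {name}" for name in expected if name not in current]
    problems += [f"unexpected: {name}" for name in current if name not in expected]
    problems += [
        f"changed: {name}"
        for name in expected
        if name in current and current[name] != expected[name]
    ]
    return problems
```

Uploading would then proceed only if `verify_digests` returns an empty list. A combined build-and-upload command avoids the same stale-file risk differently, by leaving no gap between building and uploading.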

The fact that pip wheel exists makes this a grey area - in the sense that if there were no pip wheel, it would be obvious (to me, at least) that we shouldn't have pip publish. But pip wheel is (IMO) targeted at allowing users to ensure that they can build the same wheel that pip uses, rather than being particularly targeted at a developer workflow (although it's obviously an option in that case - but questions like reusing build artifacts have very different answers depending on whether pip wheel is for an end user workflow or a developer workflow).

Personally, I do not think pip should try to cover the development workflow. Specifically, I'm against adding pip publish.

As well as pip wheel, we currently have some "convenience" features that support the development workflow (notably editable installs). But editable installs cause a lot of support issues, because they sit somewhat uncomfortably with pip's other functionality. To be honest, if we wanted to make an even clearer split between end user and developer toolsets, I'd be OK with moving editable installs out of core pip as well (but that's a completely separate discussion, and not one I think needs raising at the moment).

(I've just seen @takluyver's comment - basically I agree with pretty much everything he said).

pfmoore, Feb 28 '18

Oh, pip definitely shouldn't try to cover the development workflow: that would require implementing the functionality of tox, pipenv, flake8, pytest, flit, setuptools, and probably a bunch more I'm forgetting :-). Development is a complex activity that of course will require a wide variety of tools.

But none of this gives me any sense of why pip publish is a bad idea. Pip is already the tool I use to talk to pypi, to build things, and (once PEP 516 lands) to wrap build backends in a uniform interface. The point of pip publish would be to talk to pypi, and maybe wrap build backends and build things. So it feels very natural that pip might be the tool that covers these features.

Again, how does the pip/twine separation benefit users? Are there so many different ways to upload something to pypi that users benefit from a variety of options?

njsmith, Feb 28 '18

> Again, how does the pip/twine separation benefit users? Are there so many different ways to upload something to pypi that users benefit from a variety of options?

FTR when I originally wrote twine, I intended to put it into pip at some point, I just wanted to bake it externally first. Although the flip side of that is that something like twine wheel would theoretically only build the current wheel (and likely wouldn't install any dependencies at all, assuming you've already set up the environment) and twine upload would then be the pair for that.

IOW, I can see it going either way really.

dstufft, Feb 28 '18

OK. I guess it depends on how you look at the tools. I interact with pip as an end user. I use it to install, to uninstall/upgrade, sometimes to generate wheels for offline use (and for that I generally don't need to worry about whether that needs a build step or just downloads an existing wheel). I don't see it as a development tool (even though pip wheel is a convenient build interface). My build workflow typically doesn't involve editable installs so I don't have a view on that. And most of my publishing is internal, so I don't use pip (or twine!) there either.

I'd be happy to see a unified "interact with PyPI and build systems" tool for development, as it would make it easier for me to get going when I do need to use a PyPI-based development workflow. But I don't really care if that's pip or not.

> Again, how does the pip/twine separation benefit users?

Well, users who aren't publishers don't have to deal with documentation that discusses publishing workflows, credential management, etc. Non-developer users of pip don't have to deal with the additional security risks in an upload code path (note - this is unverified speculation, I have no idea if a publish command would significantly increase the risk of vulnerabilities in pip).

Basically, it benefits non-publisher users by giving them a tool focused to their needs, and separating publisher functionality into another tool they can ignore. There's a cost to publishers from this choice, but we have far more non-publishers using pip than publishers. And publishers can be assumed to be technically more experienced, and hence more able to deal with a slightly more complex set of tools.

> Are there so many different ways to upload something to pypi that users benefit from a variety of options?

I'm not suggesting competition or a variety of ways, just that the "one way to publish" doesn't need to be the same as the "one way to install". And it isn't at the moment, so it seems to me that the question is what benefit do users get from merging pip and twine? (Given that Python does not have the history of "one tool to rule them all" that npm and cargo offer to Javascript and Rust developers respectively).

Also, in technical terms, there are things like credential management that would be new for pip. And in practice, the upload interface to PyPI is independent of the download interface. So there's not much synergy there. My instinct is that there wouldn't be much technical gain from publish being a pip command rather than in another tool. Add to that the fact that pip is critically short of manpower anyway, and maybe a separate tool bypasses that problem, too...

pfmoore, Feb 28 '18

There is also https://github.com/zestsoftware/zest.releaser for the problem scope.

Rotonen, Feb 28 '18

I'm involved in a lot of packaging for compiled packages (numpy, scipy, matplotlib ...). The process generally looks like this:

  • Make a wheel-building repo, and make it run on travis-ci (manylinux, macOS) and appveyor (Windows). The wheel-building repo uploads wheels to a public site, in most cases a Rackspace container.
  • At release time: trigger a build from the wheel-building repo. Wait for the builds to finish on travis-ci and appveyor. Check they all built correctly. Wait for the wheels to arrive on the public site.
  • Use a script to download the wheels from the public site, sign them, and upload them to PyPI using twine.

We generally want to upload the wheels before the source, to make sure that users don't suddenly start getting source installs in the period between uploading source and uploading wheels.

I can't see that process being easy to automate with a single command - pip publish or otherwise. There are too many moving pieces. Echoing Thomas' comment, it is not even desirable, because the wheels may not build correctly, and so the extra step between build and upload is useful for review and fixes.

matthew-brett, Feb 28 '18
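The "wheels before source" ordering above can be enforced mechanically before calling the uploader, for example by handing files to twine in two batches. A minimal sketch, assuming distributions are recognised by file extension (`.whl` for wheels, `.tar.gz` or `.zip` for sdists); `upload_batches` is a hypothetical helper and the file names in the test are placeholders.

```python
from __future__ import annotations

# Assumption: sdists are identified by these suffixes; adjust if you publish others.
SDIST_SUFFIXES = (".tar.gz", ".zip")


def upload_batches(files: list[str]) -> list[list[str]]:
    """Split distribution files into [wheels, sdists] so wheels go up first.

    Empty batches are dropped; unrecognised files raise instead of being
    silently uploaded in the wrong order.
    """
    wheels = sorted(f for f in files if f.endswith(".whl"))
    sdists = sorted(f for f in files if f.endswith(SDIST_SUFFIXES))
    unknown = set(files) - set(wheels) - set(sdists)
    if unknown:
        raise ValueError(f"not a recognised distribution: {sorted(unknown)}")
    return [batch for batch in (wheels, sdists) if batch]
```

Each batch could then go to its own `twine upload` invocation, so that if the wheel upload fails, the sdist has not yet been published and users cannot fall back to source installs.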

@pfmoore nicely summarised my view on this as well: right now, pip is a tool for anonymous interaction with PyPI, and it serves that role nicely. You can use it just fine without a PyPI configuration file or any form of credential management, since it never attempts to do anything that requires authentication or authorisation.

For someone making the transition from PyPI user to PyPI publisher, there isn't going to be a big difference in complexity between a tutorial that tells someone to start with pip install flit and then goes through a series of instructions culminating in flit publish and a slight variant that culminates in pip publish instead (if anything, the latter may be slightly more confusing, on the grounds of "Why am I switching back to pip for the last step?").

For someone that doesn't want to make that transition (at least, not yet), having pip publish show up in pip --help is an irrelevant distraction at best, and potentially confusing and off-putting at worst.

(To answer Paul's question regarding the security implications, having Python available on a system at all is already such a nightmare when it comes to making data exfiltration easier that having pip publish also available probably wouldn't make a lot of difference in practice. That said, maintaining a deliberately smaller attack surface for an installed-by-default component is pretty much never going to be a bad thing.)

ncoghlan, Feb 28 '18

I will say one of the most common complaints I see amongst people who are new to Python (either new to programming in general, or new to Python but used to other languages) is how fractured our packaging ecosystem is, and how working with it involves interacting with 12 different tools. It's a common point of frustration I see expressed on Twitter and Reddit and the like. That said, I'm still on the fence!

Some random points:

  • I don't think a hypothetical pip publish would work differently than twine does now. It wouldn't be building wheels or anything like that, it'd just take built wheels and sdists and upload them.
  • I could see both pip and twine having similar but different commands, for instance I could see pip wheel and twine wheel, with the former building wheels for the entire dependency graph (and installing build deps etc) and the latter being more of a stand in for setup.py bdist_wheel using the new API that doesn't build it for the entire dep graph or install dependencies. In this vein separate tools would let us optimize the workflow for one or another.
  • Having to pip install twine is kind of annoying vs baking it into pip.
  • I think having every build tool offer its own publish command is the worst-case scenario.

dstufft, Feb 28 '18
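To make the second point concrete: `pip wheel` by default builds wheels for a project and its whole dependency tree, while its `--no-deps` flag restricts it to the project itself, which is close to the "only build the current wheel" behaviour described for a hypothetical `twine wheel` (build requirements are still installed unless isolation is disabled). A sketch that only assembles argv lists and runs nothing; `pip_wheel_cmd` is a hypothetical helper, and `twine wheel` does not exist, so it is not modelled.

```python
from __future__ import annotations

import sys


def pip_wheel_cmd(project: str = ".", out: str = "wheelhouse", deps: bool = True) -> list[str]:
    """argv for `pip wheel`; with deps=False only `project` itself is built."""
    cmd = [sys.executable, "-m", "pip", "wheel", "--wheel-dir", out]
    if not deps:
        cmd.append("--no-deps")  # skip building wheels for the dependency graph
    cmd.append(project)
    return cmd
```

The full-graph form suits an end user preparing an offline wheelhouse; the `--no-deps` form suits a publisher who already has an environment set up and wants just the project's own wheel.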

I prefer the term "publishing tool" to "build tool", because builds happen both pre- and post-publication, which means builds are always going to be driven by the standard formats (either the setup.py de facto standard, or the documented pyproject.toml build hooks), no matter when those builds happen. There's also quite a bit more that goes into publishing a project than just specifying how to build it.

setuptools(/distutils) is likely to remain the most popular publishing tool for existing projects for a long time, as it's typically much harder to justify the effort of migrating an existing project over to a new publishing toolchain than it is to justify choosing a newer toolchain with more opinionated defaults for a new project.

That popularity and entrenchment is why so much of PyPA's focus has been on decoupling the popularity of specific publishing tools from our recommendations for preferred installation tools: the factors affecting their adoption are very different, and now that we've successfully decoupled them, we can let our recommendations for new users and our recommendations for new publishers evolve independently of each other. (This is in fairly significant contrast to younger ecosystems that are building their user and publisher audiences concurrently, and need a much higher conversion rate from user to publisher in order to help fully establish themselves)

While it's true that when it comes to developing Python projects, pip install setuptools wheel twine certainly isn't the nicest way to start, it isn't the twine part that makes that especially confusing: it's that setuptools requires you to write setup.py directly, without providing reasonable defaults.

Commands like flit init and hatch new have learned from that, and provide much better default starting points than setuptools or distutils ever did. (And thanks to PyPA's decoupling work, publishers are free to switch toolchains between setuptools and hatch or flit and hatch without their end users needing to care in the slightest).

As a separate component, twine is useful not only as a more flexible replacement for setup.py sdist bdist_wheel upload, but also as a dependency for other publishing tools (e.g. hatch release uses twine to handle the actual uploads to PyPI. I don't know if flit does something similar, or includes its own upload code, but whether it does or not is completely transparent to me as a flit user).

So I'm completely in favour of switching out the packaging.python.org publishing tutorial with one that recommends a more modern and comprehensive publishing toolchain (with flit being the current leading contender for that role, since it strikes a sweet spot of being "the simplest thing that could possibly work" for single file pure Python projects - I'm not sure if he's actually started drafting a version of it, but @jonparrott has been considering that possibility for a while now).

I'm also in favour of encouraging publishing tool developers to rely on twine when it makes sense for them to do so.

The only part that doesn't make sense to me is "Let's make pip a publishing tool", since it would be a major increase in scope for pip, and I don't think it would solve any problem that we actually have.

ncoghlan, Feb 28 '18

> It isn't the twine part that makes that especially confusing: it's that setuptools requires you to write setup.py directly, without providing reasonable defaults.

I don't think that's particularly true. I mean, yes, it's true that writing setup.py sucks, but I think the more tools we add, the more complex it gets, and feedback from users on Twitter/Reddit/IRC etc. mirrors that. In other words, the existence of one problem doesn't make the other problem any less real.

At some level this is unavoidable given the scope of what we have for packaging (e.g. a singular build tool isn't going to work for everyone, at least not well), but some of it can be mitigated to reduce the complexity. For instance, the wheel package really should be rolled into setuptools itself, and the recommended approach could be for tools like flit to primarily provide an API, with another tool (twine? pip?) offering things like publish, build, etc. Given that, it's also a good idea to think about whether it makes sense to bundle that tool with pip, since (A) a lot of the same things are going to have to be done in both and (B) it further reduces the complexity of having 12 different tools in the toolchain.

dstufft, Feb 28 '18

> I will say one of the most common complaints I see amongst people who are new to Python (either new to programming in general, or new to Python but used to other languages) is how fractured our packaging ecosystem is, and how working with it involves interacting with 12 different tools.

This I agree with 100%. However, it's not the publishing side of it that bothers me here, it's the other end of the process. Creating a project template, how do I write docs, where do I put tests, how does tox fit in, what do I need to do to integrate CI (Travis? Appveyor? Both? Something else?). By the time I have something to publish, pip install twine; twine upload is near-trivial.

So if we want to emulate that "one tool to rule them all" environment, we should be starting with the "create a new project" end of the process, not the uploading end. Tidy up the upload end later, once we've got the big stuff covered. (And no, I don't think that's something we should be doing - our current approach of guides recommending current best of breed solutions is better suited to the Python ecosystem IMO, even if we do occasionally look enviously at cargo and npm...)

pfmoore, Feb 28 '18