Campaign to get people Publishing Wheels
How can we get more people to publish wheels, especially for Windows? Christoph Gohlke publishes Windows installers, but that approach won't work for wheels because he doesn't have the rights to upload them to PyPI.
Perhaps the build farm I've wanted to set up could be used here?
http://pythonwheels.com/ is an attempt at this; now that pip 1.5 installs wheels by default, this should be easier.
I think one part of this would be to make the setup.py ... package-uploading process more streamlined, so that it does the right thing.
Surely it makes sense to have a simple "build" command that makes wheels, eggs, and an sdist by default, rather than having to specify each one separately?
Am I wrong in thinking that you still need to install another package just to create wheels?
Yes, you need to `pip install wheel` before `setup.py bdist_wheel` works. Also, you really shouldn't be making eggs ;)
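For reference, a minimal sketch of the commands as they stand today:

```sh
# One-time: the wheel package adds the bdist_wheel command to setup.py.
pip install wheel

# Build an sdist and a wheel in one go; both end up in dist/.
python setup.py sdist bdist_wheel
```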
As of 2015, Christoph Gohlke publishes wheels rather than MSI installers: http://www.lfd.uci.edu/~gohlke/pythonlibs/
@scopatz is this something you could comment on?
Thanks for roping me into this issue @brainwane.
I am speaking on behalf of conda-forge here. But basically, we'd love it if conda-forge could be used to build & publish wheels. To that end, it might be more useful to think of conda-forge as just "The Forge."
We have the infrastructure for building binary packages across Linux, OS X, Windows, ARM, and Power8 already. We have a tool called conda-smithy that we develop and maintain that helps us keep all of the packages / recipes / CIs configured and up-to-date.
I see two major hurdles to building and deploying wheels from conda-forge. These could be worked on in parallel.
Building: conda-smithy would need to be updated so that packages that are configured to do so would generate the appropriate CI scripts (from Jinja templates) to build wheels. This would be CI-provider- and architecture-specific. Probably the easiest place to start is building from manylinux on Azure. We would probably need at least one configuration variable to live in conda-forge.yml that actively enables wheel building (enable_wheels: true? enable_wheels: {linux-64: true}?). Conda-smithy reads this file when it rerenders a feedstock (a git repo with a specific structure for building packages). There are probably some subtleties and difficulties here in working through which compiler toolchains should be used on different platforms (there is really only the manylinux standard for Linux). But this is the basic idea.
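To make that concrete, a hypothetical sketch of what opting in might look like (the enable_wheels key is the proposal above, not an existing conda-smithy option):

```sh
# Hypothetical: opt a feedstock into wheel building via conda-forge.yml,
# then rerender so conda-smithy regenerates the CI scripts.
cat >> conda-forge.yml <<'EOF'
enable_wheels:
  linux-64: true
EOF
conda smithy rerender
```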
The challenge with building is that most of the conda-forge people are not used to building wheels. I am happy to help work on the conda-forge infrastructure side, but I think we need someone who is an expert on the wheels side who is also willing to jump in and help scale this out with me.
Deploying: Once we can build wheels, we need a place to put them. Nominally, this would be PyPI. But we need to be able to do this from a CI service. We are happy to have an authentication token that we use. There isn't much that I see that conda-forge can really do about this (which has prevented us from working on this issue previously). However, I think PyPI is working on this.
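For what it's worth, a minimal sketch of a CI upload step, assuming a PyPI API token stored as a secret variable (the PYPI_API_TOKEN name is illustrative):

```sh
# Upload from CI with a PyPI API token; twine reads these env variables.
pip install twine
TWINE_USERNAME=__token__ TWINE_PASSWORD="$PYPI_API_TOKEN" \
  twine upload wheelhouse/*.whl
```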
I am super excited about this; the fundamental premise of conda-forge is to be open-source, cross-platform, community build infrastructure. If there are other folks out there who are enthusiastic about getting this working, please reach out to me or put me in touch!
Thanks @scopatz! @waveform80 and @bennuttall would you like to speak from the piwheels perspective? And @jwodder, from what you have learned via Wheelodex? (Found out about you via this thread.)
Perhaps the work that @Matthew-Brett did at MacPython to build wheels of key packages of the Scientific Python stack will be helpful as well. Also, I discovered cibuildwheel by @joerick recently. (Edit: wrong Matthew Brett)
For the piwheels project (piwheels.org) we build ARM wheels for the Raspberry Pi, natively on Raspberry Pi hardware. We don't try to bundle dependencies à la manylinux2010; instead we target what's stable in the distro (Raspbian) and make no promises elsewhere. The project source itself is open, so others could run their own repos targeting other platforms.
I don't recommend maintainers upload ARM wheels themselves; instead, let us build them, knowing they work on the Pi.
We also attempt to show library dependencies on our project pages, e.g. https://www.piwheels.org/project/numpy/, rather than leaving people to work them out themselves, e.g. https://blog.piwheels.org/how-to-work-out-the-missing-dependencies-for-a-python-package/
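For anyone unfamiliar, usage is just pip's standard extra-index mechanism (Raspbian ships the equivalent configuration by default):

```sh
# Opt in explicitly on any Pi; piwheels acts as a fallback index to PyPI.
pip install numpy --extra-index-url https://www.piwheels.org/simple

# Raspbian ships the equivalent permanently in /etc/pip.conf:
#   [global]
#   extra-index-url=https://www.piwheels.org/simple
```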
Hi @scopatz, what do you propose to do about shared libraries that have no natural place in a wheel? (To me, most shared libraries have no natural place in a wheel.)
We cannot stick our heads in the sand on that. Our heavy use of shared libraries is one of conda's most compelling advantages, and because we use the same ones across languages, putting those shared libraries in a wheel would be a bad thing to do.
I'm not coming with a solution here. I wish I were, I really do.
It would probably be neater not to ship the external libraries in the wheels, but it has in practice been working, at least on Linux and macOS.
I can see the problem is more urgent for Conda, because y'all are building a multi-language software distribution.
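For concreteness, "shipping the libraries" on those two platforms is usually automated with auditwheel (Linux) and delocate (macOS), which copy the external shared libraries into the wheel and rewrite the binaries to load the vendored copies. A minimal sketch (the wheel filenames are placeholders):

```sh
# Linux: graft external shared libraries into the wheel and rewrite the
# binaries (and the libraries' SONAMEs) to use the vendored copies.
pip install auditwheel
auditwheel repair dist/mypkg-1.0-cp37-cp37m-linux_x86_64.whl -w wheelhouse/

# macOS: the same idea, via delocate.
pip install delocate
delocate-wheel -w wheelhouse/ dist/mypkg-1.0-cp37-cp37m-macosx_10_9_x86_64.whl
```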
A few years ago, @njsmith wrote a spec for pip-installable libraries: https://github.com/pypa/wheel-builders/pull/2
It isn't merged, and it looks like 'the current setup works for me' has meant that no one thus far has had the time or energy to work further on it. I suspect something along those lines is the proper solution, if we could muster the time.
By the way - @scopatz - I'm happy to help integrate the wheel builds into conda-forge - but I'm crazy busy these days, so I won't have much time for heavy lifting.
> It would probably be neater not to ship the external libraries in the wheels, but it has in practice been working, at least on Linux and macOS.
Well, the software needs to work, of course - and I'm not being facetious!
We end up discussing where the line is between the thing itself and the system libraries that support it, and that's not clear-cut. Take xgboost as an example. It has a C/C++ library and bindings for Python and R. xgboost itself builds static libs for each, so they sidestepped that issue, while we're much more efficient (in many dimensions). Now, libxgboost is clearly part of the xgboost stack, but what about ncurses? Is it system or not? In conda-forge we provide it, and in all honesty that line is organic - something we move as and when we find we need to.
@brainwane @scopatz if there's a better title for this issue today, could you change it, or comment so that someone who can make the change does so?
I can offer mild packaging familiarity, reasonable Python / CI / cloud experience, and say 10-20 hours a week for the next month, if that would be helpful. I think I'd be a good fit if there's rough consensus on direction and PyPA/conda experts are available for consulting, but the work is bottlenecked on elbow grease.
cc @brettcannon @dstufft @asottile
@matthew-brett I thought Carl Kleffner did something similar with a pip-installable toolchain using OpenBLAS for NumPy, though my memory might be foggy.
@mikofski - right - Carl was working on Mingwpy, which was (still is) a pip-installable gcc compiler chain for building Python extensions that link against the Python.org Microsoft Visual C++ runtime library.
Work has stalled on that, for a variety of reasons, although I still think it would be enormously useful. I can go into more details - or - @carlkl - do you want to give an update here?
@mattip - because we were discussing this a couple of weeks ago.
> It would probably be neater not to ship the external libraries in the wheels, but it has in practice been working, at least on Linux and macOS.
> I can see the problem is more urgent for Conda, because y'all are building a multi-language software distribution.
> A few years ago, @njsmith wrote a spec for pip-installable libraries: pypa/wheel-builders#2
> It isn't merged, and it looks like 'the current setup works for me' has meant that no one thus far has had the time or energy to work further on it. I suspect something along those lines is the proper solution, if we could muster the time.
I don't know that we have a clear answer that pip should be used as a general-purpose packaging solution. My view, which seems to be shared by several others in the recent Discourse discussion about it, is that it should not try to "reinvent the wheel" or replace general-purpose packaging solutions (like conda, yum, apt-get, nix, brew, spack, etc.). pip has a clear use as a packaging tool for developers and "self-integrators".
For that use case, statically linking dependencies into a wheel (vendoring native dependencies) can be a stop-gap measure, but it becomes very difficult for distributors, as evidenced by the pytorch, rapids, arrow, and other communities. It is definitely not ideal and in fact a growing problem with promoting the use of wheels for all Python users.
Using pip to package native libraries is conceivably possible, but it is a bigger challenge than it seems at first. It is hard to understand the motivation for this considerable work when the problem is already solved by several other open-source and more general-purpose packaging systems.
A better approach in my view is to enable native-library requirements to be satisfied by external packaging systems. In this way, pip can allow other package managers to install native requirements and only install wheels with native requirements if those requirements are already present.
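As a rough sketch of what that could look like (entirely hypothetical wiring - pip has no such hook today, and the bindings-only package name is made up), a pre-install step might probe the loader first:

```sh
# Hypothetical: refuse the wheel unless the system (or conda, yum, apt...)
# has already provided the shared library the bindings expect.
python -c "import ctypes.util, sys; sys.exit(0 if ctypes.util.find_library('xgboost') else 1)" \
  && pip install xgboost-bindings-only
```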
Non-developer end-users who use Python integrated with many other libraries (such as the PyData and SciPy users) should also be encouraged to use their distribution package manager to get their software. These distributions (such as conda-forge) already robustly satisfy the need for one-command installation. This is a better user experience than encouraging these particular users to "pip install".
In sum: conda-forge infrastructure producing wheels is a good idea; conda-build recipes producing wheels that allow conda packages to satisfy native-library dependencies is an even better idea.
@teoliphant While theoretically a reasonable idea, this ignores the fact that a significant number of users are asking for pip-installable versions of these packages. Ignoring those users, or suggesting that they should "just" switch to another packaging solution, is dismissing a genuine use case without sufficient investigation.
I know from personal experience that there are people who do need such packages but who can't or won't switch to Conda (for example). And on Windows there is no OS-level distribution package manager. How do we serve such users?
From talks at SciPy, it seemed like a good answer for those users is to provide "fat" wheels that ship all needed shared libraries with the wheel. These could be created using conda packages to minimize build time and consolidate build procedures. There was some experimentation with that, using numpy and scikit-image as tests. The packages were significantly larger - probably too large. Static linking is much more efficient, but bifurcates the build process. I'm hopeful that we can explore ways to trim down the shared library size such that this approach may be viable. Having any sort of scheme to actually share native libraries via wheels (pynativelib) would help, but I think a strong dependency solver is a hard requirement for implementation of that.
What about SONAMEs, though? Or are you proposing to rewrite them and rename the DSOs? If so, are we worried about passing objects between different versions of the same library? The glibc folks warned manylinux about that.
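(For context on the rename option: a sketch with standard ELF tools of what auditwheel-style grafting does; library and path names are illustrative.)

```sh
# Show the SONAME the dynamic linker will record for dependents.
readelf -d mypkg.libs/libfoo.so.1 | grep SONAME

# Give the vendored copy a unique SONAME so it never satisfies,
# or shadows, the system libfoo.
patchelf --set-soname libfoo-d1e2f3.so.1 mypkg.libs/libfoo.so.1
```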
> pip has a clear use as a packaging tool for developers and "self-integrators".
I guess it is used by those people, but it's used by a lot of other people too.
> For that use case, statically linking dependencies into a wheel (vendoring native dependencies) can be a stop-gap measure, but it becomes very difficult for distributors, as evidenced by the pytorch, rapids, arrow, and other communities. It is definitely not ideal and in fact a growing problem with promoting the use of wheels for all Python users.
I guess the problem is growing, but only in the sense that there are an increasing number of packages that ship wheels now. There are some difficult packages - I know that the GUI packages can have trouble. What difficulties are pytorch, rapids, arrow having? I'm happy to advise.
> Using pip to package native libraries is conceivably possible, but it is a bigger challenge than it seems at first. It is hard to understand the motivation for this considerable work when the problem is already solved by several other open-source and more general-purpose packaging systems.
> A better approach in my view is to enable native-library requirements to be satisfied by external packaging systems. In this way, pip can allow other package managers to install native requirements and only install wheels with native requirements if those requirements are already present.
I think that's exactly the problem - it's not practical for a Python package to try to work with the huge number of package variants that it could encounter.
> Non-developer end-users who use Python integrated with many other libraries (such as the PyData and SciPy users) should also be encouraged to use their distribution package manager to get their software. These distributions (such as conda-forge) already robustly satisfy the need for one-command installation. This is a better user experience than encouraging these particular users to "pip install".
I don't think SciPy or PyData users will have any trouble - were you thinking of any package in particular? NumPy / SciPy / Matplotlib / Pandas are all well packaged, and have been for a long time.
> In sum: conda-forge infrastructure producing wheels is a good idea; conda-build recipes producing wheels that allow conda packages to satisfy native-library dependencies is an even better idea.
I don't think there's much appetite for making pip installs depend on prior conda installs - wouldn't that just increase the confusion?
> What difficulties are pytorch, rapids, arrow having? I'm happy to advise.
For arrow, I think it's best summarized here:
https://twitter.com/wesmckinn/status/1149319821273784323
- many C++ dependencies
- several bundled shared libraries
- some libraries statically linked
- privately namespaced, bundled version of Boost
@wesm - I'm happy to help with this - let me know if I can. Did you already contact the scikit-build folks? I have the impression they are best for C++ chains. (Sorry, I can't reply on Twitter, have no account).
I believe we have one of the most complex package builds in the whole Python ecosystem. I think TensorFlow or PyTorch might have us beat, but it's close (it's obviously not a competition =D).
I haven't contacted the scikit-build folks yet; if that could help us simplify our Python build, I'm quite interested. I'm personally all out of budget for this after I lost a third or more of my June to build- and package-related issues, so maybe someone else can look into it.
cc @pitrou @xhochy @kszucs @nealrichardson
Thanks - that sounds very tiring. I bet we can use this as a stimulus to improve the tooling. Would you mind making an issue in some sensible place in the Arrow repositories for us to continue the discussion?
I'll echo what @wesm said here. I spent a lot of time as well trying to cope with wheel packaging issues on PyArrow. I'd be much happier if people accepted to settle on conda for distribution and installation of compiled Python packages.
(disclaimer: I used to work for Anaconda but don't anymore. Also I own a very small amount of company shares)
@pitrou - I hear the hope, but I really doubt that's going to happen in the short term. So I still think the best way, for now, is for those of us with some interest and time to try to improve the wheel-building machinery to the point where it is a minimal drain on your development resources.
Just to drop some statistics to indicate the seriousness of this problem: our download numbers are growing to the same magnitude as NumPy's and pandas':
```
$ pypistats overall pyarrow
| category        | percent |  downloads |
|-----------------|--------:|-----------:|
| with_mirrors    |  50.18% |  9,700,974 |
| without_mirrors |  49.82% |  9,630,781 |
| Total           |         | 19,331,755 |

$ pypistats overall numpy
| category        | percent |   downloads |
|-----------------|--------:|------------:|
| with_mirrors    |  50.15% | 114,356,740 |
| without_mirrors |  49.85% | 113,661,813 |
| Total           |         | 228,018,553 |

$ pypistats overall pandas
| category        | percent |   downloads |
|-----------------|--------:|------------:|
| with_mirrors    |  50.12% |  67,694,077 |
| without_mirrors |  49.88% |  67,358,042 |
| Total           |         | 135,052,119 |
```
One of the reasons for our complex build environment is that we're solving problems that are very difficult or impossible to solve without a deep dependency stack. So there is no end in sight to our suffering with the current state of wheels.
> Did you already contact the scikit-build folks? I have the impression they are best for C++ chains.
I believe conda is the best for C++ chains, but I would say that.