python-dependency-injector

Put fewer packages on PyPI

Open mattip opened this issue 3 years ago • 9 comments

This project requires an out-of-proportion amount of storage space on PyPI. This is problematic since the storage space is donated and the general assumption is that projects will not over-use the resources. In order to analyze what is going on, let's look at some data.

Each release of the project creates these artifacts (taken from the 4.35.1 release)

cp27-cp27m-macosx_10_9_x86_64.whl, 684.1 kB
cp27-cp27m-manylinux1_i686.whl, 2.4 MB
cp27-cp27m-manylinux1_x86_64.whl, 2.6 MB
cp27-cp27m-manylinux2010_i686.whl, 2.4 MB
cp27-cp27m-manylinux2010_x86_64.whl, 2.6 MB
cp27-cp27mu-manylinux1_i686.whl, 2.4 MB
cp27-cp27mu-manylinux1_x86_64.whl, 2.6 MB
cp27-cp27mu-manylinux2010_i686.whl, 2.4 MB
cp27-cp27mu-manylinux2010_x86_64.whl, 2.6 MB
cp35-cp35m-macosx_10_9_x86_64.whl, 698.8 kB
cp35-cp35m-manylinux1_i686.whl, 2.9 MB
cp35-cp35m-manylinux1_x86_64.whl, 3.1 MB
cp35-cp35m-manylinux2010_i686.whl, 2.9 MB
cp35-cp35m-manylinux2010_x86_64.whl, 3.1 MB
cp35-cp35m-manylinux2014_aarch64.whl, 3.7 MB
cp35-cp35m-win32.whl, 354.9 kB
cp35-cp35m-win_amd64.whl, 422.1 kB
cp36-cp36m-macosx_10_9_x86_64.whl, 700.7 kB
cp36-cp36m-manylinux1_i686.whl, 3.0 MB
cp36-cp36m-manylinux1_x86_64.whl, 3.3 MB
cp36-cp36m-manylinux2010_i686.whl, 3.0 MB
cp36-cp36m-manylinux2010_x86_64.whl, 3.3 MB
cp36-cp36m-manylinux2014_aarch64.whl, 3.8 MB
cp36-cp36m-win32.whl, 383.6 kB
cp36-cp36m-win_amd64.whl, 451.7 kB
cp37-cp37m-macosx_10_9_x86_64.whl, 704.6 kB
cp37-cp37m-manylinux1_i686.whl, 3.0 MB
cp37-cp37m-manylinux1_x86_64.whl, 3.2 MB
cp37-cp37m-manylinux2010_i686.whl, 3.0 MB
cp37-cp37m-manylinux2010_x86_64.whl, 3.2 MB
cp37-cp37m-manylinux2014_aarch64.whl, 3.8 MB
cp37-cp37m-win32.whl, 381.7 kB
cp37-cp37m-win_amd64.whl, 452.2 kB
cp38-cp38-macosx_10_9_x86_64.whl, 730.4 kB
cp38-cp38-manylinux1_i686.whl, 4.0 MB
cp38-cp38-manylinux1_x86_64.whl, 4.2 MB
cp38-cp38-manylinux2010_i686.whl, 4.0 MB
cp38-cp38-manylinux2010_x86_64.whl, 4.2 MB
cp38-cp38-manylinux2014_aarch64.whl, 4.8 MB
cp38-cp38-win32.whl, 394.0 kB
cp38-cp38-win_amd64.whl, 479.7 kB
cp39-cp39-macosx_10_9_x86_64.whl, 734.2 kB
cp39-cp39-manylinux1_i686.whl, 3.5 MB
cp39-cp39-manylinux1_x86_64.whl, 3.8 MB
cp39-cp39-manylinux2010_i686.whl, 3.5 MB
cp39-cp39-manylinux2010_x86_64.whl, 3.8 MB
cp39-cp39-manylinux2014_aarch64.whl, 4.3 MB
cp39-cp39-win32.whl, 392.6 kB
cp39-cp39-win_amd64.whl, 479.4 kB
pp27-pypy_73-macosx_10_9_x86_64.whl, 501.4 kB
pp27-pypy_73-manylinux1_x86_64.whl, 543.0 kB
pp27-pypy_73-manylinux2010_x86_64.whl, 543.0 kB
pp27-pypy_73-win32.whl, 342.4 kB
pp36-pypy36_pp73-macosx_10_9_x86_64.whl, 498.5 kB
pp36-pypy36_pp73-manylinux1_x86_64.whl, 542.1 kB
pp36-pypy36_pp73-manylinux2010_x86_64.whl, 542.1 kB
pp36-pypy36_pp73-win32.whl, 300.8 kB
pp37-pypy37_pp73-macosx_10_9_x86_64.whl, 498.5 kB
pp37-pypy37_pp73-manylinux1_x86_64.whl, 542.1 kB
pp37-pypy37_pp73-manylinux2010_x86_64.whl, 542.0 kB
pp37-pypy37_pp73-win32.whl, 300.8 kB

I think I left out the source tarball. This sums up to ~122GB~ 122MB per release. The project has had about 50 releases in the first half of 2021, sometimes multiple releases on a single day. This comes out to about ~12 TB~ 12GB a year. It seems this project has under 2000 downloads a month. Scipy, by comparison, ships 18 wheels, each about 30MB, twice a year for 30GB of yearly storage and has about 30 million downloads a month (take those statistics with a grain of salt: they say the last version of this package is 1.2.0).
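The estimate above can be reproduced with a few lines of Python. This is only a sketch: `parse_size_mb` and `yearly_storage_gb` are illustrative helper names, the three sample lines are copied from the 4.35.1 list, the 122 MB per-release total is taken from the comment rather than recomputed, and 1 MB is assumed to be 1000 kB.

```python
import re

# Matches lines such as "cp39-cp39-win32.whl, 392.6 kB" from the list above.
_LINE = re.compile(r"^(?P<tag>\S+\.whl),\s*(?P<size>[\d.]+)\s*(?P<unit>kB|MB)$")


def parse_size_mb(line: str) -> float:
    """Return the artifact size in MB (assumption: 1 MB = 1000 kB)."""
    match = _LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised artifact line: {line!r}")
    size = float(match["size"])
    return size / 1000 if match["unit"] == "kB" else size


def yearly_storage_gb(per_release_mb: float, releases_per_year: int) -> float:
    """Storage added per year, in GB, if every release is kept."""
    return per_release_mb * releases_per_year / 1000


# A small excerpt of the 4.35.1 artifact list, not the full set.
sample = [
    "cp39-cp39-manylinux2014_aarch64.whl, 4.3 MB",
    "cp39-cp39-win32.whl, 392.6 kB",
    "cp39-cp39-win_amd64.whl, 479.4 kB",
]
sample_total_mb = sum(parse_size_mb(line) for line in sample)

# About 50 releases in half a year is roughly 100 releases a year.
estimate_gb = yearly_storage_gb(per_release_mb=122, releases_per_year=100)
print(f"sample: {sample_total_mb:.3f} MB, yearly estimate: {estimate_gb:.1f} GB")
```

Feeding the full artifact list through `parse_size_mb` instead of the excerpt would give the per-release total directly.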

So how can you reduce the resource requirements ~by three orders of magnitude~?

  • Release a pure-Python version of the package. This would reduce both the number of wheels and their size. Is it clear that the Cython speed is a requirement of the project? Note this would not preclude building wheels for the "more important" platforms, since pip install will prefer binary wheels to pure-Python ones. You may be interested in refactoring the code to use the "pure Python" mode available in Cython 3.0, which will make supporting both modes in the codebase simpler.
  • Release 4 times a year instead of ~100 times a year. (a 25x reduction)
  • Do not release both manylinux1 and manylinux2010 packages (a 2x reduction). I would stick with manylinux2010, but you know your users better than I do.
  • Drop older versions of python (3.5, 3.6, pypy2.7, pypy3.6) (around a 2x reduction)
  • Strip the builds. I see you use cibuildwheel; there is a discussion on how to do this in pypa/cibuildwheel#331 (maybe ~3x reduction, maybe more?).
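To get a feel for two of these suggestions (dropping manylinux1 and dropping the older interpreters), one can filter the list of wheel tags. The sketch below is illustrative only: `keep_wheel` and the `DROPPED_*` constants are hypothetical names, and the sample tags are a subset of the 4.35.1 list.

```python
# Interpreter tags for Python 3.5, 3.6, PyPy 2.7 and PyPy 3.6.
DROPPED_PYTHONS = ("cp35", "cp36", "pp27", "pp36")
# Keep manylinux2010 and manylinux2014, drop manylinux1.
DROPPED_PLATFORM_PREFIXES = ("manylinux1_",)


def keep_wheel(tag: str) -> bool:
    """Decide whether a '<python>-<abi>-<platform>' wheel tag is still built."""
    python_tag, _abi_tag, platform_tag = tag.split("-", 2)
    if python_tag in DROPPED_PYTHONS:
        return False
    return not platform_tag.startswith(DROPPED_PLATFORM_PREFIXES)


tags = [
    "cp36-cp36m-manylinux1_x86_64",
    "cp36-cp36m-manylinux2010_x86_64",
    "cp39-cp39-manylinux1_x86_64",
    "cp39-cp39-manylinux2010_x86_64",
    "cp39-cp39-win_amd64",
    "pp27-pypy_73-win32",
    "pp37-pypy37_pp73-manylinux2010_x86_64",
]
kept = [tag for tag in tags if keep_wheel(tag)]
print(f"kept {len(kept)} of {len(tags)} wheels: {kept}")
```

Running the same filter over the full list, together with the sizes, would show how close these two steps come to the estimated reductions.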

mattip, Aug 13 '21 07:08

Another thing to think about is whether the package can be built using the limited API, it seems cython has some support and there are some hints for cibuildwheel. Then there would only be one wheel for all the python versions on a platform.
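The reason a limited-API build needs only one wheel per platform is that a wheel tagged `abi3` can be installed on the CPython version it targets and on every later 3.x version. A simplified sketch of that rule follows; `abi3_wheel_supports` is a hypothetical helper, the `cp36` tag is an example rather than something this project publishes, and real installers use richer tag matching (for instance via the `packaging` library).

```python
from typing import Tuple


def abi3_wheel_supports(python_tag: str, interpreter: Tuple[int, int]) -> bool:
    """Check whether a 'cp3X-abi3' wheel can be used on a CPython version.

    python_tag is the first component of the wheel tag, e.g. 'cp36'.
    interpreter is (major, minor), e.g. (3, 9).
    """
    if not python_tag.startswith("cp3") or not python_tag[3:].isdigit():
        raise ValueError(f"expected a tag like 'cp36', got {python_tag!r}")
    minimum = (3, int(python_tag[3:]))
    return interpreter >= minimum


# One hypothetical cp36-abi3 wheel would cover all of these interpreters.
for version in [(3, 6), (3, 7), (3, 8), (3, 9)]:
    print(version, abi3_wheel_supports("cp36", version))
```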

mattip, Aug 13 '21 10:08

122GB

Unless I'm missing something, I make it 122MB. Which seems a little more reasonable. Although smaller is obviously still better.

I'd be reluctant to recommend Cython's limited API support yet (by all means try it and submit bug reports, but I wouldn't release it to actual users).

da-woods, Aug 13 '21 10:08

Sorry, I miscalculated. Fixing the comment.

mattip, Aug 13 '21 10:08

The other approach that people sometimes use is to ship the Cython-generated C files rather than binaries (although obviously that then requires the user to have a C compiler, but doesn't require the user to have Cython). Obviously that could be combined with pure-Python mode as an additional fallback level.
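On the runtime side, the "additional fallback level" usually means trying the compiled module first and falling back to a pure-Python one if it is missing. A minimal sketch of that pattern, with hypothetical names: `import_first_available` is an illustrative helper, `_hypothetical_compiled_ext` stands in for a Cython-built module, and the standard-library `json` module stands in for the pure-Python fallback so the example runs anywhere.

```python
import importlib
from types import ModuleType
from typing import Iterable


def import_first_available(candidates: Iterable[str]) -> ModuleType:
    """Import and return the first module in `candidates` that is importable."""
    names = list(candidates)
    last_error = None
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            # Remember the failure and try the next, slower implementation.
            last_error = exc
    raise ImportError(f"none of {names!r} could be imported") from last_error


# The compiled extension is absent here, so the fallback is selected.
impl = import_first_available(["_hypothetical_compiled_ext", "json"])
print(f"using implementation: {impl.__name__}")
```

In a real package both candidates would expose the same public names, so the rest of the code would not need to know which one was loaded.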

da-woods, Aug 13 '21 11:08

@mattip, this project had 360,000 downloads last month: https://pypistats.org/packages/dependency-injector

You were looking at the wrong project with a similar name.

rmk135, Aug 13 '21 12:08

The other approach that people sometimes use is to ship the Cython-generated C files rather than binaries (although obviously that then requires the user to have a C compiler, but doesn't require the user to have Cython). Obviously that could be combined with pure-Python mode as an additional fallback level.

I have shipped the project as generated C code, and it created problems for users. As you noted, with C sources everybody needs to install a C compiler, and build time is much higher. It also consumes far more resources globally: thousands of compilations versus a single one on the CD server. Simply removing the pre-compiled wheels from a project that already ships them doesn't seem to be a good solution. It will break people's software.

rmk135, Aug 13 '21 12:08

So how can you reduce the resource requirements ~by three orders of magnitude~?

  • Release a pure-python version of the package. This would reduce both the number of wheels and the size. Is it clear that the cython speed is a requirement of the project? Note this would not preclude building wheels for the "more important" platforms, pip install will prefer binary wheels to pure python ones. You may be interested in refactoring the code to use the "pure python" mode available in cython 3.0, which will make supporting both modes in the codebase simpler.
  • Release 4 times a year instead of ~100 times a year. (a 25x reduction)
  • Do not release both manylinux1 and manylinux2010 packages (a 2x reduction). I would stick with manylinux2010, but you know your users better than I do.
  • Drop older versions of python (3.5, 3.6, pypy2.7, pypy3.6) (around a 2x reduction)
  • Strip the builds. I see you use cibuildwheel, there is a discussion on how to do this Strip debug symbols of wheels pypa/cibuildwheel#331 (maybe ~3x reduction, maybe more?).

@mattip, thank you, that's something to think about. The problem is that I'm stuck at the moment because I cannot make a single release.

rmk135, Aug 13 '21 12:08

You were looking at the wrong project with a similar name.

That makes more sense, thanks.

Correct me if I am wrong, but it seems this new release is for issue #477. Perhaps you could delay the release to the end of the month, and in the meantime shrink the release so it fits into the space you freed up by deleting older versions. I would expect the approval process for PyPI to take a few weeks anyway, as they are very overburdened.

mattip, Aug 13 '21 13:08

@mattip Yeah, that's correct. You brought up a lot of interesting suggestions. I don't think I'll be able to apply all of them, but I should be able to make some improvements for sure. Thanks again for your input!

rmk135, Aug 13 '21 15:08