Jupyter Scheduler 2.10.0 Source Distribution tar built too large causing PyPI upload failure

Open andrii-i opened this issue 1 year ago • 10 comments

Description

The initial upload of the Jupyter Scheduler 2.10.0 source distribution failed because the tarball exceeded PyPI's file size limit (the upload log reports 100 MB for this project; the tarball is ~150 MB). The tar build is drastically larger than it was before jupyter-releaser was introduced.

The built distribution upload went through; the npm upload did not, as it comes later in the script.

How to reproduce

  • See CI failure: https://github.com/jupyter-server/jupyter-scheduler/actions/runs/11808986203/job/32898512565#step:5:372
  • Try building the jupyter-scheduler PyPI source distribution locally with jupyter-releaser build-python and check its size (>100 MB)

Expected behavior

  • ✅ Source distribution of the normal size is available in PyPI, @jupyterlab/scheduler 2.10.0 is released at npm.
  • Workflow does not fail, source distribution of the reasonable size is built and uploaded to PyPI, upload to npm happens.

andrii-i avatar Nov 13 '24 22:11 andrii-i

For reference, here is the relevant log excerpt:

WARNING  Error during upload. Retry with the --verbose option for more details.
ERROR    HTTPError: 400 Bad Request from https://upload.pypi.org/legacy/
         File too large. Limit for project 'jupyter-scheduler' is 100 MB. See
         https://pypi.org/help/#file-size-limit for more information.
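
A check along these lines could catch the problem before the upload step. This is only a sketch: the function name `oversized_files` is hypothetical, and reading the 100 MB limit from the log as 100 * 1024 * 1024 bytes is an assumption.

```python
from pathlib import Path

# Assumption: the "100 MB" limit from the log is read as 100 * 1024 * 1024 bytes.
LIMIT_BYTES = 100 * 1024 * 1024


def oversized_files(dist_dir, limit_bytes=LIMIT_BYTES):
    """Return sorted (name, size_in_bytes) pairs for files in dist_dir above the limit."""
    return sorted(
        (path.name, path.stat().st_size)
        for path in Path(dist_dir).iterdir()
        if path.is_file() and path.stat().st_size > limit_bytes
    )
```

Calling `oversized_files("dist")` after a local build should return an empty list when every artifact is under the limit.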

dlqqq avatar Nov 14 '24 18:11 dlqqq

The Jupyter Scheduler 2.10.0 npm package is now available at https://www.npmjs.com/package/@jupyterlab/scheduler, and the source distribution is now available on PyPI at https://pypi.org/project/jupyter-scheduler/2.10.0/#files.

Let's use this issue to track the need to understand why Jupyter Scheduler 2.10.0 Source Distribution tar was built too large causing PyPI upload failure and to prevent it happening in the next release.

jupyter_releaser issue on the topic: https://github.com/jupyter-server/jupyter_releaser/issues/592

andrii-i avatar Nov 14 '24 21:11 andrii-i

Try building jupyter scheduler PyPI source distribution locally with jupyter-releaser build-python, see its size (>100 Mb)

Out of curiosity, do you know why it produces so big a distribution? From a quick look it seems that you might be missing:

[tool.jupyter-releaser.hooks]
before-build-python = ["jlpm clean:all"]

in the pyproject.toml but that's just a guess.

krassowski avatar Nov 14 '24 21:11 krassowski

@krassowski no. I've created https://github.com/jupyter-server/jupyter_releaser/issues/592 in jupyter_releaser repo to surface the problem and hopefully get some insight from jupyter_releaser contributors.

Thank you for the suggestion and generally for looking into this.

andrii-i avatar Nov 14 '24 21:11 andrii-i

Do you have the contents of the package built locally with jupyter-releaser build-python?

krassowski avatar Nov 14 '24 21:11 krassowski

@krassowski yes, here it is https://www.dropbox.com/scl/fi/51y8zhsjeqx2jmsyg9mll/jupyter_scheduler-2.10.0.tar.gz?rlkey=9v5gafpayncj7831zfr4q4o00&st=svnlkhgx&dl=0 (153.9 Mb)

andrii-i avatar Nov 14 '24 22:11 andrii-i

It looks like it includes the .yarn and node_modules directories, which I am sure are responsible for a large portion of the size. They obviously should not be included. Also see https://github.com/jupyter-server/jupyter_releaser/issues/592#issuecomment-2478372873.
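
One way to confirm which directories dominate the tarball is to sum member sizes by leading path components. A rough sketch using only the standard library; the function name `size_by_top_level` is hypothetical, and the sizes are uncompressed, so they only indicate relative weight.

```python
import tarfile
from collections import Counter


def size_by_top_level(sdist_path, depth=2):
    """Sum uncompressed file sizes grouped by the first `depth` path components.

    With depth=2 the key is "<sdist root>/<top-level entry>", since an sdist
    normally has a single root directory.
    """
    totals = Counter()
    with tarfile.open(sdist_path, "r:*") as tar:
        for member in tar:
            if member.isfile():
                key = "/".join(member.name.split("/")[:depth])
                totals[key] += member.size
    # Largest entries first
    return totals.most_common()
```

Running it on the locally built tarball and printing the first few entries should show whether .yarn and node_modules account for most of the size.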

I think that, in addition to jlpm clean:all, you should also add:

[tool.hatch.build.targets.sdist]
artifacts = ["jupyter_scheduler/labextension"]
exclude = [".github", "binder"]

so that the binder directory gets excluded.

That said, I already see jupyter_scheduler/labextension in the tarball you shared, and it, along with node_modules, should have been excluded by hatch because both are in your .gitignore.

So why does it include things from the git repo?

In the logs of check-release action (https://github.com/jupyter-server/jupyter-scheduler/actions/runs/11809051201/job/32898683727) I see that the releaser is reading configuration from package.json rather than from pyproject.toml. I wonder if this could be related:

build-python

--------------------------------------------------
Using default value for dist_dir: 'dist'
Using default value for python_packages: '['.']'
Using default value for help: 'False'
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Running hooks for before-build-python
jupyter-releaser configuration loaded from package.json.

Also, that one does include the clean hook:

https://github.com/jupyter-server/jupyter-scheduler/blob/14c44518f9d48eb40eb6c20e063c275c84358698/package.json#L132-L143

Interesting. It looks like it did not use hatch at all?

krassowski avatar Nov 15 '24 09:11 krassowski

None of that helps yet: https://github.com/jupyter-server/jupyter-scheduler/pull/561

I went ahead and triggered a new check-release run on an unrelated project to see whether this is a regression in the ecosystem rather than a misconfiguration. An older run on variable inspector and the run triggered today both result in 1.53 MB of artifacts, so I do not think this is a system-wide issue, just a problem with configuration.

krassowski avatar Nov 15 '24 10:11 krassowski

I tried aligning the scheduler config with other repos using releaser in https://github.com/jupyter-server/jupyter-scheduler/pull/561 but nothing helped.

The thing is that jupyter-releaser does not do anything bespoke; it just runs pipx run build. That should not result in anything different from python -m build as used by the build action:

https://github.com/jupyter-server/jupyter-scheduler/blob/14c44518f9d48eb40eb6c20e063c275c84358698/.github/workflows/build.yml#L57

krassowski avatar Nov 15 '24 10:11 krassowski

Running pipx run build locally does not produce such a large tarball for me, just 3.6 MB.

krassowski avatar Nov 15 '24 11:11 krassowski