iceberg-python

[Feature] Provide Nightly Build to PyPi

Open kevinjqliu opened this issue 1 year ago • 8 comments

Feature Request / Improvement

Starting an issue to gather feedback on providing nightly builds for pyiceberg. Resolves #734. Thanks @syun64 for the pointers and feedback.

PyIceberg Release Process

The current release process is documented in How to release. To publish a release candidate (RC) to the public:

  • Tag and sign Major/Minor release via git, push to the apache branch
  • Build artifacts with Github action
  • Upload to Apache SVN
    • Both the source distribution (sdist) and the binary distributions (wheels) need to be published for the RC
    • Generate SVN hash for artifacts, sign with gpg
    • https://dist.apache.org/repos/dist/dev/iceberg/
  • Upload to PyPi
    • Download artifact from Python release Github action
    • twine upload
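The checksum step above could be sketched in Python roughly as follows. This is only an illustration, not the project's actual release tooling: it assumes SHA-512 is the digest being generated, and the helper names and the `<artifact>.sha512` file layout are hypothetical. GPG signing is not covered here.

```python
import hashlib
from pathlib import Path


def sha512_hex(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum_file(artifact: Path) -> Path:
    """Write '<artifact>.sha512' next to the artifact (hypothetical layout)."""
    out = artifact.with_name(artifact.name + ".sha512")
    # "<digest>  <filename>" mirrors the common shasum-style line format.
    out.write_text(f"{sha512_hex(artifact)}  {artifact.name}\n")
    return out
```

Running `write_checksum_file` over every sdist and wheel in the build output directory would produce one checksum file per artifact.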

Proposed Nightly Build Process

Goals

  • Only PyPi is needed, can skip SVN
  • Automate the nightly build using a cron-based GitHub Action
  • Automate the upload to PyPI by having the GitHub Action push directly to PyPI
  • Make sure the PyPi package is uploaded as pre-release/development versions
  • Preferably, make the nightly build installable via pip install pyiceberg --pre
    • Alternatively, install via a new nightly package, i.e. pyiceberg-nightly

Reference

  • https://medium.com/@blackary/publishing-a-python-package-from-github-to-pypi-in-2024-a6fb8635d45d
  • https://github.com/marketplace/actions/pypi-publish

kevinjqliu avatar Jun 29 '24 19:06 kevinjqliu

I've recently received feedback from users that it would be beneficial to have more releases. A faster release cadence might be more desirable than having a nightly build.

Both will require some kind of automation for the release process.

kevinjqliu avatar Jul 15 '24 17:07 kevinjqliu

Based on this tutorial, I was able to publish new versions of the library to PyPi via Github Action.

Here are the relevant steps:

I created an account on Pypi and was able to publish my forked repo of Pyiceberg using the pypi-publish GitHub Action.

To do so, I created .github/workflows/publish.yml file and pushed it to my forked repo's main branch.

I had to change the package name to pyiceberg-kevinliu to not conflict with the existing package.

I set up "Trusted Publisher Management" via the Pypi website for my forked repo.

On the forked repo, I created a new release and tag named "v0.6.1". This kicks off the GitHub Action that publishes to PyPI.

This resulted in a new package: https://pypi.org/project/pyiceberg-kevinliu/

kevinjqliu avatar Jul 15 '24 17:07 kevinjqliu

I hope we can use some parts of the above to make future releases faster and more automated

kevinjqliu avatar Jul 15 '24 17:07 kevinjqliu

@Fokko / @HonahX / @syun64 Would love to get your thoughts on this.

kevinjqliu avatar Jul 15 '24 17:07 kevinjqliu

Very exciting to hear that you were already able to get a package published through GitHub Actions! For now, I'm leaning towards the approach of having a separate namespace for nightly builds, like pyiceberg-nightly.

One downside of that approach is that it will create name collision issues if users accidentally install both the pyiceberg and pyiceberg-nightly packages in the same environment.

But I think there's still a lot to gain by separating out the package namespace of an intentional publication (release candidates, and successful releases) versus an automated nightly publication from main, just in terms of how easy it would be for us to manage our packages. I'm not quite sure what the best way to do this would be, but I would imagine we would want to support the concept of having a retention policy on the nightly package as well, so we clean up packages that were published a year ago, as an example.
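As a rough sketch of the retention idea, the selection logic could look like the following. It assumes nightly versions embed a UTC build timestamp in the form `<base>.devYYYYMMDDHHMMSS`; that format, the function name, and the one-year default are assumptions for illustration. The sketch only picks which versions are old enough to clean up and does not talk to PyPI.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed nightly version shape: "<base>.devYYYYMMDDHHMMSS"
_NIGHTLY = re.compile(r"^(?P<base>\d+(?:\.\d+)*)\.dev(?P<stamp>\d{14})$")


def expired_nightlies(versions, now, max_age=timedelta(days=365)):
    """Return the nightly versions whose embedded timestamp is older than max_age."""
    expired = []
    for version in versions:
        match = _NIGHTLY.match(version)
        if match is None:
            continue  # not a nightly build, so never selected for cleanup
        built = datetime.strptime(match["stamp"], "%Y%m%d%H%M%S").replace(
            tzinfo=timezone.utc
        )
        if now - built > max_age:
            expired.append(version)
    return expired
```

Regular releases and release candidates never match the pattern, so only automated nightly publications would be candidates for removal.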

Here's a link to some relevant discussion on this topic on a PyPi warehouse discussion thread

sungwy avatar Jul 15 '24 21:07 sungwy

@kevinjqliu Thanks for doing the experiments!

But I think there's still a lot to gain by separating out the package namespace of an intentional publication (release candidates, and successful releases)

+1, the ASF release policy also suggests that we should "hide" the nightly build from non-developers as much as possible. We could also consider other ways of separation: For example:

HonahX avatar Jul 16 '24 06:07 HonahX

FWIW, Iceberg Java also publishes nightly snapshots: https://repository.apache.org/content/groups/snapshots/org/apache/iceberg/iceberg-core/ But it is hidden quite well for a reason :D

I'm open to it. I'm not sure if a separate package is the best, as you can also set tags on the releases themselves: https://pypi.org/project/pyiceberg/#history You can see the pre-releases there.

Another nice thing is that we would test our release pipelines on a daily basis 💪

Fokko avatar Jul 16 '24 10:07 Fokko

Seems like we have a way forward for nightly build. We can run the Pypi upload on a GitHub action nightly cron.

I want to take a step back and talk about the general release process. I want to figure out how to reduce the burden of the release process so that we can release at a faster cadence. The release instructions document several steps of the process:

  • set git tag
  • sign files with GPG and upload to SVN
  • upload to Pypi
  • email devlist about new release

Are these steps all necessary to release a new version? Is there room for automation similar to the Pypi automation above? Can we use the Github Release process somehow?

kevinjqliu avatar Jul 16 '24 17:07 kevinjqliu

Any news on this? Currently pyiceberg is broken with Polaris, and I would like to use the latest update that fixes it.

djouallah avatar Oct 20 '24 01:10 djouallah

@djouallah I'll take another look at the nightly build. But we're in the process of releasing 0.8.0; it's in the voting stage: https://lists.apache.org/thread/0xcw56z1bpldypm7pv92h70fhhq0qgfq

kevinjqliu avatar Nov 08 '24 17:11 kevinjqliu

FYI https://lists.apache.org/thread/oowhcfwv3fcjzdzm76tbn99k5q84mr75 One step closer to nightly build

kevinjqliu avatar Dec 02 '24 19:12 kevinjqliu

We now have a nightly build (UTC midnight) that will automatically push to testpypi https://test.pypi.org/project/pyiceberg/#history

Due to the versioning scheme, this can be installed by using --pre:

pip install --index-url https://test.pypi.org/simple/ --pre pyiceberg

kevinjqliu avatar Feb 07 '25 16:02 kevinjqliu

First cron scheduled run is ✅ https://github.com/apache/iceberg-python/actions/runs/13210296696 https://test.pypi.org/project/pyiceberg/0.9.0.dev20250208002427/
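The version in the TestPyPI link above, 0.9.0.dev20250208002427, has the shape of a PEP 440 development release, which pip treats as a pre-release and skips unless --pre is given. A minimal sketch of producing such a version string follows. It assumes the suffix is a UTC timestamp in YYYYMMDDHHMMSS form; the function name is hypothetical, and this is not necessarily how the project's workflow computes it.

```python
from datetime import datetime, timezone
from typing import Optional


def nightly_version(base: str, now: Optional[datetime] = None) -> str:
    """Build a PEP 440 development version such as '0.9.0.dev20250208002427'."""
    # Pass a timezone-aware datetime; the default is the current UTC time.
    now = now or datetime.now(timezone.utc)
    return f"{base}.dev{now.astimezone(timezone.utc):%Y%m%d%H%M%S}"
```

Because the timestamp sorts lexicographically in build order, each nightly compares as newer than the previous one while still sorting below the final 0.9.0 release.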

kevinjqliu avatar Feb 08 '25 00:02 kevinjqliu