iceberg-python
[Feature] Provide Nightly Build to PyPi
Feature Request / Improvement
Starting an issue to gather feedback on providing nightly builds for pyiceberg. Resolves #734.
Thanks @syun64 for the pointers and feedback.
PyIceberg Release Process
The current release process is documented in How to release. To publish a release candidate (RC) to the public:
- Tag and sign Major/Minor release via git, push to the apache branch
- Build artifacts with Github action
  - Python Release Github action
  - Outputs release-main artifact, available for download
- Upload to Apache SVN
  - Both the source distribution (sdist) and the binary distributions (wheels) need to be published for the RC
  - Generate SVN hash for artifacts, sign with gpg
  - https://dist.apache.org/repos/dist/dev/iceberg/
- Upload to PyPi
  - Download artifact from Python release Github action
  - twine upload
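A minimal sketch of the "generate hash for artifacts" step above, assuming SHA-512 is the digest in use and that the checksum line follows the common "digest, two spaces, file name" layout; neither detail is stated in this thread. `sha512_line` is a hypothetical helper, and GPG signing is not covered here.

```python
import hashlib
import tempfile
from pathlib import Path


def sha512_line(artifact: Path, chunk_size: int = 1 << 20) -> str:
    """Return '<hex digest>  <file name>' for one release artifact.

    Hypothetical helper: the digest algorithm and line layout are assumptions.
    """
    digest = hashlib.sha512()
    with artifact.open("rb") as fh:
        # Read in chunks so large wheels do not have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return f"{digest.hexdigest()}  {artifact.name}"


if __name__ == "__main__":
    # Demo on a throwaway file; a real run would point at the sdist and wheels.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "placeholder.tar.gz"
        sample.write_bytes(b"placeholder")
        print(sha512_line(sample))
```

The same function would be called once per sdist and wheel, with each result written next to its artifact.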
Proposed Nightly Build Process
Goals
- Only PyPi is needed, can skip SVN
- Automate the nightly build using a cron-based Github Action
- Automate the upload to PyPi using a Github Action that pushes directly to PyPi
- Make sure the PyPi package is uploaded as pre-release/development versions
- Make nightly build installable via pip install pyiceberg --pre, preferred.
  - Alternatively, install via a new nightly package, i.e. pyiceberg-nightly
Reference
- https://medium.com/@blackary/publishing-a-python-package-from-github-to-pypi-in-2024-a6fb8635d45d
- https://github.com/marketplace/actions/pypi-publish
I've recently received feedback from users that it would be beneficial to have more releases. A faster release cadence might be more desirable than having a nightly build.
Both will require some kind of automation for the release process.
Based on this tutorial, I was able to publish new versions of the library to PyPi via Github Action.
Here are the relevant steps:
I created an account on Pypi and was able to publish my forked repo of Pyiceberg using the pypi-publish GitHub Action.
To do so, I created .github/workflows/publish.yml file and pushed it to my forked repo's main branch.
I had to change the package name to pyiceberg-kevinliu to not conflict with the existing package.
I set up "Trusted Publisher Management" via the Pypi website for my forked repo.
On the forked repo, I created a new release and tag, named "v0.6.1". This kicks off the Github Action to publish to Pypi,
resulting in this new package: https://pypi.org/project/pyiceberg-kevinliu/
I hope we can use some parts of the above to make future releases faster and more automated
@Fokko / @HonahX / @syun64 Would love to get your thoughts on this.
Very exciting to hear that you were already able to get a package published through Github Actions! For now, I'm leaning towards this approach of having a separate namespace for nightly builds like pyiceberg-nightly.
One downside of that approach is that it will create name collision issues if users accidentally install both the pyiceberg and pyiceberg-nightly packages in the same environment.
But I think there's still a lot to gain by separating out the package namespace of an intentional publication (release candidates, and successful releases) versus an automated nightly publication from main, just in terms of how easy it would be for us to manage our packages. I'm not quite sure what the best way to do this would be, but I would imagine we would want to support the concept of having a retention policy on the nightly package as well, so we clean up packages that were published a year ago, as an example.
Here's a link to some relevant discussion on this topic on a PyPi warehouse discussion thread
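The retention idea above could look roughly like the following sketch. It only selects which nightly versions are past a cutoff; how (or whether) they would actually be removed from the index is left open, as it is in this thread. `expired_versions` and the one-year default are illustrative assumptions, and the upload timestamps are assumed to be supplied by the caller.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def expired_versions(
    uploads: Dict[str, datetime],
    now: datetime,
    max_age: timedelta = timedelta(days=365),
) -> List[str]:
    """Return the versions whose upload time is older than `max_age`.

    Hypothetical helper: `uploads` maps version string -> upload time (UTC).
    """
    cutoff = now - max_age
    return sorted(version for version, uploaded in uploads.items() if uploaded < cutoff)


if __name__ == "__main__":
    utc = timezone.utc
    uploads = {
        "0.9.0.dev20240101000000": datetime(2024, 1, 1, tzinfo=utc),
        "0.9.0.dev20250201000000": datetime(2025, 2, 1, tzinfo=utc),
    }
    # Only the build from January 2024 is more than a year old here.
    print(expired_versions(uploads, now=datetime(2025, 2, 8, tzinfo=utc)))
```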
@kevinjqliu Thanks for doing the experiments!
But I think there's still a lot to gain by separating out the package namespace of an intentional publication (release candidates, and successful releases)
+1, the ASF release policy also suggests that we should "hide" the nightly build from non-developers as much as possible. We could also consider other ways of separation: For example:
FWIW, Iceberg Java also publishes nightly snapshots: https://repository.apache.org/content/groups/snapshots/org/apache/iceberg/iceberg-core/ But it is hidden quite well for a reason :D
I'm open to it. I'm not sure if a separate package is the best, as you can also set tags on the releases itself: https://pypi.org/project/pyiceberg/#history You can see the pre-releases there.
Another nice thing is that we would test our release pipelines on a daily basis 💪
Seems like we have a way forward for nightly build. We can run the Pypi upload on a GitHub action nightly cron.
I want to take a step back and talk about the general release process. I want to figure out how to reduce the burden of the release process so that we can release at a faster cadence. The release instructions document several steps of the process:
- set git tag
- sign files with GPG and upload to SVN
- upload to Pypi
- email devlist about new release
Are these steps all necessary to release a new version? Is there room for automation similar to the Pypi automation above? Can we use the Github Release process somehow?
Any news on this? Currently pyiceberg is broken with Polaris, and I would like to use the latest update that fixes it.
@djouallah I'll take another look at the nightly build. But we're in the process of releasing 0.8.0; it's in the voting stage: https://lists.apache.org/thread/0xcw56z1bpldypm7pv92h70fhhq0qgfq
FYI https://lists.apache.org/thread/oowhcfwv3fcjzdzm76tbn99k5q84mr75 One step closer to nightly build
We now have a nightly build (UTC midnight) that will automatically push to testpypi https://test.pypi.org/project/pyiceberg/#history
Due to the versioning scheme, this can be installed by using --pre:
pip install --index-url https://test.pypi.org/simple/ --pre pyiceberg
First cron scheduled run is ✅ https://github.com/apache/iceberg-python/actions/runs/13210296696 https://test.pypi.org/project/pyiceberg/0.9.0.dev20250208002427/
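For illustration, the version string in the run above (0.9.0.dev20250208002427) looks like a base version plus a `.dev` segment built from a UTC timestamp. The sketch below reproduces that shape; the format is inferred from this single example, and `nightly_version` is a hypothetical helper, not the logic the actual workflow uses. Because the result is a development release, pip skips it by default, which is why `--pre` is needed to install it.

```python
from datetime import datetime, timezone
from typing import Optional


def nightly_version(base: str, now: Optional[datetime] = None) -> str:
    """Build '<base>.dev<YYYYMMDDHHMMSS>' from a UTC timestamp.

    Hypothetical helper: the timestamp layout is inferred from one published version.
    """
    # Use the current UTC time unless the caller supplies a timezone-aware one.
    now = now or datetime.now(timezone.utc)
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{base}.dev{stamp}"


if __name__ == "__main__":
    build_time = datetime(2025, 2, 8, 0, 24, 27, tzinfo=timezone.utc)
    print(nightly_version("0.9.0", build_time))
```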