pulsar icon indicating copy to clipboard operation
pulsar copied to clipboard

[improve][build] Create images before build python wheels

Open RobertIndie opened this issue 3 years ago • 4 comments

Motivation

Currently in the release process of the python client, there is no indication that the related images need to be created before building python wheels. This may cause the release manager to build the wheel without using the latest images.

Modifications

  • Call create-images.sh in the build-wheels.sh
  • Add option --skip-create-images to the build-wheels.sh to skip the image creation. The default is not skipped.

Documentation

Check the box below or label this PR directly.

Need to update docs?

  • [ ] doc-required (Your PR needs to update docs and you will update later)

  • [x] doc-not-needed (Please explain why)

  • [ ] doc (Your PR contains doc changes)

  • [ ] doc-complete (Docs have been already added)

RobertIndie avatar Sep 08 '22 06:09 RobertIndie

@RobertIndie In theory the images should already be available in Docker hub and it takes a lot of time to rebuild them. Is there any particular image that is missing?

merlimat avatar Sep 12 '22 22:09 merlimat

@merlimat IMO, maintaining the images in DockerHub manually is not a good way to do that, especially because only a few members have the permission. If the release manager forgot to run docker update, an outdated build image could be used.

The basic build image usually installed some dependencies by downloading and building. It should not take much time. But I found it could be speeded up significantly by reducing the output. For example, replace tar xvfz with tar xvz.

BewareMyPower avatar Sep 13 '22 02:09 BewareMyPower

@RobertIndie In theory the images should already be available in Docker hub and it takes a lot of time to rebuild them. Is there any particular image that is missing?

For example, the version of OpenSSL is wrong in the pulsar python client 2.10.1. You can see more detail here: https://github.com/apache/pulsar/issues/17097#issuecomment-1236873140

I think this is because the latest image was not used in the release 2.10.1 process.

RobertIndie avatar Sep 14 '22 03:09 RobertIndie

Maybe we can add an option to determine whether to create the images.

I have added an option to skip the image creation. PTAL again, Thanks.

RobertIndie avatar Sep 14 '22 09:09 RobertIndie

IMO, maintaining the images in DockerHub manually is not a good way to do that, especially because only a few members have the permission. If the release manager forgot to run docker update, an outdated build image could be used.

@BewareMyPower If we add a docker pulsar $IMAGE it will force to use the version that exists on DockerHub

merlimat avatar Sep 22 '22 15:09 merlimat

@merlimat Did you mean docker pull? What I concerned more about is that the image might not be built with the latest Dockerfile because it's uploaded manually. At the moment when uploading images cannot be executed automatically, we'd better support building the base image.

BewareMyPower avatar Sep 23 '22 02:09 BewareMyPower

Creating the images takes a very long time. We should have the images built once, available on Docker hub and cached as needed.

By doing docker pull $IMAGE before using it, we're making sure that when the image is manually updated, the builds will start using it.

merlimat avatar Sep 23 '22 17:09 merlimat

In theory the images should already be available in Docker hub and it takes a lot of time to rebuild them.

I agree now. I've thought the create-images.sh is similar to building base images for RPM and DEB packages. I took a look just now, this script will create 16 images, which takes too much time.

pulsar-build:manylinux2014-cp37-cp37m-x86_64
pulsar-build:manylinux2014-cp38-cp38-x86_64
pulsar-build:manylinux2014-cp39-cp39-x86_64
pulsar-build:manylinux2014-cp310-cp310-x86_64
pulsar-build:manylinux2014-cp37-cp37m-aarch64
pulsar-build:manylinux2014-cp38-cp38-aarch64
pulsar-build:manylinux2014-cp39-cp39-aarch64
pulsar-build:manylinux2014-cp310-cp310-aarch64
pulsar-build:manylinux_musl-cp37-cp37m-aarch64
pulsar-build:manylinux_musl-cp38-cp38-aarch64
pulsar-build:manylinux_musl-cp39-cp39-aarch64
pulsar-build:manylinux_musl-cp310-cp310-aarch64
pulsar-build:manylinux_musl-cp37-cp37m-x86_64
pulsar-build:manylinux_musl-cp38-cp38-x86_64
pulsar-build:manylinux_musl-cp39-cp39-x86_64
pulsar-build:manylinux_musl-cp310-cp310-x86_64

By doing docker pull $IMAGE before using it, we're making sure that when the image is manually updated, the builds will start using it.

My point is not about whether the local image is latest. I mean the process is complicated and needs some manual operations. We must document that once any of the following files are changed, we must notify a maintainer of apachepulsar DockerHub account to upload the newest image.

  • python-versions.sh, which specifies the supported Python versions and the associated base images
  • manylinux2014/Dockerfile, the base image of Python 3.7 to 3.10 for x86_64 and aarch64 architectures
  • manylinux_musl/Dockerfile, the base image of Python 3.7 to 3.10 for Alpine compatible wheels

If we want to change these Dockerfiles, for example, the xvfz option for tar command takes much more time than xvz for the extra output. Even this minor optimization requires a complicated update steps.

In addition, we should not use such a tag like manylinux_musl-cp310-cp310-x86_64. It should be a part of image name, the tag should be a version.


In conclusion, I agree that we should not add an option to create base images for building Python wheels currently. But we should make the process clear in case when the base image changed.

/cc @merlimat @RobertIndie

BewareMyPower avatar Sep 24 '22 17:09 BewareMyPower

I realized the key point suddenly. https://github.com/apache/pulsar/pull/17538 upgrades the OpenSSL version to 1.1.1n for some Dockerfiles, but even it has been merged into the master, it's hard to know:

  • whether the latest images in DockerHub include this change? If yes, how to keep the compatibility with the previous image that might have been overwritten.
  • are these images related to Python clients?

See the related issue: https://github.com/apache/pulsar/issues/17097

/cc @merlimat @liliang950210

BewareMyPower avatar Sep 24 '22 17:09 BewareMyPower

My point is not about whether the local image is latest. I mean the process is complicated and needs some manual operations. We must document that once any of the following files are changed, we must notify a maintainer of apachepulsar DockerHub account to upload the newest image.

  • python-versions.sh, which specifies the supported Python versions and the associated base images
  • manylinux2014/Dockerfile, the base image of Python 3.7 to 3.10 for x86_64 and aarch64 architectures
  • manylinux_musl/Dockerfile, the base image of Python 3.7 to 3.10 for Alpine compatible wheels

I think we can make these operations happen in CI. If there is a PR merged into the master branch, we can use the CI to check for changes to these files. If there are changes, we can automatically push the latest image to the docker hub

RobertIndie avatar Sep 28 '22 01:09 RobertIndie

The pr had no activity for 30 days, mark with Stale label.

github-actions[bot] avatar Oct 28 '22 02:10 github-actions[bot]

Closed as the development of the Python client has been permanently moved to https://github.com/apache/pulsar-client-python.

tisonkun avatar Nov 11 '22 09:11 tisonkun