Consider publishing the docker BUILD stage as a separate image?

Open cboettig opened this issue 1 year ago • 3 comments

Feature description

Would you consider publishing the build stage of your Docker builds as a separate image, so that downstream projects could COPY --from the build image, as is done at https://github.com/OSGeo/gdal/blob/master/docker/ubuntu-small/Dockerfile#L256, without having to duplicate the multi-stage build?

Context: I co-maintain the https://rocker-project.org Docker stacks for geospatial work, where we are really keen to be able to offer users access to the latest versions of GDAL.
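
For illustration, a downstream Dockerfile using the requested COPY --from pattern might look roughly like the sketch below. It is a sketch only, resting on several assumptions: the builder image name and tag are hypothetical (no such image is published), the base image shown is a guess that would have to match whatever the builder stage was compiled against, and the /build/usr staging path is an assumed layout rather than a confirmed one. Runtime library dependencies would also need to be installed separately.

```dockerfile
# Hypothetical image name and tag: no separate builder image is published today.
ARG GDAL_BUILDER_IMAGE=ghcr.io/osgeo/gdal-builder:ubuntu-small-latest
# Assumption: must be the same base image the builder stage was compiled against.
ARG BASE_IMAGE=ubuntu:24.04

# Pull the prebuilt build stage instead of repeating the multi-stage build locally.
FROM ${GDAL_BUILDER_IMAGE} AS gdal-builder

FROM ${BASE_IMAGE}
# Assumed layout: the builder installs PROJ and GDAL into a staging prefix under /build.
COPY --from=gdal-builder /build/usr/ /usr/
# Refresh the shared-library cache so the copied libraries can be found.
RUN ldconfig
```

Passing the builder image as a build argument would let a downstream project pin a specific GDAL version through the tag.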

cboettig avatar May 01 '24 03:05 cboettig

I'm not totally sure if it is a good idea:

  • the GDAL Docker images have mostly been designed as final end products, not necessarily as something to build on. One aspect of this is that the PROJ and GDAL builds are "non-packaged" builds: they do not use APK (for Alpine) or DEB (for Ubuntu) packages, but just do a "make install" in /usr. So if someone then installs (directly or indirectly) the packaged PROJ or GDAL, chaos may arise.
  • this would add a bit of complexity to the wrapping shell script. Probably nothing dramatic, but still.
  • someone copying from the builder images must make sure they use the appropriate base image (although that changes infrequently)

Despite all the above, if you want to build upon the build stages, wouldn't just copying the build recipes and adapting them to your needs do the job? I would also suggest, for example, using the official Alpine packages. Alpine is released every 6 months and thus ships a really fresh GDAL.

rouault avatar May 01 '24 12:05 rouault

@rouault thanks for the reply.

The rocker stack is based on Ubuntu LTS releases (with the GPU part of the stack building on NVIDIA-based images derived from the Ubuntu base). I appreciate the suggestion, but we don't support Alpine-based images, so the APK binaries are not a good option for us.

I completely understand that you build these images as end products. My goal is to provide containerized environments supporting an R / Python community, where, as you know, there is a considerable ecosystem of packages in each language that bind your GDAL libraries as part of a larger software environment on a predominantly Ubuntu (deb) based system. So while the current Docker images serve an important purpose as "end products", there is a substantial user base that could potentially benefit from being able to extend those images. For instance, compare the NVIDIA CUDA Docker stack, which provides a suite of runtime and devel images meant to be extended by other projects seeking to build upon their libraries rather than use them merely as an 'end product'.

None of the three things you mention is a substantial barrier in this context: it's common to use source-based installs, taking care not to conflict with binary versions from the repos; we already share the same base image, and it's straightforward to verify this dynamically if necessary, though presumably the base image would be indicated in the tag of the dev build.

We can definitely just duplicate the source-based builds you have here, effectively creating a fork of this little part of the larger gdal repo, and try to keep it in sync with your work. I just thought it would be proper to touch base with you first to see if it made more sense for such an image to live here, where it could more easily be discovered by the many other developers who build software stacks on top of gdal in this manner.

cboettig avatar May 04 '24 18:05 cboettig

@cboettig If you want to pursue this and enhance our Docker building script(s), I have no objection to you doing so. But that might be complicated by our use of docker buildx for the release cross-architecture builds. I see that in 48d92c1a8737edde484f098bf09ff9ce48d08642 I had to make an optimization to keep the release images from taking too long to build, since, as far as I remember, the previous approach of running 2 separate commands for the builder stage and the runner stage didn't result in the builder stage image being reused when using docker buildx. Maybe the --cache-to / --cache-from options could be used, but it seems there are issues, at least for some types of cache, as mentioned in https://github.com/docker/buildx/issues/1044. Perhaps @awill1988, who introduced multi-architecture support in https://github.com/OSGeo/gdal/pull/2965, is still interested in this and might help.

That said, if buildx mode became an annoyance, I guess we could probably use a regular build for release images, similar to what is done in 86d38d9961140227dbf5088aef7e8b3f76a2ad1b and 5da47524759 for the non-release incremental images, but without relying on a GCC ccache, to be sure of having "clean" builds for releases. But that would involve more substantial changes in the build scripts.

rouault avatar May 05 '24 18:05 rouault