rethinkdb-dockerfiles
New image planning
This is a tracking issue to collect all the elements that are under consideration for a potential overhaul to the official RethinkDB image:
- Packaging from scratch (#43) or Alpine (#32) instead of Debian
- Packaging for ARM (#41)
- Running as a non-root user (#39)
- Removing the VOLUME directive (#14)
- Housekeeping
Now, some of these have a larger potential for breaking changes than others (particularly removing the VOLUME directive), so I'm not necessarily going to do all of them immediately; however, many of these (like changing the user and packaging from scratch) shouldn't affect the daemon's normal operation, so I may tie them to the next minor bump for the official library. (Ideally, all this would be held off as potentially breaking changes that would only accompany a major version bump, but I don't see RethinkDB 3.0 happening any time in the foreseeable future.)
Can I help with the Alpine container?
Is this still something that's planned? Is RethinkDB something that's still maintained? Would it make sense to look at RebirthDB? (https://github.com/RebirthDB/rebirthdb)
@nicodmf Yes, talk to the RebirthDB people.
@tianon All the activity is with RebirthDB (except for a small amount of work I do to fix things, and the miscellaneous PRs you see, and that'll end up in RebirthDB anyway), but it looks like they're going to become the people in charge of RethinkDB real soon. You'll notice that the main rebirthers are now owners of the RethinkDB org: https://github.com/orgs/rethinkdb/people
I like @daveisfera's work in https://github.com/rethinkdb/rethinkdb-dockerfiles/pull/40#issuecomment-391075583 and currently that's probably the closest thing to what a new image would look like, but that is a hairy RUN line - I'd rather see something that uses more of upstream's build process, which (IIRC) uses Nix and was developed by @AtnNn.
In any case, what we really need is a new pull request implementing a build on Alpine using either of these two approaches (if we can't get this going from scratch, which is what I'd really rather have) - I can't merge a thread comment.
RUN commands to build a package usually are big and ugly. Multi-stage builds help clean that up, but they're currently not supported for official images.
The Dockerfile that I made uses the same build process as the Alpine package and is very close to the version used for the Debian builds (i.e. just modifications to work with Alpine), so I'm not sure what the changes you're requesting would be.
Is there a way we could pull the build command by reference from an upstream source, like how Arch Linux has the ABS? In other words, if this is how the Alpine package is built, can we perform the build and install it as our own package?
I'm not following. A Dockerfile is basically a build system like ABS. If you go dig into the details of an ABS build, they're just as big and ugly. They have better ways to break it up and structure things so it's easier to look at and manage, but once multi-stage builds are supported, then the Dockerfile can be cleaned up in the same sort of way.
I guess what I'm saying is, if this does the same thing as https://git.alpinelinux.org/aports/tree/community/rethinkdb/APKBUILD, why not just build from that aports tree source directly (which, from the looks of it, would also resolve #39 via https://git.alpinelinux.org/aports/tree/community/rethinkdb/rethinkdb.pre-install)? Why repeat ourselves?
Oh, do you mean to run the APKBUILD file from Alpine inside of the Dockerfile? If that's what you're asking, then I'm not sure what that would take, but I'm guessing it's possible. The downside is that we'd be tied to the version that they have installed, and if we're going to couple it to their packaging like that, then why not just install the already built binaries?
> if we're going to couple it to their packaging like that, then why not just install the already built binaries?
I guess my original thinking behind this was that this was the only way to really pin a package release, since Alpine's repos only provide the latest version. The point of the Docker library seems to be about making the build process as deterministic as possible, though I'll admit that this seems like something of a farce when the first line of the Dockerfile bases the entire build on a mutable tag.
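One way to take the mutable tag out of the first line is to pin the base image by digest. The sketch below is only an illustration: `resolve_digest` and `pin_from` are hypothetical helper names, and `debian:jessie` stands in for whatever base the Dockerfile actually uses.

```shell
#!/bin/sh
# Resolve a tag to its immutable digest reference, e.g. "debian@sha256:...".
# Requires a running Docker daemon and registry access.
resolve_digest() {
  docker pull --quiet "$1" >/dev/null &&
    docker image inspect --format '{{index .RepoDigests 0}}' "$1"
}

# Print a copy of a Dockerfile whose FROM line uses the digest reference.
# $1 = Dockerfile path, $2 = mutable reference (e.g. debian:jessie),
# $3 = digest reference (e.g. the output of resolve_digest).
# Only a line that is exactly "FROM <reference>" is rewritten.
pin_from() {
  sed "s|^FROM $2\$|FROM $3|" "$1"
}

# Usage (hypothetical paths):
#   pin_from Dockerfile debian:jessie "$(resolve_digest debian:jessie)" > Dockerfile.pinned
```

A pinned FROM line makes rebuilds repeatable, at the cost of having to bump the digest deliberately whenever the base image ships security fixes.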
Argh... ultimately, what this comes down to is that RethinkDB did their own packaging for Debian/Ubuntu/CentOS, and they packaged the full matrix of releases to package versions, and if we're relying on distro packaging (which only caters to the latest version), the ability to maintain images for previous versions using only upstream-provided infrastructure falls apart.
I don't want the Dockerfiles repository to become a backports project, especially when RethinkDB upstream already maintains a backporting CI system for other distributions.
I feel like the real solution would be to get Alpine added to https://github.com/rethinkdb/rethinkdb-nix, but that would require Alpine to get added to Nix's whole VM building system first...
Agh, whatever, we don't list anything but the latest version in the library anyway (and I vaguely recall the exchange that led to this).
I could argue that it's a bad idea to let old images' dependencies stagnate, but there's also an argument to be made for not making untested changes to obscure backports... whatever.
Yeah, at this point I'm fine just using Alpine's package. When Alpine drops support for older versions, that's fine, because so does our building of them. Anybody who wants to keep an older image building will just have to invent their own CI system to support their specific blend of old and new, I guess!
Or, really, reflecting on https://github.com/rethinkdb/rethinkdb-dockerfiles/pull/45#issuecomment-426690121, I think what'd be more appropriate than Ubuntu or Alpine going forward would be for the RethinkDB image to be based on a package for Nix: https://github.com/rethinkdb/rethinkdb-nix/issues/2
Having a set of packages for each of the OSes is really nice (like what Postgres provides), but it's a ton of work. Also, even they don't maintain packages for Alpine and they just build the software in the Alpine base image, like I've done. Basically, they have two versions:
- debian base that installs the prebuilt package ( https://github.com/docker-library/postgres/blob/85aadc08c347cd20f199902c4b8b4f736341c3b8/9.6/Dockerfile )
- alpine base that builds dynamically ( https://github.com/docker-library/postgres/blob/85aadc08c347cd20f199902c4b8b4f736341c3b8/9.6/alpine/Dockerfile )
I would vote that that be the same strategy that Rethink uses for the flexibility and control that it provides, because honestly being tied to whatever Alpine has limits the Docker images in a way that I believe will likely make them less useful than they should be.
@stuartpb @daveisfera I was surprised this morning when I ran a docker pull command in my project and the rethinkdb:2.3.6 tag downloaded a new image from Docker Hub.
The Docker Hub page says the tag was updated 18 days ago.
The Dockerfile in this repo hasn't been changed for 2 years.
Given this: https://www.bankinfosecurity.com/docker-hub-breach-its-numbers-its-reach-a-12425, I'm wondering if there is cause for concern, or has one of you recently pushed a new image over that tag (and if so, why wouldn't it just be a new tag)?
The base image was updated ( https://github.com/debuerreotype/docker-debian-artifacts/commits/fd138cb56a6a6a4fd9cb30c2acce9e8d9cccd28a/jessie/Dockerfile ). This is common to get security fixes and such out to all of the images and I don't believe that it has anything to do with the breach.
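Because a tag can be re-pushed whenever the base image is rebuilt, anyone who wants to notice such updates can record the digest behind the tag and compare it after each pull. This is a rough sketch under that assumption; `image_digest` and `check_digest` are hypothetical helper names, and the record file location is up to the caller.

```shell
#!/bin/sh
# Print the registry digest reference of a locally pulled image.
# Requires Docker and an image that was pulled from a registry.
image_digest() {
  docker image inspect --format '{{index .RepoDigests 0}}' "$1"
}

# Compare a digest with the one recorded in a file.
# $1 = record file, $2 = current digest reference.
# Prints "recorded" on first use, otherwise "unchanged" or "changed".
check_digest() {
  record_file="$1"
  current="$2"
  if [ ! -f "$record_file" ]; then
    printf '%s\n' "$current" > "$record_file"
    echo "recorded"
  elif [ "$(cat "$record_file")" = "$current" ]; then
    echo "unchanged"
  else
    echo "changed"
  fi
}

# Usage (after `docker pull rethinkdb:2.3.6`; the file name is arbitrary):
#   check_digest rethinkdb-2.3.6.digest "$(image_digest rethinkdb:2.3.6)"
```

A "changed" result only says that the tag now points at different content; it does not by itself distinguish a routine base-image rebuild from anything else.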
Are you any closer to having AArch64 (ARM64) support on Docker Hub?
https://hub.docker.com/_/rethinkdb/
Just to be clear, it's been ~623 days since the last actual rethinkdb image update, so it's definitely ripe (even if just to get off Debian Jessie whose leftover lifespan is getting very, very thin).
Without some amount of image maintenance, we'll be adding a deprecation notice (which itself can be temporary, but we'd really much rather see the image updated :smile: :heart:).
@tianon I have a PR that will update the docker images, namely https://github.com/rethinkdb/rethinkdb-dockerfiles/pull/46.
When I open the PR to change https://github.com/docker-library/official-images/blob/master/library/rethinkdb, should I open one to remove the deprecation warning or will you do it?
RethinkDB itself was AGPLv3 and, after the transition, was relicensed under the Apache License. The Docker page for the official rethinkdb image still lists AGPLv3.
Is there specific information as to what is still forcing the AGPL license for the image?
My team wants to use it, but due to our Legal department, AGPL is not a viable license for us. Thanks for all your hard work and for the information you can provide!
@brecko the license should be Apache 2.0. Any other licenses are just wrong. Thank you for raising this, I’ll update that info.
@gabor-boros Thank you for the quick response and that is good news to hear. So that my team can better align, approximately when do you think this change will be made?
@brecko I wanted to do that this morning but I had no time for it. Tomorrow morning I’ll try to adjust that 😇
PR: https://github.com/docker-library/docs/pull/1679
Thank you @gabor-boros !
I came up with a recipe for building rethinkdb 2.4.2 on alpine:3.15, which is a stable branch. It uses ARG directives that can be overridden from the build command and performs a multi-stage build across several layers. The final image is a minuscule 35.7MB (13.82MB compressed) and runs as a non-root user. Hopefully this meets the requirements to be used as the basis for an official rethinkdb:alpine image. For now, I've pushed this to Docker Hub under my account if anyone wants to try it out.
docker run -d --name=rdbtest -p 8080:8080 -p 28015:28015 -p 29015:29015 besworks/rethinkdb:latest
docker logs -f rdbtest
Or build it yourself from the latest/Dockerfile in my GitHub repo.
I've also added a version that includes python + the python rethinkdb driver (~79MB) which is tagged as besworks/rethinkdb:python
So it gets the backtrace() with libexecinfo? Great.
I think instead of using boost-dev it should use the fetched boost library. This hard-codes the boost version to 1.60.0, which avoids any hypothetical changes to the behavior of boost's datetime library, which the query language and secondary index functions can use.
I think you also don't need icu-dev anymore.
From my build log it looks like it ignored the installed boost-dev package (1.77.0-r1) anyway and instead built with the fetched version that you mentioned.
None of the build dependencies end up in the final image anyway so having an extra one here and there won't hurt a whole lot. These can easily be fine-tuned in later builds. I'll try without icu-dev next time I need to do a full run.
For whatever it's worth, here's a Dockerfile that I made a while back that builds against alpine:
https://github.com/rethinkdb/rethinkdb-dockerfiles/issues/32#issuecomment-297428635
@daveisfera I based my Dockerfile partly on your example as well as others that I found. The difference with mine is that I use several RUN commands to create cached layers; that way I can re-run builds with various tweaks without needing to install the deps, download and unpack the source, etc. on each run.
I also build the output image from a fresh copy of the alpine base image to completely discard any build artifacts.
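The structure described above (overridable ARGs, separate RUN steps for cached layers, a fresh final stage, a non-root user) can be sketched with a small Dockerfile that actually builds. This is not the Dockerfile from the comment: the build stage only generates a stand-in script instead of compiling RethinkDB, and `write_dockerfile`, `build_image`, `APP_VERSION`, the `app` user, and the `demo:alpine` tag are all hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Write a small, buildable Dockerfile that mirrors the described structure.
# $1 = directory to write into (used as the build context).
write_dockerfile() {
  cat > "$1/Dockerfile" <<'EOF'
# Global ARG: usable in every FROM line, overridable with --build-arg.
ARG ALPINE_VERSION=3.15

FROM alpine:${ALPINE_VERSION} AS build
ARG APP_VERSION=2.4.2
# Each RUN is its own cached layer, so tweaking a later step does not
# repeat the earlier ones (dependency install, directory setup, ...).
RUN apk add --no-cache build-base
RUN mkdir -p /out
# Stand-in for the real compile step; NOT the RethinkDB build.
RUN printf '#!/bin/sh\necho "stand-in build %s"\n' "${APP_VERSION}" > /out/app \
    && chmod +x /out/app

# The final stage starts from a fresh base, so build artifacts are discarded.
FROM alpine:${ALPINE_VERSION}
RUN addgroup -S app && adduser -S -G app app
COPY --from=build /out/app /usr/local/bin/app
USER app
CMD ["app"]
EOF
}

# Build the image. $1 = build context, $2 = tag; any further arguments are
# passed to docker build, e.g. --build-arg ALPINE_VERSION=3.15
build_image() {
  ctx="$1"
  tag="$2"
  shift 2
  docker build "$@" -t "$tag" "$ctx"
}

# Usage:
#   dir="$(mktemp -d)" && write_dockerfile "$dir"
#   build_image "$dir" demo:alpine --build-arg APP_VERSION=2.4.2
#   docker run --rm demo:alpine
```

Only the file copied with COPY --from=build reaches the final image, which is why extra build dependencies in the first stage do not affect its size.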
The python tagged image actually builds from a different base too because using python:alpine came out ~20MB smaller than installing python3 from apk. This could probably be reduced even more if the rethinkdb python module was installed some way other than with pip.