Remove the dependency with Docker CLI.
A clear description of the change.
Change the code to use the Docker API directly, so it doesn't require additional binaries to exist in the container.
Change the Dockerfile to build the application statically, allowing to build it with a Scratch image, making it distroless. That way, you don't have to worry about updating versions of the base image, because there is nothing else except the docuum binary.
The bulk of the changes are in src/run.rs, but GitHub hides it because it's "too large". Make sure you review that file in detail.
Status: Ready
Unfortunately we tried this in the past, shipped it, and subsequently had to revert it: https://github.com/stepchowfun/docuum/pull/100
So I'm concerned that we'll run into issues again with bollard. Even though I like using the Docker API in theory (especially for performance reasons), the CLI that everyone uses is more likely to work than a binding for the API, which must be kept in sync with the Docker API and presents extra opportunities for failures.
That's interesting. We've been using bollard to receive events from Docker in a different system for a long time and we've never seen any issue dropping events.
Regarding this comment:
I actually trust the Docker CLI to be more stable than the API, even though the API is versioned and the CLI is not.
The docker CLI is just a client for the API. There is no extra capabilities that the CLI has that the API doesn't expose. I would expect the endpoints that you need in the API to not change much. The problems that you ran in the past were probably caused by Bollard's maturity. I know this, because I built the docker API and CLI a decade ago, and they've not changed much in the basic functionalities ever since.
I'm going to fix the issue from CI. It's ok if you don't want to maintain this, I understand that's a big change. We're already using our own fork of Docuum, we can keep using it.
We're already using our own fork of Docuum, we can keep using it.
If you don't mind me asking, what are the other differences between your fork and upstream?
I know this, because I built the docker API and CLI a decade ago
Amazing 🙇
If you don't mind me asking, what are the other differences between your fork and upstream?
The only reason why we have a fork is because we have some strict compliance expectations regarding container images, and we need to update the base image often. We're moving to distroless images everywhere we can so we don't have to worry about base images as much, hence this change.
Thanks for explaining!
There's another thing I want to keep in mind about this change: some people are using Docuum with Podman instead of Docker. Unfortunately the most recent Docuum release broke Podman compatibility, though I'm hoping to fix that. I see Podman has an API compatibility layer which emulates the Docker API, with this disclaimer:
Our preference is that people writing new code to interact with Podman should use the libpod layer only. This is a more sound long term strategy. But for people that need to migrate to Podman, the compatibility layer allows for a quick on-boarding. There are of course Docker endpoints we cannot or choose not to emulate due to incompatibities between Docker and Podman.
I'm not sure if that compatibility layer will work without any issues with Bollard and the specific endpoints we need. Ideally we'd be testing Podman compatibility in CI, but for now we just try our best to make it work with no official guarantees.
So this is how I'm thinking about it:
Pros of merging this PR:
- Docuum would no longer depend on the Docker binary being available, allowing for more minimal Docker images.
- Docuum would presumably be faster since it wouldn't have to spawn processes. This is what we observed last time we tried it.
Cons of merging this PR:
- We had issues with Bollard in the past. I don't know if all the issues have been resolved or that we wouldn't encounter new ones.
- Bollard might stop being maintained some day.
- Potentially makes Podman compatibility harder (unconfirmed)
Maybe pro (1) could be addressed by building a Docuum image without the Docker binary, but bind mounting the Docker binary into the container's file system?
Another potential option would be to give Docuum an option to switch between the Docker CLI or the API, based on some configuration/flag. But the maintenance burden of supporting both might be too high.
Full transparency: based on the pros/cons as I see them, I'm leaning toward not merging the PR (but I'm grateful for it nonetheless).