official-images

Add vespa as an official Docker image.

Open aressem opened this issue 2 years ago • 4 comments

aressem avatar Nov 29 '22 14:11 aressem

Doc PR in https://github.com/docker-library/docs/pull/2239

aressem avatar Nov 29 '22 14:11 aressem

Diff for 3294f48d7f31be49344be1596164c7156f8baf7c:
diff --git a/_bashbrew-cat b/_bashbrew-cat
index bdfae4a..c07b590 100644
--- a/_bashbrew-cat
+++ b/_bashbrew-cat
@@ -1 +1,7 @@
-Maintainers: New Image! :D (@docker-library-bot)
+Maintainers: Arnstein Ressem <[email protected]> (@aressem)
+GitRepo: https://github.com/vespa-engine/docker-image.git
+
+Tags: 8, latest
+Architectures: amd64, arm64v8
+GitCommit: acf37b961540ca3f474f2fe118cabeb6cd88a0d0
+Directory: official/8
diff --git a/_bashbrew-list b/_bashbrew-list
index e69de29..e56f8f2 100644
--- a/_bashbrew-list
+++ b/_bashbrew-list
@@ -0,0 +1,2 @@
+vespa:8
+vespa:latest
diff --git a/vespa_latest/Dockerfile b/vespa_latest/Dockerfile
new file mode 100644
index 0000000..9b1495d
--- /dev/null
+++ b/vespa_latest/Dockerfile
@@ -0,0 +1,5 @@
+# Copyright Yahoo. Licensed under the terms of the Apache 2.0 license. See LICENSE in the project root.
+
+# The official Docker image is the current tested and released version of the upstream Vespa container image
+FROM docker.io/vespaengine/vespa-minimal:8
+

github-actions[bot] avatar Nov 29 '22 14:11 github-actions[bot]

From the website, I would've assumed that "Vespa" is a service or a development framework of some kind, but from the real Dockerfile I found (https://github.com/vespa-engine/docker-image/blob/acf37b961540ca3f474f2fe118cabeb6cd88a0d0/Dockerfile.minimal) it looks more like it's maybe a (semi-?) full RPM-based Linux distribution?

"This will start up a container with Vespa running all the services needed."

This also worries me a bit -- is this starting some kind of process manager?

It looks like it's maybe this? https://github.com/vespa-engine/docker-image/blob/acf37b961540ca3f474f2fe118cabeb6cd88a0d0/include/start-container.sh#L63-L64

Unfortunately, that's not going to be acceptable. It's very complicated to track multiple processes in a container correctly without a full process manager, and most of the minimal process managers we've looked at (supervisord, runit, s6, etc.) have too many unpleasant quirks to be really usable as a container's PID 1 while still providing the expected "container semantics": service output on stdout, responding correctly to docker stop, forwarded signals, etc.

(Sorry if this seems blunt -- I want to make sure to raise the red flags with respect to the Official Images program that I'm seeing right off the bat in case this isn't a fit so we don't expend too much effort together before we figure that out! :heart:)

tianon avatar Nov 30 '22 00:11 tianon

Thanks for the first quick feedback.

Vespa is a data-serving engine that lets users store documents, search, rank, and evaluate models. The closest comparable official image would be Elasticsearch (comparison in https://vespa.ai/vespa-elastic-solr). The image we submitted here can be used in two modes: either standalone with all the processes needed (the default), or split into a configuration service and content/container nodes that can be run on an orchestrated container platform (e.g. Kubernetes).

It is not a Linux distribution, but our distribution mechanism is RPMs, which are built on Fedora Copr. We have traditionally installed all the required RPMs and the package manager (dnf) inside the image, but the upstream image built by Dockerfile.minimal is built from scratch using a build container that uses dnf/RPMs to install the software. We did it this way because we could not derive from CentOS Stream 8, which is not on the official image list (according to the requirements). This can be changed to an acceptable solution based on further requirements.

Multiple processes are started inside the container, and it runs as a service composed of several processes. The start-container.sh script you found is the entrypoint and starts the service depending on the mode. We have not observed issues with having /bin/bash as PID 1, but it is possible either to use Docker's built-in init process (docker run --init) or to use the same tiny process manager that other official images (like Elasticsearch) use. We could also run a full /sbin/init inside, but that is trickier.
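
For illustration only, here is a minimal sketch of what a shell running as PID 1 would need to do to forward a stop signal to the processes it started. It is not the content of start-container.sh: the function names `run_services` and `forward_term` are hypothetical, and the service commands are placeholders passed in by the caller.

```shell
#!/bin/sh
# Hypothetical sketch: a shell entrypoint that starts several background
# processes and relays SIGTERM/SIGINT to them. Names and commands are
# placeholders, not taken from the Vespa image.

pids=""

forward_term() {
    # Relay the stop request to every child we started.
    for pid in $pids; do
        kill -TERM "$pid" 2>/dev/null || true
    done
}

run_services() {
    trap forward_term TERM INT

    # Start each command in the background and remember its PID.
    for cmd in "$@"; do
        $cmd &
        pids="$pids $!"
    done

    # wait returns early when a trapped signal arrives, so keep waiting
    # until each child has really exited.
    for pid in $pids; do
        while kill -0 "$pid" 2>/dev/null; do
            wait "$pid" || true
        done
    done
}
```

An entrypoint built this way would end with something like `run_services "service-a" "service-b"`. Without the trap, a shell running as PID 1 ignores SIGTERM, so docker stop only ends the container when it sends SIGKILL after its timeout. Running the container with `docker run --init` puts Docker's bundled init in front of the entrypoint; it forwards signals to the entrypoint and reaps zombies, but it does not supervise or restart the individual services.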

As for logging, output to stdout is currently not enabled by default. We have good logging capabilities inside the containers, and the logs can be streamed to stdout.

aressem avatar Nov 30 '22 12:11 aressem

@tianon Any thoughts about the above answer? We are willing to work with you to find a solution that is acceptable for both parties.

aressem avatar Dec 19 '22 08:12 aressem

@tianon Are we at a dead end here? In the meantime we have been accepted as a Docker OSS Sponsored organization. Please let us know how to proceed or close this PR with a comment.

aressem avatar May 23 '23 09:05 aressem