Publish an up-to-date Docker image using a GitHub Actions workflow

Open vemonet opened this issue 10 months ago • 3 comments

Version

Latest

Feature

Hi @afs, currently there are 256 images for fuseki on dockerhub: https://hub.docker.com/search?q=fuseki

All of them are outdated or add unwanted quirks on top of Fuseki, which shows a real need and demand for an officially published Fuseki Docker image.

Fortunately, Fuseki already has an official Docker image, but it is not published automatically with each release.

Instead, right now in 2025, a user would need to:

  1. Go to the documentation page https://jena.apache.org/documentation/fuseki2/fuseki-docker.html
  2. Attentively read the documentation to find the link to the download server
  3. Navigate the folders to find the zip file containing the Dockerfile https://repo1.maven.org/maven2/org/apache/jena/jena-fuseki-docker/
  4. Download this zip file
  5. Unzip the zip file
  6. Go into this folder with the terminal to run docker build

This is a bit complex when it could easily be reduced to:

docker run -it -p 3030:3030 ghcr.io/apache/fuseki:5.4.0
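
For comparison, the manual steps listed above can be sketched as a shell script. This is only a sketch under assumptions: the zip URL assumes the standard Maven repository layout under the download server linked above, the unpacked directory name and the `JENA_VERSION` build argument should be checked against the Fuseki Docker documentation, and the function names are hypothetical. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch of the manual steps. Defaults to a dry run that only prints the
# commands; set DRY_RUN=0 in the environment to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Assumption: the download server follows the standard Maven repository layout.
MAVEN_BASE="https://repo1.maven.org/maven2/org/apache/jena/jena-fuseki-docker"

# Print the assumed URL of the zip file for a given Jena version.
fuseki_zip_url() {
    printf '%s/%s/jena-fuseki-docker-%s.zip\n' "$MAVEN_BASE" "$1" "$1"
}

# Print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Download, unzip, and build a local image tagged fuseki:<version>.
build_fuseki_image() {
    version="$1"
    run curl -fsSLO "$(fuseki_zip_url "$version")"
    run unzip -q "jena-fuseki-docker-$version.zip"
    # Assumptions: the zip unpacks into a directory named after the artifact,
    # and the Dockerfile accepts a JENA_VERSION build argument.
    run cd "jena-fuseki-docker-$version"
    run docker build --build-arg "JENA_VERSION=$version" -t "fuseki:$version" .
}

build_fuseki_image "5.4.0"
```

Even in this compact form, every user or CI job has to repeat the download and build, which is what a published image would avoid.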

Also, when you want to use Fuseki inside a workflow for testing purposes, you don't want to have to write a 20-line bash script just to download and build the Docker image.

All that is needed is to add a GitHub Actions workflow that builds and publishes the Docker image on every new release. This work needs to be done once and will be fully automatic after that. It will also make it possible to test the Docker build and make sure it does not break in future releases.
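
As a rough illustration, the commands such a workflow step would run on a release could look like the sketch below. The image name `ghcr.io/apache/fuseki` is the one proposed in this issue, not an existing image, `publish_commands` is a hypothetical helper, and the `JENA_VERSION` build argument is an assumption about the Dockerfile. The function only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Print the commands a release workflow step could run to build and publish
# the image. Nothing is executed here; pipe the output to `sh` to run it.
publish_commands() {
    # Image name proposed in this issue; the final name is up to the project.
    ref="ghcr.io/apache/fuseki:$1"
    cat <<EOF
echo "\$GITHUB_TOKEN" | docker login ghcr.io -u "\$GITHUB_ACTOR" --password-stdin
docker build --build-arg JENA_VERSION=$1 -t $ref .
docker push $ref
EOF
}

publish_commands "5.4.0"
```

Inside a workflow job this would be run as `publish_commands "$VERSION" | sh` from the directory containing the Dockerfile, with `GITHUB_TOKEN` passed to the step through `env:` and the job granted the `packages: write` permission. How `VERSION` is derived from the release tag is left open here.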

I would recommend using ghcr (GitHub Container Registry) over DockerHub because:

  • it keeps everything in one place instead of also requiring a DockerHub account. You can just use the GitHub token available in the GitHub Actions workflow
  • for now ghcr.io does not have rate limits (DockerHub has limits, which makes it problematic on some clusters)

But that is obviously up to you

Writing a workflow to build and publish a docker image is quite easy nowadays, but if you wish I can help you with that

Are you interested in contributing a solution yourself?

Yes

vemonet avatar May 02 '25 15:05 vemonet

GitHub Container Registry very much does have rate limits - https://github.com/aquasecurity/trivy-action/issues/389 - though you have to publish an extremely popular image to hit them

No reason not to publish to multiple registries including DockerHub, though the project would likely need to talk to Infra to figure out the AuthN for that.

GHCR is probably the easier option in the short term

rvesse avatar May 02 '25 17:05 rvesse

Thanks @rvesse, I did not know about these limitations for ghcr. My guess is that every public container registry will have some kind of limits to prevent abuse of their system

Known limitations:

  • GitHub Container Registry: 700 requests per minute (global apparently)
  • DockerHub: 100 pulls every 6h per IP address (https://docs.docker.com/docker-hub/usage/)

Our experience with DockerHub is that you can easily reach their limit when deploying on a shared Kubernetes cluster, which requires additional work: either switching the IP address or logging in to DockerHub to raise the limit

But Dockerhub has an open source program you can apply to that will lift rate limits: https://www.docker.com/community/open-source/application/

> No reason not to publish to multiple registries including DockerHub, though the project would likely need to talk to Infra to figure out the AuthN for that. GHCR is probably the easier option in the short term

Agree with that. More can be better, but simplicity and reliability also have value when maintaining systems in the long run

I am not knowledgeable about Apache policies, but from a quick search it seems there are no requirements on which registry should be used. The most commonly used is the apache/ org on DockerHub (e.g. flink, kafka), probably because it was the first public registry, but some projects are using ghcr.io (e.g. yetus)

On our side, for now we just want to have 1 image, ideally available through a container registry that does not require login

vemonet avatar May 05 '25 10:05 vemonet

There has been work recently on Fuseki to have a fully functional server, with UI and with admin functions, so that is something with stable functionality for a container.

There are different uses:

  1. Fuseki with UI.
  2. Fuseki server, no UI.
  3. Downstream UI customization.
  4. Use in testing - presumably no UI.
  5. As a base layer for downstream custom servers.

(are there others?).

Fuseki-with-UI has its own implications - it needs admin access control (user/password).

To make security solid, "no UI" might need to be different - don't ship the UI code, configure so the administration functions are robustly disabled.

For the project, I think we need to consider the provenance chain from project to released container. There is the question of how it's integrated into the release process so that the PMC is voting on the container(s). That could be voting on the scripts/actions to produce the container or voting on the container binary somehow. We can look at those other Apache projects and see what is best practice.

The process should be reproducible - users may wish to build a container themselves, whether to avoid public global container registries or to verify the container. (Docker builds are not byte-for-byte reproducible as I understand it.)
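
Since the image itself may not be byte-for-byte comparable, one step a user building the container themselves can still take is to verify the build input. The sketch below checks a downloaded file against an expected SHA-1, such as the value in the `.sha1` file that Maven Central publishes next to each artifact. `verify_sha1` is a hypothetical helper, and this covers only the downloaded zip, not the resulting image.

```shell
#!/bin/sh
# Compare a file's SHA-1 digest with an expected value.
# Returns 0 on a match and 1 on a mismatch.
verify_sha1() {
    file="$1"
    expected="$2"
    # sha1sum prints "<digest>  <filename>"; keep only the digest.
    actual="$(sha1sum "$file" | cut -d' ' -f1)"
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (expected $expected, got $actual)" >&2
        return 1
    fi
}

# Example (file names are illustrative):
#   verify_sha1 jena-fuseki-docker-5.4.0.zip \
#       "$(cut -d' ' -f1 jena-fuseki-docker-5.4.0.zip.sha1)"
```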

afs avatar May 07 '25 15:05 afs