
Containerization of Kitodo Docker / Kubernetes

Open marinhio75 opened this issue 1 year ago • 8 comments


name: Task for the development fund
about: A working package which may be sponsored by the Kitodo e.V. development fund.
labels: development fund 2025
assignees: ''


Description

Why should Kitodo be containerized? Containerizing Kitodo enhances flexibility, scalability, and maintenance by ensuring a consistent environment, simplifying deployments, enabling dynamic scaling, and facilitating seamless updates, especially in distributed or cloud-based environments.

Expected Benefits of this Development

Containerizing Kitodo offers numerous advantages in terms of flexibility, scalability, and maintenance. By using Docker or Kubernetes, Kitodo and all its dependencies can run in isolated containers, ensuring a consistent environment across different systems. This minimizes compatibility issues between operating systems and significantly simplifies deployments.

Additionally, containerization allows for easy scaling, as instances can be dynamically started or stopped based on demand. Updates and maintenance can be performed seamlessly by replacing individual containers without downtime. Especially in distributed or cloud-based environments, this approach simplifies operations and enables integration into modern DevOps workflows.

Estimated Costs and Complexity

Please try to estimate the costs and / or the complexity of the development.

high ~ more than 10 working days

Too much outdated software would have to be loaded into the container: everything that is not part of the package, e.g. Tomcat 9, Java 11, etc.

marinhio75, Feb 03 '25 09:02

Sounds interesting. I would like to ask: is this really related to the development of the software, or is it more related to deployment/infrastructure? What would be the expected outcome? Docker configuration files and images? Kubernetes configuration files? To be useful, those have to be updated whenever the underlying software changes, so that would be a continuous task.
We had a similar issue in the past: https://github.com/kitodo/kitodo-production/issues/4313

With the old issue in mind: are we striving for something that is mostly used for development purposes or for production purposes? Edit: An evaluation of what is missing from the existing images to achieve the respective goals would be interesting: https://github.com/slub/kitodo-production-docker/

BartChris, Feb 03 '25 11:02

I'm not entirely sure how this issue relates to mine from last year.

https://github.com/kitodo/kitodo-production/issues/5968

I think containerization is helpful, but only for use in a stable testing/development environment. We should pursue this path for sure, but I'm not sure about the scope of the matter. Maybe we can link these issues together and make the purpose for the fund clearer?

Erikmitk, Mar 04 '25 07:03

Summary

I would like to try to summarize all issues related to the topic "KITODO.PRODUCTION in Docker". I refer to this issue, #6394, and also to the older ones, #4313 and #5968. The main idea of this summary is to collect all relevant information in one place (this issue), with the aim of having a (hopefully final) discussion and ending up with a good definition of a topic for the Development Fund ("good" in this context means that it is well defined enough to be put into the tender process).

General aim and structure

The aim of dockerizing KITODO.PRODUCTION has been discussed before. The main points to consider are:

  • The result should be usable for "all" purposes, which are:
    -- a basis for a development and test environment
    -- a basis for a "try-out" or "demo" system
    -- a basis for a real production environment

--> Just to make it clear: all of this requires somebody with some IT know-how, at least somebody who has access to a Linux machine and can set up Docker containers. This also applies to future "try-out" or "demo" systems.

--> In the end, the result should lower the barrier to entry for using KITODO.PRODUCTION for all purposes. It is clear that any "real" environment based on this will have customized settings. This is foreseeable and intended.

  • To reach this aim, in my opinion the "slim container approach" should be used. This means that each container runs only a single service, and all required containers are combined in a "compose" definition. The solution created by SLUB (https://github.com/slub/kitodo-production-docker/) is a good example of this. In detail I see the following container structure (I have kept the names from the SLUB example to make it clearer):
    -- kitodo-app: contains the KITODO.PRODUCTION software (war file)
    -- kitodo-db: contains the database server (MariaDB)
    -- kitodo-es: contains the search engine (OpenSearch)
    -- kitodo-mq: contains the message queue service (ActiveMQ)
    -- kitodo-ldap: contains the LDAP server (OpenLDAP)
    -- kitodo-storage: contains the storage server (Samba)
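
The container structure above could be sketched roughly as follows. This is only a minimal illustration in Python: it builds a compose-style service map as a dictionary and prints it as JSON. The image names, the local build paths and the `depends_on` wiring are assumptions for illustration and are not taken from the SLUB repository.

```python
import json

# Hypothetical service layout: one service per container, as proposed above.
# Image names and build paths are assumptions and must be checked before use.
SERVICES = {
    "kitodo-app": {"build": "./kitodo-app"},          # war file, built from a Dockerfile
    "kitodo-db": {"image": "mariadb"},                 # database server
    "kitodo-es": {"image": "opensearchproject/opensearch"},
    "kitodo-mq": {"image": "apache/activemq-artemis"},
    "kitodo-ldap": {"image": "osixia/openldap"},
    "kitodo-storage": {"build": "./kitodo-storage"},  # Samba, own Dockerfile assumed
}


def compose_definition(services=SERVICES):
    """Return a compose-style dict in which kitodo-app depends on all other services."""
    # Copy each service entry so the SERVICES template is not modified.
    definition = {"services": {name: dict(cfg) for name, cfg in services.items()}}
    backends = sorted(name for name in services if name != "kitodo-app")
    definition["services"]["kitodo-app"]["depends_on"] = backends
    return definition


if __name__ == "__main__":
    print(json.dumps(compose_definition(), indent=2))
```

Keeping one entry per service mirrors the "one container, one service" idea; a real setup would add ports, volumes and environment variables per service.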

Docker Image Hosting

As the SLUB example shows, dedicated hosting (e.g. on Docker Hub) of the "kitodo-app" Docker images is not needed. I would prefer to keep this concept, because then there is no need to decide which "Docker hosting service" is best. This requires a Dockerfile (as in the SLUB example) that creates the kitodo-app image. The image can be built during release creation and stored on GitHub like all the other release files.

Kubernetes

At this moment I would NOT additionally go for Kubernetes, as this would also mean maintaining Kubernetes configuration files. This means we stay with plain Docker images/containers and use docker-compose to combine them.

Container-Modules

In this part some details per Container-Module are given.

kitodo-app

The main job here is to create a suitable Dockerfile. Again, the SLUB example is a good starting point: it takes a release from GitHub and builds the Docker image from it. It is OK to put a "hard-coded" release version into this Dockerfile, as it can (and must) be changed with each release. The base components it depends on, such as Java and Tomcat, also need to be installed. These can have "hard-coded" versions as well, since they can (and maybe must) be changed with each release. The SLUB example uses "alpine" as the base image, which is OK, but there may be reasons to use another one. Using ENV variables, as in the SLUB example, is also a good pattern.

This container also needs to support all configuration options of the software. These are the directories "config", "messages", "rulesets" and "xslt", plus dedicated files such as "kitodo_config.properties" and "log4j2.xml". This requires a "volume" concept (how to map these files from the host into the container).

The file "hibernate.cfg.xml" is a special case, as it mainly contains the credentials for database access, which need to be configured. It would be good if this could be done with environment variables (as in the SLUB example).
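
To illustrate the idea of configuring the database access through environment variables, here is a minimal sketch in Python. The variable names (`KITODO_DB_HOST` and so on), their defaults and the `jdbc:mysql` URL scheme are assumptions made for this example; only the Hibernate property keys are standard Hibernate settings. How the values end up in "hibernate.cfg.xml" (for example in a container start script) is left open.

```python
import os

# Hypothetical environment variable names with illustrative defaults.
DEFAULTS = {
    "KITODO_DB_HOST": "kitodo-db",
    "KITODO_DB_PORT": "3306",
    "KITODO_DB_NAME": "kitodo",
    "KITODO_DB_USER": "kitodo",
    "KITODO_DB_PASSWORD": "",
}


def hibernate_connection_properties(env=None):
    """Map environment variables to Hibernate connection properties."""
    env = os.environ if env is None else env
    value = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    # The jdbc:mysql scheme is an assumption; adjust it to the JDBC driver in use.
    url = "jdbc:mysql://{KITODO_DB_HOST}:{KITODO_DB_PORT}/{KITODO_DB_NAME}".format(**value)
    return {
        "hibernate.connection.url": url,
        "hibernate.connection.username": value["KITODO_DB_USER"],
        "hibernate.connection.password": value["KITODO_DB_PASSWORD"],
    }
```

Passing a plain dictionary instead of `os.environ` makes the mapping easy to check without touching the real environment.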

kitodo-db

The SLUB example uses a definition file (.env.example) to configure this, which again looks like a good structure. MariaDB should preferably be used as the default here.

kitodo-es

Same logic as for kitodo-db, except that OpenSearch should be used as the default.

kitodo-mq

In general, the same logic as before. The SLUB example uses its own ActiveMQ container image. I would prefer the official https://hub.docker.com/r/apache/activemq-artemis image.

kitodo-ldap

This does not (yet) exist in the SLUB example. Nevertheless, I would be in favor of including it in the standard container delivery package. This could be used: https://github.com/osixia/docker-openldap

kitodo-storage

The idea here is to have a container that provides the Samba service, which is used by the KITODO software. I am not aware of a standard Docker image providing Samba, so another Dockerfile may need to be written. As the container only provides the service, the location where the data is ultimately stored must be provided to the container as a volume. As a first step, this could simply be a (predefined) path on the Docker host. The purpose of this storage is to hold the "project data", which in KITODO.PRODUCTION are the directories "metadata" and "export".

How to bring this together

To bring all this together, a docker compose file is needed. Again, the SLUB example is a good starting point, as it makes good use of environment variables (.env.example). In addition, some data must be filled in:

  • The database "kitodo-db" must be filled with initial data. For a fresh, empty installation this is the ".sql" file from the release.
    -- Plus some data that matches the LDAP server used.
    -- For a "demo system", even more data will have to be filled in initially. So, conceptually, it must be possible to pre-fill the database with different (additional) data.
  • The LDAP service "kitodo-ldap" must be filled with initial data. Besides general admin access, this includes the user accounts that the KITODO software expects.
    -- For a "demo system", even more data will have to be filled in initially (for additional users). So, conceptually, it must be possible to pre-fill the LDAP with different data.
  • For the "storage" there must be at least clear documentation of what needs to be provided (e.g. a path on the Docker host for the "kitodo-storage" container).
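
The idea of pre-filling the database with different data sets could be sketched like this. It is a minimal Python illustration only: all file names and the profile name "demo" are hypothetical, and `kitodo.sql` merely stands in for the ".sql" file from the release.

```python
# Hypothetical file names; only the idea of layered seed data comes from the list above.
BASE_SQL = "kitodo.sql"                   # stands in for the ".sql" file from the release
LDAP_MATCHING_SQL = "ldap-users.sql"      # database rows matching the LDAP accounts
EXTRA_SQL = {"demo": ["demo-data.sql"]}   # additional data per profile


def seed_plan(profile="empty"):
    """Return the SQL files to load, in order, for the given profile."""
    return [BASE_SQL, LDAP_MATCHING_SQL] + list(EXTRA_SQL.get(profile, []))
```

For example, `seed_plan("demo")` returns the base files followed by the demo data, while an unknown profile falls back to the base files only. The same layering could be applied to LDAP seed files.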

Demo-System

Details of a new demo system based on this dockerized KITODO.PRODUCTION will be described in a different issue. Here, as already mentioned above, we should "only" consider conceptually how to fill this dockerized base system with "demo" data.

Discussion opened

Please feel free to provide any comment.

@solth : For the DevFund meeting, please consolidate the 3 issues related to "Docker" into one topic to be discussed during this meeting.

stefanCCS, Mar 21 '25 12:03

@stefanCCS If I understand this correctly, consolidating the individual issues is more or less exactly what you have done, isn't it? We can remove the development fund 2025 label from #6394, #4313 and #5968, but since @stroetgen has already compiled the list of issues for the fund, I am not sure that should be changed now. We could add the other tickets as "sub issues" to this one, but I am not sure if that yields any significant advantage.

solth, Mar 26 '25 12:03

@solth Yes, absolutely right: I have consolidated everything into this ticket. The only "consolidation" I had in mind from your side was to remove the labels from https://github.com/kitodo/kitodo-production/issues/4313 and https://github.com/kitodo/kitodo-production/issues/5968 (but of course keep the label on this issue, https://github.com/kitodo/kitodo-production/issues/6394). Even if @stroetgen has already compiled a list, I think it is OK if you remove these labels. You (or I) can simply explain this directly in the meeting.

stefanCCS, Mar 26 '25 13:03

@stefanCCS should be correct now. Please let me know if I should add/remove another label in preparation for the meeting tomorrow.

solth, Mar 26 '25 14:03

The container solution should not only work with Docker (which one, official Docker or the one from the Linux distribution?), but also with free alternatives, especially with Podman.

stweil, Mar 27 '25 08:03

2 votes

solth, Mar 27 '25 13:03