fast-data-dev
How to persist data
Hey there,
we're currently using your project for development. Is there an easy way (since we're new to Kafka as well as docker) to persist our topics as well as the connectors?
@jreBoAG Kafka Connect stores all its configuration, offsets and statuses in Kafka. There are system topics that hold these: connect-offsets, connect-status and connect-configs. The topic names are set in the connect-avro-distributed.properties file. In Docker you will lose your data in Kafka unless you mount a volume or point Connect to another Kafka cluster that has persistence.
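For reference, the relevant keys look roughly like the excerpt below (an illustrative sketch; the exact topic names in your connect-avro-distributed.properties may differ):
# Kafka Connect system topics (illustrative values)
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status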
Hello, although we do not explicitly use docker volumes, there are two ways to persist data.
The first is to persist your docker container. For example, you could start fast-data-dev like this:
docker run -it -p 3030:3030 --name mykafka landoop/fast-data-dev
Once you finish working, press CTRL+C to stop the container. The container isn't deleted, just stopped. You can start it once again via:
docker start -ai mykafka
You could also set the container to run in the background, replacing the -it switch with -d. In that case you would stop it with docker stop mykafka.
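For example, a detached workflow might look like this (a sketch; only port 3030 is mapped here for brevity):
# run in the background instead of interactively
docker run -d -p 3030:3030 --name mykafka landoop/fast-data-dev
# stop it when done
docker stop mykafka
# resume later with the same data
docker start mykafka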
The second option, if you want your data to persist across containers, is to use an external directory to store the Kafka and ZooKeeper files. We store them under /tmp, so you would need to mount a volume at this path:
docker run --rm -it -v /path/to/local/directory:/tmp landoop/fast-data-dev
Now if you stop (and remove) this container and start a new one providing the same volume, it should start from where the previous one left off.
One catch is that the volume (/path/to/local/directory in the example) should be writable by all (chmod 0777) as Kafka and ZooKeeper run as user nobody.
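For example, on a Linux host you could prepare and mount the directory like this (a sketch; /path/to/local/directory is a placeholder):
# create the host directory and make it world-writable for user nobody
mkdir -p /path/to/local/directory
chmod 0777 /path/to/local/directory
# mount it over /tmp so the Kafka and ZooKeeper data lands on the host
docker run --rm -it -v /path/to/local/directory:/tmp landoop/fast-data-dev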
@andmarios, I am running a container with:
docker run --rm -it -p 2181:2181 -p 3030:3030 -p 8081:8081 -p 8082:8082 -p 8083:8083 -p 9092:9092 -e ADV_HOST=127.0.0.1 landoop/fast-data-dev
While the container is running, I produce and consume several messages to/from a topic that I've created with success. You said that data from Kafka and ZooKeeper are stored in /tmp. Could you give us the complete path under /tmp? I haven't found any subdirectories or files under the /tmp directory inside the Docker container:
docker run --rm -it --net=host landoop/fast-data-dev bash
root@fast-data-dev / $ cd tmp
root@fast-data-dev tmp $ ls
root@fast-data-dev tmp $ pwd
/tmp
root@fast-data-dev tmp $
The directories are created by the Kafka broker and ZooKeeper on startup. The way you ran the image, you skipped starting these services.
The broker stores its data under /tmp/kafka-logs.
ZooKeeper stores its data under /tmp/zookeeper.
Try going inside a normally running container to see them. E.g., start fast-data-dev:
docker run --rm -it --net=host --name=fdd landoop/fast-data-dev
Then from a second terminal:
docker exec -it fdd bash
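Inside, you should see something like this (illustrative output; other temporary files may also be present):
root@fast-data-dev / $ ls /tmp
kafka-logs  zookeeper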
Thank you for your reply @andmarios. But, sorry, I think I didn't explain my question in enough detail. I will try again. I did what you suggested, but it didn't work for me.
I will explain step by step what I've been doing:
- Run a landoop/fast-data-dev Docker container (after the execution, I didn't kill this terminal tab):
docker run --rm -it -p 2181:2181 -p 3030:3030 -p 8081:8081 -p 8082:8082 -p 8083:8083 -p 9092:9092 -e ADV_HOST=127.0.0.1 landoop/fast-data-dev
- In another terminal tab, I executed (after the execution, I didn't kill this terminal tab either):
docker run --rm -it --net=host landoop/fast-data-dev bash
Inside the container at root@fast-data-dev /, I created a topic and produced some messages to it with success.
- At this moment, in another terminal tab, I executed:
docker run --rm -it --net=host landoop/fast-data-dev bash
Inside the container at root@fast-data-dev /, I ran an ls command in /tmp. The directory was empty. I expected to see the kafka-logs and zookeeper directories.
I checked the configuration in /opt/confluent-3.3.0/etc/kafka:
In server.properties, I saw log.dirs=/tmp/kafka-logs
In zookeeper.properties, I saw dataDir=/tmp/zookeeper
You have to familiarize yourself with Docker a bit more. Every time you do docker run you create a new Docker container; think of it as a new VM. If you run Kafka on one VM, you wouldn't expect to see its data in another VM, right?
The proper way to run your example would be:
- At this stage you indeed have to create a new container that runs Kafka. Please notice the --name=fdd parameter:
docker run --rm -it -p 2181:2181 -p 3030:3030 -p 8081:8081 -p 8082:8082 -p 8083:8083 -p 9092:9092 -e ADV_HOST=127.0.0.1 --name=fdd landoop/fast-data-dev
- Now you don't have to create a new container; you can go into the one running Kafka:
docker exec -it fdd bash
- Same as before, you need to connect to the container running Kafka:
docker exec -it fdd ls /tmp
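To verify end to end, a minimal sketch (assuming the Confluent CLI tools are on the image's PATH; the topic name test is hypothetical, and tool flags can vary between Kafka versions):
# create a topic inside the container running Kafka
docker exec -it fdd kafka-topics --zookeeper localhost:2181 --create --topic test --partitions 1 --replication-factor 1
# its log directory should now appear under /tmp/kafka-logs
docker exec -it fdd ls /tmp/kafka-logs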
Hope this helps!
@andmarios, sorry about that. I'm a newbie in Docker and now I realize what you tried to tell me before. Of course, when I execute the docker run command, I actually create another container.
No worries, we all passed through this stage (and are still learning and making mistakes). :)
Hey @andmarios, does your answer from 2017-09-12 still hold true? I see that the Dockerfile currently creates a volume at the /data folder. If I fill up some topics and want to back up this state as a starting point, so that I can revert to it later, what should I do? I tried mounting it via -v /path/to/my/local/folder:/data, but it didn't work.
Now it's /data, so you should mount your volume at /data instead of /tmp.
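For example, carrying over the earlier example (a sketch; the world-writable permissions caveat from before may still apply):
# newer images store the Kafka and ZooKeeper data under /data instead of /tmp
docker run --rm -it -v /path/to/local/directory:/data landoop/fast-data-dev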