
[Feature] Create/Restore Cluster Snapshots

Open cfontes opened this issue 5 years ago • 8 comments

Scope of your request

Be able to create snapshots for complex clusters and restore them at will

I think this would be very useful for clusters with StatefulSets that take a long time to be created. In my case, my local Kafka + Zookeeper cluster takes around 10 minutes to be fully configured and populated, but I only need to do that once every couple of months.

Describe the solution you'd like

This project is extremely helpful. I opted to use it instead of plain k3s because I saw the possibility of using docker commit as a snapshot tool, so I could iterate fast.
In case I break something I don't care too much about, I could just restart from the snapshot I committed and get back to adding my bugs to my code base again, very fast.

If it were a k3d-native command it would be perfect, but docker is fine for now.

Describe alternatives you've considered

I tried and succeeded in creating the snapshot from a working k3d cluster with

docker commit -m "snapshot" "$(docker ps --filter name=k3d-k3s-local-server -q)" rancher/k3s:v0.10.0-snapshot

After that I run

k3d delete -a

and

docker run 53cb9ed4ec58

but I fail to restore my cluster to the initial state.

I can create a PR for this later, but I need some guidance on what has to be done for this kind of approach to succeed.

For a start, this docker commit and docker run approach would already be very useful if it worked.

The current error I see when starting a single server cluster with no agents is

Failed to get the info of the filesystem with mountpoint "/var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs": unable to find data in memory cache.

So I am missing some mount point; I am just not sure what I need to manually recreate, related to https://github.com/rancher/k3s/issues/495. I guess k3d delete is removing this mount.
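
As a quick check, something like the following (reusing the name filter from above) should list which paths on the server container are backed by volumes. Since docker commit only snapshots the container's writable layer and not its volumes, that is probably where the restored cluster loses its data:

    # List the mounts of the running k3d server container.
    # Volume-backed paths such as /var/lib/rancher/k3s are NOT included in a docker commit.
    docker inspect --format '{{json .Mounts}}' \
      "$(docker ps --filter name=k3d-k3s-local-server -q)"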

cfontes avatar Jan 02 '20 14:01 cfontes

Hi there, thanks for opening this issue. This is surely an interesting feature to have :+1: I'm not sure how to proceed to get this working, to be honest. The mountpoint that you're missing there is a subdirectory of one of the volumes created within the k3s Dockerfile (see https://github.com/rancher/k3s/blob/master/package/Dockerfile).
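
For reference, the VOLUME declarations baked into the image can be checked directly; a quick sketch, assuming the base rancher/k3s:v0.10.0 image behind the snapshot tag above:

    # Show the volume paths declared in the k3s image metadata; data under these
    # paths lives in (anonymous) volumes rather than in the container's image layers.
    docker image inspect rancher/k3s:v0.10.0 --format '{{json .Config.Volumes}}'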

I'd be happy to review any pull request from your side and will have another look into this issue once I have some more time :+1:

iwilltry42 avatar Jan 03 '20 07:01 iwilltry42

OK, I will do my best; let's see what happens.

cfontes avatar Jan 13 '20 10:01 cfontes

Is there any progress on this, or something similar? I would also be interested in this functionality. If not, I would be interested in giving it a try as well, though I could not get to it within the next 2-3 weeks.

arkocal avatar Jan 29 '21 13:01 arkocal

Please do. I have too much on my plate right now (2 jobs since Dec/2020), so unfortunately I couldn't do anything.

cfontes avatar Jan 29 '21 13:01 cfontes

I just had a few more thoughts on this and now here are some things to note:

  • for simple single-server clusters (at least without agents), it's enough to do

    docker volume create k3d-test
    k3d cluster create k3d-test -v k3d-test:/var/lib/rancher/k3s
    # ... do something with the cluster ...
    k3d cluster delete k3d-test
    k3d cluster create k3d-test -v k3d-test:/var/lib/rancher/k3s
    

    to have the same state as before. This also works by docker cp'ing the contents of that directory out and then copying it back into place, or by bind-mounting the directory (see the sketch after this list).

    • Problem: if you change the cluster name when running the new cluster, it will show the containers as running, but they're assigned to the original node name, and the original node will also show up in kubectl get nodes, making the pods inaccessible e.g. via kubectl exec. All pods then have to be re-created (e.g. kubectl delete pods -A --all).
  • in a multi-server cluster, one has to have exactly the same IP range for the new nodes as one had for the old nodes, as etcd internally uses the node IPs as identifiers; otherwise, the new cluster created from the backed-up files will break.
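
A rough sketch of the docker cp variant mentioned above, assuming a cluster named test (so the server container is k3d-test-server-0) and k3d v5 syntax; names and paths are only illustrative:

    # Back up the k3s state directory from the running server node.
    docker cp k3d-test-server-0:/var/lib/rancher/k3s ./k3s-state-backup

    # Recreate the cluster with the saved state bind-mounted back into place,
    # keeping the same cluster name so the node name stays identical.
    k3d cluster delete test
    k3d cluster create test -v "$(pwd)/k3s-state-backup:/var/lib/rancher/k3s@server:0"

Keeping the same cluster name avoids the stale-node problem described in the first bullet above.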

iwilltry42 avatar Jul 21 '21 12:07 iwilltry42

I will give it a try! That would be enough for me, since k3s is our local env and only has one node.

cfontes avatar Jul 21 '21 13:07 cfontes

@cfontes, did you have any success so far? I moved this to the backlog now instead of just moving it from milestone to milestone... :thinking:

iwilltry42 avatar Dec 20 '21 12:12 iwilltry42

@iwilltry42 I tried your single-server cluster proposal, but when creating the cluster again, k3d complains as follows:

WARN[0002] warning: encountered fatal log from node k3d-kassio-server-0 (retrying 0/10): time="2023-10-29T18:40:47Z" level=fatal msg="starting kubernetes: preparing server: bootstrap data already found and encrypted with different token"

at least in version:

k3d version v5.6.0
k3s version v1.27.4-k3s1 (default)

On the other hand, I am trying to simply snapshot the server container and use it as the image for creating the new cluster (--image option). However, it seems to ignore what's inside and boots an empty k3s cluster.
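
In case it helps: the "encrypted with different token" error suggests that the bootstrap data in the reused volume was encrypted with the original cluster's token, while the recreated cluster generates a fresh one. A possible workaround (untested sketch, assuming k3d v5's --token flag and an illustrative token value) would be to pin the same token on both the original and the restored cluster:

    # Create the original cluster with an explicit, fixed cluster token.
    docker volume create k3d-test
    k3d cluster create k3d-test --token mysecrettoken -v k3d-test:/var/lib/rancher/k3s

    # ... use the cluster, then delete it ...
    k3d cluster delete k3d-test

    # Recreate it with the same token so k3s can decrypt the existing bootstrap data.
    k3d cluster create k3d-test --token mysecrettoken -v k3d-test:/var/lib/rancher/k3s

The --image snapshot attempt probably fails for the same reason as the earlier docker commit approach: the actual cluster state lives in volumes declared by the image, and a committed image does not contain volume contents.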

masantiago avatar Oct 29 '23 18:10 masantiago