clean up orphaned loop devices: use unique kubelet path

Open pohly opened this issue 4 years ago • 6 comments

When a container running under KIND binds a file to a loop device and is then terminated, the file remains bound, even after the entire KIND cluster has been removed. This is a particular problem for Prow, because leaked loop devices and their associated resources can accumulate over time.

There's no good fix because it's impossible to look at a bound loop device and determine whether it is still needed. All that one has is the file name, which is the same inside the original container and outside (no namespacing or anything).

What would you like to be added:

Here's a workaround for Prow:

  • add an option to KIND which changes the /var/lib/kubelet path so that it contains a unique ID chosen by the caller
  • use that for Kubernetes-CSI test jobs to ensure that kubelet and CSI drivers bind files whose full path name has a unique ID (like the Prow job ID)
  • add cleanup code somewhere (TBD) which unbinds all loop devices whose path contains that unique ID
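
The cleanup step in the last bullet could look roughly like the sketch below. It is only an illustration under stated assumptions, not the code that test-infra would actually use: the function names are hypothetical, and it assumes that util-linux `losetup` is available and that matching the unique ID as a substring of the backing file path is specific enough.

```python
import subprocess
from typing import List


def select_loop_devices(listing: str, unique_id: str) -> List[str]:
    """Return loop devices whose backing file path contains unique_id.

    `listing` is expected to be the output of
    `losetup --list --noheadings --output NAME,BACK-FILE`.
    (Hypothetical helper, not part of KIND or test-infra.)
    """
    devices = []
    for line in listing.splitlines():
        # First column is the device, the rest is the backing file path.
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # no backing file reported for this device
        name, back_file = parts
        if unique_id in back_file:
            devices.append(name)
    return devices


def detach_matching(unique_id: str) -> List[str]:
    """Detach every loop device selected for unique_id (needs root)."""
    listing = subprocess.run(
        ["losetup", "--list", "--noheadings", "--output", "NAME,BACK-FILE"],
        check=True, capture_output=True, text=True,
    ).stdout
    devices = select_loop_devices(listing, unique_id)
    for dev in devices:
        subprocess.run(["losetup", "--detach", dev], check=True)
    return devices
```

Keeping the selection logic separate from the `losetup` calls makes it possible to check the matching rule without root privileges. Calling `detach_matching("<job id>")` after a job would then unbind only the devices that belong to that job.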

Why is this needed:

This way we may be able to catch most tests that (theoretically) could leak loop devices.

pohly, Feb 13 '20 20:02

> add an option to KIND which changes the /var/lib/kubelet path so that it contains a unique ID chosen by the caller

this is somewhat awkward as a knob. today it could be accomplished with a kubeadm config patch in the kind cluster config, I'll prototype something soon.

> add cleanup code somewhere (TBD) which unbinds all loop devices whose path contains that unique ID

probably in the test-infra docker-in-docker logic

BenTheElder, Feb 14 '20 01:02

> today it could be accomplished with a kubeadm config patch in the kind cluster config, I'll prototype something soon.

That's fine. It's really a corner-case, so the solution doesn't have to be nice. I had looked at that briefly but it wasn't immediately obvious where that path might be changed, so an example would be good.

pohly, Feb 14 '20 08:02

another thought from twitter: https://lkml.org/lkml/2020/4/8/506 (see this thread: https://twitter.com/filbranden/status/1249724120599691269)

BenTheElder, Apr 16 '20 08:04

/lifecycle frozen

BenTheElder, Jul 27 '20 19:07

Did we agree to implement something which embeds a unique ID in a non-standard kubelet data directory? After the comment about loopfs I wasn't sure anymore.

As pointed out in https://github.com/kubernetes/kubernetes/issues/92664, the CSI tests must then be configured to use the modified data directory.

pohly, Dec 07 '20 20:12

> Did we agree to implement something which embeds a unique ID in a non-standard kubelet data directory? After the comment about loopfs I wasn't sure anymore.

I think it's a bit of an awkward layering issue, but something we should probably still consider. It's technically possible to do already via kubeadm config patches.
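
For illustration, a kubeadm config patch of the kind mentioned here might be generated as in the sketch below. Everything in it is an assumption rather than something confirmed in this thread: the helper name and the `/var/lib/kubelet-<id>` layout are hypothetical, and it has not been verified that a KIND node works with a non-default kubelet `root-dir`. It only shows where the path could be set, using the `kubeadmConfigPatches` field of the kind cluster config and kubeadm's `kubeletExtraArgs`.

```python
import re


def render_kind_config(unique_id: str) -> str:
    """Render a kind cluster config that moves the kubelet data directory.

    Hypothetical helper: the directory layout is an assumption, and the
    resulting cluster config is an untested sketch.
    """
    # Keep the ID to characters that are safe in a path and in YAML.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", unique_id):
        raise ValueError(f"unsupported unique ID: {unique_id!r}")
    root_dir = f"/var/lib/kubelet-{unique_id}"
    # InitConfiguration covers the first control-plane node,
    # JoinConfiguration covers nodes that join afterwards.
    return f"""\
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
kubeadmConfigPatches:
- |
  kind: InitConfiguration
  nodeRegistration:
    kubeletExtraArgs:
      root-dir: "{root_dir}"
- |
  kind: JoinConfiguration
  nodeRegistration:
    kubeletExtraArgs:
      root-dir: "{root_dir}"
"""
```

The output could be written to a file and passed to `kind create cluster --config <file>`. As noted above, the CSI tests would still have to be pointed at the same modified directory.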

BenTheElder, Dec 07 '20 21:12