vcluster

pdb syncing is broken

Status: Open • kfox1111 opened this issue 2 years ago • 8 comments

What happened?

Short description: Some operators, such as the Percona one, use the maxUnavailable field of the PDB. Unfortunately, the PDB controller in the undercluster can't currently handle that field because it doesn't have enough information to make it work.

More details: We get a SyncFailed event with the error message “services does not implement the scale subresource” from the PDB controller in the undercluster.

Basically, pods are synced down to the undercluster with a Service as their owner. If the PDB uses maxUnavailable with any value, or minAvailable with a percentage, the PDB controller looks up the owner of the pods to find out how many replicas there should be, so that it can turn the budget into an absolute number of pods that must stay healthy. It can't do this because the replica count isn't synced down from the overcluster.

Some of that logic is described here: https://kubernetes.io/docs/tasks/run-application/configure-pdb/#arbitrary-controllers-and-selectors

What did you expect to happen?

PDBs work correctly.

I think that, since only the overcluster knows the real number of requested replicas, vcluster will need to convert PDBs that use maxUnavailable or a percentage-based minAvailable into an absolute minAvailable when syncing them to the undercluster.

The undercluster should then have enough information to properly protect the pods from evictions.
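A minimal Go sketch of the proposed conversion, assuming the syncer can read the desired replica count from the owning workload in the overcluster. All names here (`IntOrPercent`, `PDBSpec`, `ceilPercent`, `toAbsoluteMinAvailable`) are hypothetical; this is an illustration of the idea, not vcluster's implementation.

```go
package main

import "fmt"

// IntOrPercent is a hypothetical, simplified stand-in for intstr.IntOrString.
type IntOrPercent struct {
	IsPercent bool
	Value     int
}

// PDBSpec keeps only the two budget fields.
type PDBSpec struct {
	MinAvailable   *IntOrPercent
	MaxUnavailable *IntOrPercent
}

// ceilPercent returns ceil(pct * total / 100) for non-negative inputs.
func ceilPercent(pct, total int) int {
	return (pct*total + 99) / 100
}

// toAbsoluteMinAvailable rewrites spec so the undercluster can evaluate it
// without knowing the owner's replica count. replicas must come from the
// overcluster (for example the Deployment's spec.replicas).
func toAbsoluteMinAvailable(spec PDBSpec, replicas int) PDBSpec {
	var minAvail int
	switch {
	case spec.MaxUnavailable != nil:
		maxUnavail := spec.MaxUnavailable.Value
		if spec.MaxUnavailable.IsPercent {
			maxUnavail = ceilPercent(maxUnavail, replicas) // percentages round up
		}
		minAvail = replicas - maxUnavail
		if minAvail < 0 {
			minAvail = 0
		}
	case spec.MinAvailable != nil && spec.MinAvailable.IsPercent:
		minAvail = ceilPercent(spec.MinAvailable.Value, replicas)
	default:
		// An absolute minAvailable (or an empty spec) already works as-is.
		return spec
	}
	return PDBSpec{MinAvailable: &IntOrPercent{Value: minAvail}}
}

func main() {
	in := PDBSpec{MaxUnavailable: &IntOrPercent{IsPercent: true, Value: 25}}
	out := toAbsoluteMinAvailable(in, 5)
	fmt.Println("minAvailable:", out.MinAvailable.Value)
}
```

For example, maxUnavailable: 25% with 5 replicas becomes minAvailable: 3, since 25% of 5 rounds up to 2. One caveat of this design: the converted value is only correct for the replica count it was computed from, so the synced PDB would have to be recomputed whenever the workload is scaled in the overcluster.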

How can we reproduce it (as minimally and precisely as possible)?

1. Create a deployment in the overcluster.
2. Create a PDB in the overcluster that sets maxUnavailable.
3. Check the status of the PDB in the undercluster.

Anything else we need to know?

No response

Host cluster Kubernetes version

$ kubectl version
# paste output here

Host cluster Kubernetes distribution

# Write here

vcluster version

$ vcluster --version
# paste output here

vcluster Kubernetes distribution (k3s (default), k8s, k0s)

# Write here

OS and Arch

OS: 
Arch:

kfox1111 • Jan 10 '23

@kfox1111 thanks for creating this issue! We'll take a look at this pretty soon.

FabianKramm • Jan 17 '23

Any updates?

kfox1111 • Mar 16 '23

Hi, this made it to the top of my work queue, but was replaced by a slightly more urgent task. Will be working on this in the coming weeks :)

rohantmp • Mar 20 '23

Any progress on this issue?

FCosta999 • Sep 18 '23

Any progress on this issue?

Ah-Khai • Jan 31 '24

Hi, we've talked about a couple of approaches, but I haven't managed to get to the implementation yet!

rohantmp • Jan 31 '24

@rohantmp I've noticed that minAvailable also fails with the same error if we use a percentage instead of an integer value (e.g. 50% instead of 3).

PavelGloba • Apr 08 '24

Also, I found that if I have two or more similar PDBs in different namespaces inside the vcluster, then after syncing they are identical in both selector and namespace when viewed from the host cluster (we don't use multi-namespace mode). This leads to eviction errors like this one:

error when evicting pods/"test-v4-5449cf6559-rbf5m-x-test-service-x-test" -n "vc-test": This pod has more than one PodDisruptionBudget, which the eviction subresource does not support.
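The multiple-PDB error above can be modelled in a few lines. This Go sketch assumes that synced pods carry a label recording their virtual namespace; the key `example.vcluster/namespace` and the helpers `translateSelector` and `matchingPDBs` are hypothetical, not vcluster's real label or code. Selectors are simplified to matchLabels only. It shows that copying the selector verbatim into the single host namespace makes both PDBs match the pod, while scoping each translated selector to its virtual namespace leaves exactly one.

```go
package main

import "fmt"

// Labels is a pod's label set; Selector is a simplified matchLabels selector.
type Labels map[string]string
type Selector map[string]string

// namespaceLabel is a hypothetical key recording a synced pod's virtual namespace.
const namespaceLabel = "example.vcluster/namespace"

// Matches reports whether every key/value in the selector is present on the pod.
func (s Selector) Matches(l Labels) bool {
	for k, v := range s {
		if got, ok := l[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// translateSelector scopes a virtual PDB's selector to its virtual namespace
// before the PDB is written into the shared host namespace.
func translateSelector(sel Selector, virtualNamespace string) Selector {
	out := Selector{namespaceLabel: virtualNamespace}
	for k, v := range sel {
		out[k] = v
	}
	return out
}

// matchingPDBs counts how many selectors match the pod; eviction is
// refused when this is greater than one.
func matchingPDBs(pod Labels, pdbs []Selector) int {
	n := 0
	for _, s := range pdbs {
		if s.Matches(pod) {
			n++
		}
	}
	return n
}

func main() {
	pod := Labels{"app": "test-v4", namespaceLabel: "test"}
	sel := Selector{"app": "test-v4"}
	naive := []Selector{sel, sel} // same selector synced from two virtual namespaces
	scoped := []Selector{translateSelector(sel, "test"), translateSelector(sel, "other")}
	fmt.Println("naive:", matchingPDBs(pod, naive), "scoped:", matchingPDBs(pod, scoped))
}
```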

PavelGloba • Apr 10 '24