kapp-controller leader election should provide safety during update operations

Open jdef opened this issue 2 years ago • 10 comments

Describe the problem/challenge you have

  • no way to configure leader election for kapp-controller

Describe the solution you'd like

  • we'd like a better guarantee for limiting the number of active kapp-controller reconcilers to 1 during rolling upgrades of our cluster

Vote on this request

This is an invitation to the community to vote on issues, to help us prioritize our backlog. Use the "smiley face" up to the right of this comment to vote.

👍 "I would like to see this addressed as soon as possible" 👎 "There are other more important things to focus on right now"

We are also happy to receive and review Pull Requests if you want to help working on this issue.

jdef avatar Aug 22 '22 13:08 jdef

@jdef Thanks for filing this issue. Can you share more about how you're using kapp-controller? Right now there's not really a way to run multiple kapp-controllers at once

joe-kimmel-vmw avatar Aug 22 '22 15:08 joe-kimmel-vmw

we're just getting started with it. it looks to be running an apiservice (at least, there's an apiservice resource included in resources.yml) but there's only a single replica. our concerns:

(a) if it's in ANY kind of api/controlplane hot-path, it should be HA (for us, that translates to more than a single replica); which leads to ...
(b) added insurance that no more than 1 replica is attempting to reconcile applications/packages (no split-brain, ever, please)

if i trusted Deployment more to guard against split-brain, maybe i'd feel better. i don't, because i've seen meltdowns from such assumptions. leader election has proven much more reliable along these lines. if we're going to run this in prod envs, we'd like it to meet our prod standards.

HTH

jdef avatar Aug 22 '22 21:08 jdef

Any other thoughts here from the KC team?

jdef avatar Sep 03 '22 14:09 jdef

hi @jdef - I just realized the maintainers had a sort of unfinished exchange that we hadn't percolated back out to you, sorry about that!

It seems like we'd be open to adding a leader/follower lease setup, similar to the k8s core controllers. Would that satisfy your concerns? I think this is something we'd be happy to do eventually and/or accept contributions for.

joe-kimmel-vmw avatar Sep 03 '22 21:09 joe-kimmel-vmw

Yep that would work fine, thanks.

jdef avatar Sep 04 '22 00:09 jdef

Hi @joe-kimmel-vmw, you mean using something like the leaderelection library from client-go (https://github.com/kubernetes/client-go/tree/master/tools/leaderelection), right? If that's the case, I'd be willing to work on this.

vicmarbev avatar Oct 05 '22 06:10 vicmarbev

That's what I had in mind

jdef avatar Oct 05 '22 10:10 jdef

I will look into this. Please assign me this issue!

basit9958 avatar Aug 01 '23 20:08 basit9958

/assign

basit9958 avatar Aug 01 '23 20:08 basit9958

@basit9958 Thank you for showing interest in working on the issue ❤️
I have assigned the issue to you. I would also recommend going through the previously closed PR and the comments there.

praveenrewar avatar Aug 03 '23 07:08 praveenrewar