
Support for restarting main sync loop on ConfigMap change

yibozhuang opened this pull request 4 years ago • 19 comments

What type of PR is this? /kind feature

What this PR does / why we need it:

This change adds support for restarting the main sync loop when a config change has been detected on disk from change in the ConfigMap.

The implementation adds a ConfigWatcher goroutine that checks whether the config has been updated by loading it from disk and comparing it with what is currently in use. If a change is detected, it signals the main sync loop to restart with the updated config. This includes restarting the informer and the job controller (if the configuration specifies using a Job to clean block devices). This change improves the deployment and rollout story for the provisioner, since updates to the ConfigMap are picked up automatically without needing to explicitly restart the pod itself.

Signed-off-by: Yibo Zhuang [email protected]

Which issue(s) this PR fixes:

Fixes #302

Special notes for your reviewer:

Release note:

Support for watching ConfigMap changes and restarting the main sync loop, including the informer and job controller (if specified)

yibozhuang avatar Sep 29 '21 01:09 yibozhuang

Welcome @yibozhuang!

It looks like this is your first PR to kubernetes-sigs/sig-storage-local-static-provisioner 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/sig-storage-local-static-provisioner has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. :smiley:

k8s-ci-robot avatar Sep 29 '21 01:09 k8s-ci-robot

Hi @yibozhuang. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Sep 29 '21 01:09 k8s-ci-robot

/ok-to-test

ddebroy avatar Sep 30 '21 15:09 ddebroy

/assign @wongma7

yibozhuang avatar Oct 01 '21 15:10 yibozhuang

  • What fields of the config should be considered mutable for a runtime update?

In this particular PR, any change to the config triggers a restart of the controller. The current deployment story is that if the ConfigMap is updated, the running provisioner pod cannot pick up the updated config until it is restarted or there is an image or spec change.

One user story I had: we needed to set the fsType config for a particular storage class. No image version bump was needed, but because there is no hot-reload support, we had to manually kill and recreate the pod to pick up the config change.

I'm happy to further drill down into the specific fields we'd want runtime restart support for, if the direction becomes "we want this, but it needs to be restricted to specific fields". My original intent was to provide an option that allows config changes without recreating the pods.

  • If there are updates in the discovery directories, what happens with the existing PVs, are they cleaned up? What happens with PVs that are bound to workloads that are no longer accessible because the discovery directory changed?

Any existing PVs will not be cleaned up or changed. The provisioner will simply watch a different directory when the controller restarts. It would be up to the operator/user to handle cleanup of the existing PVs; this is the same behavior as today, without this change. For example, if I want to change the discovery directory in the ConfigMap, I would need a migration process for the existing PVs before restarting the provisioner to pick up the new config.

My understanding is that the previous directories would still be kept around and would need to be cleaned up, just like on a config change today without this change.

What happens with PVs that are bound to workloads that are no longer accessible because the discovery directory changed?

This is not in scope for this change. Even without this change, if I have a config change for the discovery directory and manually restart the provisioner, I would still need to worry about the migration story.

  • Consider adding an e2e test, consider the cases above

I would be more than happy to add the e2e test once we agree on the scope of the change being "restart controller on any config changes" or "only allow this for a limited number of fields".

It'd be great to start a discussion over the questions above before implementing more changes

Sounds great!

yibozhuang avatar Feb 17 '22 17:02 yibozhuang

Any existing PVs will not be cleaned up or changed. The provisioner will just watch on a different directory when the controller restarts. It would be up to the operator/user to handle the clean up of the existing PVs. This would be the same behavior today without this change.

I understand. I remember someone raised a similar issue in https://kubernetes.slack.com/archives/C09QZFCE5/p1637288978057900; I think we should first understand and list the side effects of doing a hot reload. One possible step to run when the container is terminated with SIGTERM, or when LVP detects a config change:

  • go over the list of cached PVs references
    • if the PV is unbound: clean up the PV contents and remove the PV; I think this is not implemented.
    • if the PV is bound: leave the PV as it is (if LVP is restarted the PV would rejoin the cache and its contents wouldn't be cleaned up); I think this is the default behavior for unbound/bound PVs, i.e. do nothing on SIGTERM.

Let's say you change the fstype field in the config. If the above isn't also run, the existing PVs would continue to exist as they are; as far as I know, LVP isn't going to update the fstype of PVs that are already created. I haven't tried this scenario, but you'd still need to manually remove the PVs; LVP would then rediscover them in the discovery directory and create PVs with the updated fstype.

if I have a config change for the discovery directory and manually restart the provisioner, I would still need to worry about the migration story.

True. I think that at least you'd have time to think about the manual cleanup somewhere between stopping LVP, changing the config, and starting LVP again. I'm concerned about what would happen if this is automated.

mauriciopoppe avatar Feb 19 '22 05:02 mauriciopoppe

One possible step to run when the container is terminated with SIGTERM, or when LVP detects a config change:

  • go over the list of cached PVs references

    • if the PV is unbound: clean up the PV contents and remove the PV; I think this is not implemented.
    • if the PV is bound: leave the PV as it is (if LVP is restarted the PV would rejoin the cache and its contents wouldn't be cleaned up); I think this is the default behavior for unbound/bound PVs, i.e. do nothing on SIGTERM.

Correct, currently LVP will not touch existing PVs on a restart with a config change. I'm not sure why we would want cleanup of unbound PVs to be something LVP handles on SIGTERM or a config change; the existing behavior of doing nothing makes more sense.

Also, for the CSI external provisioner there is no such automation either; the CSI vendor needs to ensure that changes are backwards compatible, or a manual cleanup of existing provisioned PVs is required.

Let's say you change the fstype field in the config. If the above isn't also run, the existing PVs would continue to exist as they are; as far as I know, LVP isn't going to update the fstype of PVs that are already created. I haven't tried this scenario, but you'd still need to manually remove the PVs; LVP would then rediscover them in the discovery directory and create PVs with the updated fstype.

Correct and I think we want to continue having LVP behave this way.

if I have a config change for the discovery directory and manually restart the provisioner, I would still need to worry about the migration story.

True. I think that at least you'd have time to think about the manual cleanup somewhere between stopping LVP, changing the config, and starting LVP again. I'm concerned about what would happen if this is automated.

I think in this case the config update would still be controlled by the user; this change only handles the restart. So users would still have time to think about the manual cleanup and, when ready, proceed to update the config.

I'd be happy to discuss in Slack as well or have a short sync with you to make sure we address all the current concerns. Thanks!

yibozhuang avatar Feb 23 '22 19:02 yibozhuang

Thanks for your replies, and sorry for the delayed response; I forgot about this thread.

Also, for CSI external provisioner, there is no such automation as well and the CSI vendor will need to ensure that changes are backwards compatible or a manual clean up of existing provisioned PV will need to be done.

Got it, thanks for checking what other sidecars do

I'd be happy to discuss in Slack as well or have a short sync with you to make sure we address all the current concerns. Thanks!

Sure, that sounds good to me. I still think we should list the possible scenarios (e.g. a matrix of: ConfigMap field changed, PV bound/unbound, outcome) in a doc that can be shared with other people who can give more feedback; this matrix can later be used in the docs for this new feature too.

mauriciopoppe avatar Mar 01 '22 20:03 mauriciopoppe

/test pull-sig-storage-local-static-provisioner-e2e

mauriciopoppe avatar Mar 23 '22 21:03 mauriciopoppe

Sure, that sounds good to me. I still think we should list the possible scenarios (e.g. a matrix of: ConfigMap field changed, PV bound/unbound, outcome) in a doc that can be shared with other people who can give more feedback; this matrix can later be used in the docs for this new feature too.

Hi @mauriciopoppe apologies for not following up, been busy with some other priorities.

Thanks a lot for your suggestions. I have updated faqs.md, provisioner.md, and upgrading.md under docs to highlight this feature change.

Also filed #302 tracking this.

In provisioner.md I created a section

Updating configuration without restarting provisioner

showing a matrix of each field in the configuration and its effect for existing PVs and PVs to be provisioned (i.e. new).

Can you give it another review and let me know if there is any other info that is needed and/or if you have any feedback?

Thanks again!

yibozhuang avatar Apr 08 '22 19:04 yibozhuang

Adding an item for this in https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner/pull/300

mauriciopoppe avatar Apr 12 '22 20:04 mauriciopoppe

IIUC, this PR only restarts the sync loop and not the entire process right? Can you update the PR description, release note, and documentation to make sure that's correct?

msau42 avatar Apr 21 '22 16:04 msau42

New changes are detected. LGTM label has been removed.

k8s-ci-robot avatar Apr 21 '22 18:04 k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: yibozhuang. To complete the pull request process, please ask for approval from msau42 after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

k8s-ci-robot avatar Apr 21 '22 18:04 k8s-ci-robot

IIUC, this PR only restarts the sync loop and not the entire process right? Can you update the PR description, release note, and documentation to make sure that's correct?

@msau42 Thanks for the feedback!

Updated the issue description, PR description, release note, and documentation to indicate that this restarts the main sync loop, including the informer and job controller (if Job-based block device cleanup is specified).

yibozhuang avatar Apr 21 '22 18:04 yibozhuang

Question by @msau42, are we going to have problems with the cached state?

In the implementation there are a few goroutines: M is the main goroutine, LC is the local controller goroutine (spawned from M), JC is the job controller goroutine (spawned from LC), and CW is the config watch goroutine (spawned from M).

As soon as CW detects a config update, it sends the new config to LC through the configUpdate channel; LC stops its spawned controllers (JC) before restarting itself with the new config.

I was trying to analyze the cached state in LC and JC, the loop in LC is:

	for {
		select {
		case stopped := <-signal.closing:
			close(informerStopChan)
			if jobController != nil {
				close(jobControllerStopChan)
			}
			stopped <- struct{}{}
			klog.Info("Controller stopped\n")
			return
		default:
			deleter.DeletePVs()
			discoverer.DiscoverLocalVolumes()
			time.Sleep(discoveryPeriod)
		}
	}

LC selects between stopping itself and running deleter.DeletePVs. DeletePVs is synchronous with respect to the data structures it owns, and if a signal is sent while DeletePVs is running, the caller simply blocks until LC reaches the next run of the select, where the signal is received. I think we're safe with the deleter state.

JC does this on its run loop:

func (c *jobController) Run(stopCh <-chan struct{}) {

	// make sure the work queue is shutdown which will trigger workers to end
	defer c.queue.ShutDown()

	klog.Infof("Starting Job controller")
	defer klog.Infof("Shutting down Job controller")

	// runWorker will loop until "something bad" happens.  The .Until will
	// then rekick the worker after one second
	wait.Until(c.runWorker, time.Second, stopCh)
}

func (c *jobController) runWorker() {
	for c.processNextItem() {
	}
}

If c.processNextItem() is running while a signal is received on stopCh, the caller blocks; wait.Until decides whether to accept an incoming signal or run runWorker, so it exits on the next loop of wait.Until (https://github.com/kubernetes/apimachinery/blob/v0.23.6/pkg/util/wait/wait.go#L140). When JC is restarted, we run func (c *jobController) Run again; because of the previous call to c.queue.ShutDown(), I think new queue items wouldn't be added (https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner/blob/dab622fe70/vendor/k8s.io/client-go/util/workqueue/queue.go#L123). I'd check that, @yibozhuang, and probably move the queue creation into Run.

Other local state includes an informer for jobs that's kept across calls to func (c *jobController) Run; that should be fine to keep across JC restarts.

mauriciopoppe avatar Apr 28 '22 17:04 mauriciopoppe

When JC is restarted, we run func (c *jobController) Run again; because of the previous call to c.queue.ShutDown(), I think new queue items wouldn't be added (https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner/blob/dab622fe70/vendor/k8s.io/client-go/util/workqueue/queue.go#L123). I'd check that, @yibozhuang, and probably move the queue creation into Run.

Just so I understand: the worry is that when JC restarts, the previous items in the controller queue would be lost? Am I understanding the above statement correctly?

If so, then I don't think it's an issue because the entire LC is going to be restarted (i.e. StartLocalController() is going to be called again after receiving the updated config) so that would call

	var jobController deleter.JobController
	if runtimeConfig.UseJobForCleaning {
		labels := map[string]string{common.NodeNameLabel: config.Node.Name}
		jobController, err = deleter.NewJobController(labels, runtimeConfig)
		if err != nil {
			klog.Fatalf("Error initializing jobController: %v", err)
		}
		klog.Infof("Enabling Jobs based cleaning.")
	}

which would recreate the job queue and the informer, and once the informer cache syncs, it should trigger an event for every Job resource, allowing any previously queued work to reconcile appropriately.

My understanding is that JC behaves like a regular controller, which should be able to handle both graceful and ungraceful restarts. This is similar to today, even without this change: if I were to delete and recreate the pod, the JobController can reconcile the resources after restarting.

@mauriciopoppe @msau42 please let me know if I misunderstood your concerns above. Thanks.

yibozhuang avatar Apr 29 '22 00:04 yibozhuang

@yibozhuang got it. My concern was about the jobController instance, which I thought was created only once with its Run method called many times. Thanks for the explanation; I see that a new instance is created on every call to StartLocalController, and therefore we'd have a new queue too.

@msau42 are there any other concerns you have?

mauriciopoppe avatar May 03 '22 18:05 mauriciopoppe

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Aug 01 '22 18:08 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Aug 31 '22 18:08 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-triage-robot avatar Sep 30 '22 19:09 k8s-triage-robot

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Sep 30 '22 19:09 k8s-ci-robot

/reopen
/remove-lifecycle rotten

mauriciopoppe avatar Oct 28 '22 23:10 mauriciopoppe

@mauriciopoppe: Reopened this PR.

In response to this:

/reopen
/remove-lifecycle rotten

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Oct 28 '22 23:10 k8s-ci-robot

/lgtm
/approve

mauriciopoppe avatar Nov 07 '22 17:11 mauriciopoppe

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mauriciopoppe, yibozhuang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

k8s-ci-robot avatar Nov 07 '22 17:11 k8s-ci-robot