
[BUG] nifikop fails to scale down the nifi cluster when it misses the chance to set the proper gracefulActionState

Open srteam2020 opened this issue 2 years ago • 0 comments

Bug Report

Similar to https://github.com/konpyutaika/nifikop/issues/79, nifikop may fail to scale down the nificluster if it misses the chance to set gracefulActionState to GracefulUpscaleSucceeded for the nifi node that is to be deleted.

More concretely, we found that the GracefulActionState.State (in the nificluster CR) of each nifi node (pod) typically goes through the following transitions:

  1. it is set to GracefulUpscaleRequired inside reconcileNifiPod()
  2. it is set to GracefulUpscaleRunning inside handlePodAddCCTask()
  3. it is set to GracefulUpscaleSucceeded inside reconcileNifiPod() when the nifi pod is ready
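To make the sequence concrete, here is a minimal Go sketch of the three transitions as a tiny state machine. The State type, the constants, and nextUpscaleState are hypothetical simplifications, not nifikop's actual types; the empty string standing for "no state yet" is also an assumption. The point it illustrates is that steps 1 and 3 are owned by reconcileNifiPod(), while only step 2 is owned by handlePodAddCCTask().

```go
package main

import "fmt"

// State mirrors the GracefulActionState.State values named in this issue.
// Hypothetical, simplified model; not nifikop's real type.
type State string

const (
	GracefulUpscaleRequired  State = "GracefulUpscaleRequired"
	GracefulUpscaleRunning   State = "GracefulUpscaleRunning"
	GracefulUpscaleSucceeded State = "GracefulUpscaleSucceeded"
)

// nextUpscaleState returns the state that follows s in the upscale sequence,
// and false if s has no successor. "" (no state yet) is an assumed start value.
func nextUpscaleState(s State) (State, bool) {
	switch s {
	case "":
		return GracefulUpscaleRequired, true // step 1: reconcileNifiPod()
	case GracefulUpscaleRequired:
		return GracefulUpscaleRunning, true // step 2: handlePodAddCCTask()
	case GracefulUpscaleRunning:
		return GracefulUpscaleSucceeded, true // step 3: reconcileNifiPod(), pod ready
	default:
		return s, false
	}
}

func main() {
	var s State
	for {
		next, ok := nextUpscaleState(s)
		if !ok {
			break
		}
		s = next
		fmt.Println(s)
	}
}
```

Running it prints the three states in order, ending at GracefulUpscaleSucceeded.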

Steps 1 and 3 both happen inside reconcileNifiPod(), which is only invoked for the nodes listed in Spec.Nodes, as shown below:

	for _, node := range r.NifiCluster.Spec.Nodes {
		...
		o = r.pod(node.Id, nodeConfig, pvcs, log)
		err = r.reconcileNifiPod(log, o.(*corev1.Pod))
		if err != nil {
			return err
		}
	}

Suppose a user first creates a nificluster with 2 nodes and then scales it down to 1 node. If the user updates the nificluster CR to remove the last nifi node from Spec.Nodes between step 2 and step 3, reconcileNifiPod() is no longer invoked for that node. As a result, the GracefulActionState.State of the last nifi pod is never set to GracefulUpscaleSucceeded and remains GracefulUpscaleRunning.
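The race can be reproduced in a toy Go model. Everything here is a hypothetical simplification: cluster stands in for the NifiCluster CR (specNodes for Spec.Nodes, states for each node's GracefulActionState.State), node IDs are plain ints, and handlePodAddCCTask is assumed to promote every node in GracefulUpscaleRequired. Only the shape of the reconcile loop follows the excerpt quoted above.

```go
package main

import "fmt"

type State string

const (
	GracefulUpscaleRequired  State = "GracefulUpscaleRequired"
	GracefulUpscaleRunning   State = "GracefulUpscaleRunning"
	GracefulUpscaleSucceeded State = "GracefulUpscaleSucceeded"
)

// cluster is a hypothetical stand-in for the NifiCluster CR.
type cluster struct {
	specNodes []int         // plays the role of Spec.Nodes
	states    map[int]State // per-node GracefulActionState.State
}

// reconcileNifiPod models steps 1 and 3 for a single node.
func (c *cluster) reconcileNifiPod(id int, podReady bool) {
	switch c.states[id] {
	case "":
		c.states[id] = GracefulUpscaleRequired // step 1
	case GracefulUpscaleRunning:
		if podReady {
			c.states[id] = GracefulUpscaleSucceeded // step 3
		}
	}
}

// reconcileNodes mirrors the quoted loop: only nodes still in specNodes
// are passed to reconcileNifiPod.
func (c *cluster) reconcileNodes(podReady bool) {
	for _, id := range c.specNodes {
		c.reconcileNifiPod(id, podReady)
	}
}

// handlePodAddCCTask models step 2 (assumed to apply to every Required node).
func (c *cluster) handlePodAddCCTask() {
	for id, s := range c.states {
		if s == GracefulUpscaleRequired {
			c.states[id] = GracefulUpscaleRunning
		}
	}
}

// simulateRace replays the scenario: 2 nodes, then node 1 is removed
// from Spec.Nodes between step 2 and step 3.
func simulateRace() map[int]State {
	c := &cluster{specNodes: []int{0, 1}, states: map[int]State{}}
	c.reconcileNodes(false) // step 1 for nodes 0 and 1
	c.handlePodAddCCTask()  // step 2 for nodes 0 and 1
	c.specNodes = []int{0}  // user removes node 1 from Spec.Nodes
	c.reconcileNodes(true)  // step 3 now runs only for node 0
	return c.states
}

func main() {
	states := simulateRace()
	fmt.Println("node 0:", states[0])
	fmt.Println("node 1:", states[1])
}
```

Node 0 reaches GracefulUpscaleSucceeded, while node 1 stays at GracefulUpscaleRunning because nothing visits it after it leaves specNodes.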

Because the GracefulActionState.State of that nifi node is GracefulUpscaleRunning, reconcileNifiPodDelete() never adds it to nodesPendingGracefulDownscale, so the scale down never happens.
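The deadlock can be sketched as a filter that keeps returning the same empty result. pendingGracefulDownscale below is a hypothetical stand-in for the part of reconcileNifiPodDelete() that builds nodesPendingGracefulDownscale; it assumes a node still in GracefulUpscaleRunning is skipped, which matches the behavior described above, but nifikop's actual condition may be written differently.

```go
package main

import "fmt"

type State string

const (
	GracefulUpscaleRunning   State = "GracefulUpscaleRunning"
	GracefulUpscaleSucceeded State = "GracefulUpscaleSucceeded"
)

// pendingGracefulDownscale is a hypothetical model of how deleted nodes are
// queued for graceful downscale. Assumption: a node whose upscale still looks
// unfinished (GracefulUpscaleRunning) is skipped.
func pendingGracefulDownscale(deleted []int, states map[int]State) []int {
	var pending []int
	for _, id := range deleted {
		if states[id] == GracefulUpscaleRunning {
			continue // upscale appears in progress, so the node is not queued
		}
		pending = append(pending, id)
	}
	return pending
}

func main() {
	// Node 1 left Spec.Nodes while still GracefulUpscaleRunning.
	states := map[int]State{0: GracefulUpscaleSucceeded, 1: GracefulUpscaleRunning}
	// Nothing moves node 1 out of that state, so every reconcile round
	// yields the same empty queue and the downscale never starts.
	for round := 1; round <= 3; round++ {
		fmt.Println("round", round, "pending:", pendingGracefulDownscale([]int{1}, states))
	}
}
```

Because no other code path updates node 1's state, repeating the reconcile cannot change the outcome.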

What did you do? Scaled down a nifi cluster.

What did you expect to see? The last nifi pod should be deleted successfully.

What did you see instead? Under which circumstances? The last nifi pod cannot be deleted and the scale down never happens.

Environment

  • nifikop version: 1546e0242107bf2f2c1256db50f47c79956dd1c6
  • go version: go1.13.9 linux/amd64
  • Kubernetes version information: v1.18.9

Possible Solution: Consider invoking reconcileNifiPod() for every currently running nifi pod, even if it is no longer in Spec.Nodes.
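A minimal sketch of the proposed change is to reconcile the union of Spec.Nodes and the nodes that still have a running pod. nodesToReconcile is a hypothetical helper working on plain int IDs rather than nifikop's Node and Pod objects; it only shows which nodes would be visited, not how the operator would look them up.

```go
package main

import (
	"fmt"
	"sort"
)

// nodesToReconcile returns every node ID in specNodes plus every node ID that
// still has a running pod, deduplicated and sorted. Hypothetical helper.
func nodesToReconcile(specNodes, runningPods []int) []int {
	seen := make(map[int]bool)
	var ids []int
	for _, list := range [][]int{specNodes, runningPods} {
		for _, id := range list {
			if !seen[id] {
				seen[id] = true
				ids = append(ids, id)
			}
		}
	}
	sort.Ints(ids)
	return ids
}

func main() {
	// Node 1 was removed from Spec.Nodes but its pod is still running,
	// so it is still visited and can reach GracefulUpscaleSucceeded.
	fmt.Println(nodesToReconcile([]int{0}, []int{0, 1}))
}
```

One design point to keep in mind: the quoted loop builds each pod from a per-node nodeConfig, so a node that has already left Spec.Nodes would need its configuration recovered some other way (for example from the existing pod); the sketch above leaves that open.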

Additional context: We are willing to help fix the bug. The bug was found automatically by our tool Sieve: https://github.com/sieve-project/sieve

srteam2020 · Apr 18 '22 16:04