java-operator-sdk icon indicating copy to clipboard operation
java-operator-sdk copied to clipboard

Update of immutable StatefulSet fields

Open Donnerbart opened this issue 1 year ago • 2 comments

We have the situation that we have to change the labels and selector match labels on a StatefulSet CRUD resource, that our operator manages. The problem is that the selector field is immutable and cannot be changed via a normal reconciliation. We also have to keep the deployed Pods alive, since the application holds data and is connected to possibly millions of clients.

Currently we restore the old labels/selectors in the update() method of the StatefulSet and our Service resources:

    @Override
    public @NotNull StatefulSet update(
            final @NotNull StatefulSet actual,
            final @NotNull StatefulSet target,
            final @NotNull HiveMQPlatform platform,
            final @NotNull Context<HiveMQPlatform> context) {
        final var logPrefix = platform.getLogPrefix();
        final var actualSpec = actual.getSpec();
        final var targetSpec = target.getSpec();
        // fix selector match labels
        final var actualSelectorMatchLabels = actualSpec.getSelector().getMatchLabels();
        final var targetSelectorMatchLabels = targetSpec.getSelector().getMatchLabels();
        if (!Objects.equals(actualSelectorMatchLabels, targetSelectorMatchLabels)) {
            LOG.debug("{} Restoring labels on StatefulSet", logPrefix);

            target.getMetadata().getLabels().putAll(actualSelectorMatchLabels);
            LOG.trace("{} Actual labels: {}", logPrefix, actual.getMetadata().getLabels());
            LOG.trace("{} Target labels: {}", logPrefix, target.getMetadata().getLabels());

            targetSpec.getTemplate().getMetadata().getLabels().putAll(actualSelectorMatchLabels);
            LOG.trace("{} Actual template labels: {}", logPrefix, actualSpec.getTemplate().getMetadata().getLabels());
            LOG.trace("{} Target template labels: {}", logPrefix, targetSpec.getTemplate().getMetadata().getLabels());

            targetSpec.getSelector().setMatchLabels(actualSelectorMatchLabels);
            LOG.trace("{} Actual selector match labels: {}", logPrefix, actualSpec.getSelector().getMatchLabels());
            LOG.trace("{} Target selector match labels: {}", logPrefix, targetSpec.getSelector().getMatchLabels());
        }

We have to disable SSA for the StatefulSet and Service resources (otherwise we'll trigger infinite updates, due to the SSA matcher and the label mismatch). But SSA anyway causes problems with the StatefulSet (if I have more details about that, I'll open another issue for that).

Beside that this solution works fine, as long we have an actual resource to compare the labels with. But when a new Service is created, it will of course have the new labels and selectors, and they don't match the (old) StatefulSet and Pods.

A possible solution would be to delete the StatefulSet with context.getClient().resource(statefulSet).withPropagationPolicy(DeletionPropagation.ORPHAN).delete();, fix the labels on running the Pods. Then the normal reconciliation will create a new StatefulSet and update the existing Services.

Our problem is that we don't know where to add this logic in the operator workflow. Ideally this should only be executed once, when the custom resource is initialized. We should only need the custom resource instance and the K8s client for this.

Donnerbart avatar Feb 15 '24 16:02 Donnerbart

Hi @Donnerbart , this does not seems to be a standard use case for sure :)

For the beginning some notes:

  • there are some problems with SSA in some cases in k8s it is ok no to use it
  • you can alway just override the matcher method and enhance / customize the implementation for your demands. Note that this is just an optimization, making an update all the time should also work

Regarding how to exactly do this is tougher, but for now just a hint, you can also override the whole reconciliation logic for a dependent resource, this is the top level function called:

https://github.com/operator-framework/java-operator-sdk/blob/69711cf346d75331a61616bcfe2fd7015c93434b/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/reconciler/dependent/DependentResource.java#L25-L25

So you can customize that as you want, even build your own class hierarchy etc. So basically you don't necessarily need to match this logic with the current structure (Creator, Updator, Matchers etc..), this was an intentional design decision exactly for these cases where it does not fit a standard model. Of course it is much harder than and more code, and study (also some aspects of workflows) but this make it doable.

csviri avatar Feb 16 '24 09:02 csviri

I gave it a try this morning and came up with this method, that I call at the beginning of our desired() method:

    private final @NotNull Set<String> labelAndSelectorCheckDone = ConcurrentHashMap.newKeySet();

    private void fixLabelsAndSelector(
            final @NotNull HiveMQPlatform platform,
            final @NotNull Context<HiveMQPlatform> context,
            final @NotNull Map<String, String> desiredLabels,
            final @NotNull Map<String, String> desiredSelectorLabel) {
        final var logPrefix = platform.getLogPrefix();
        final var platformIndex = platform.getMetadata().getNamespace() + "-" + platform.getMetadata().getName();
        if (!labelAndSelectorCheckDone.add(platformIndex)) {
            return;
        }
        final var statefulSet = context.getSecondaryResource(StatefulSet.class).orElse(null);
        if (statefulSet == null) {
            return;
        }
        final var statefulSetLabels = statefulSet.getMetadata().getLabels();
        final var statefulSetSelectorLabels = statefulSet.getSpec().getSelector().getMatchLabels();
        if (Objects.equals(statefulSetLabels, desiredLabels) &&
                Objects.equals(statefulSetSelectorLabels, desiredSelectorLabel)) {
            return;
        }
        LOG.info("{} Deleting StatefulSet with outdated labels or label selector", logPrefix);
        LOG.trace("{} Actual labels: {}", logPrefix, statefulSetLabels);
        LOG.trace("{} Target labels: {}", logPrefix, desiredLabels);
        LOG.trace("{} Actual selector match labels: {}", logPrefix, statefulSetSelectorLabels);
        LOG.trace("{} Target selector match labels: {}", logPrefix, desiredSelectorLabel);
        final var client = context.getClient();
        try {
            final var statusDetails = client.resource(statefulSet)
                    .withPropagationPolicy(DeletionPropagation.ORPHAN)
                    .withTimeoutInMillis(TimeUnit.SECONDS.toMillis(5000))
                    .delete();
            LOG.info("{} Deleted StatefulSet: {}", logPrefix, statusDetails);
        } catch (final Exception e) {
            labelAndSelectorCheckDone.remove(platformIndex);
            LOG.warn("{} Cannot not delete outdated StatefulSet: {}", logPrefix, e.getMessage());
        }
        // fix the Pod labels, so that reconciled Services can match the Pods right away
        try {
            client.pods().inNamespace(platform.getMetadata().getNamespace()).list().getItems().forEach(pod -> {
                pod.getMetadata().getLabels().putAll(desiredLabels);
                client.resource(pod).update();
            });
        } catch (final Exception e) {
            labelAndSelectorCheckDone.remove(platformIndex);
            LOG.warn("{} Cannot not update labels on Pods: {}", logPrefix, e.getMessage());
        }
        throw new IllegalStateException("StatefulSet needs to be re-created, stopping current reconciliation");
    }

So basically once on the first reconcile I'm checking if there is already a StatefulSet with the old/outdated selector, if so delete it, fix the pods and then interrupt the running reconciliation.

It's quite a hack and I hope my assumption that StatefulSetResource extends CRUDKubernetesDependentResource<StatefulSet, HiveMQPlatform> is a singleton is correct (regarding the scope of labelAndSelectorCheckDone).

Beside the huge stacktrace I didn't see any side effects so far (e.g. with the internal caches and informers etc.).

If this doesn't ring any alarm bells, this would have the lowest impact/effort for us, so we can stick with the default CRUD implementation as much as possible.

Donnerbart avatar Feb 16 '24 11:02 Donnerbart

I think it is fine, if it is working it's working. ( As I mentioned it is intentionally reconcile() and not just CRUD methods on top level interface. Some cases needs sepcial logic and that is fine. )

Will close the issue if no objections.

csviri avatar Feb 27 '24 08:02 csviri