dragonfly-operator

StatefulSet update / recreate

Open applike-ss opened this issue 1 year ago • 5 comments

Due to config adjustments, the operator tries to patch the sts (StatefulSet) in an incompatible way.

I then get this error:

StatefulSet.apps "drangonfly-app" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'ordinals', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden
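The field list in this error message is enough to tell, before sending an update, whether the API server will reject it. The following is a minimal Go sketch of that check. It assumes the caller has already worked out which top-level StatefulSet spec fields differ (passed in as plain strings), and the names `updatableSpecFields` and `forbiddenChanges` are hypothetical, not taken from the operator.

```go
package main

import (
	"fmt"
	"sort"
)

// updatableSpecFields lists the StatefulSet spec fields the API server accepts
// in an update, copied from the error message quoted in this issue.
var updatableSpecFields = map[string]bool{
	"replicas":                             true,
	"ordinals":                             true,
	"template":                             true,
	"updateStrategy":                       true,
	"persistentVolumeClaimRetentionPolicy": true,
	"minReadySeconds":                      true,
}

// forbiddenChanges returns, sorted, the changed top-level spec fields that would
// make an update fail. A non-empty result means patching cannot succeed.
func forbiddenChanges(changed []string) []string {
	var out []string
	for _, f := range changed {
		if !updatableSpecFields[f] {
			out = append(out, f)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Assumption: a snapshot PVC size change shows up as a volumeClaimTemplates diff.
	fmt.Println(forbiddenChanges([]string{"volumeClaimTemplates"})) // [volumeClaimTemplates]
	// A change confined to the pod template is allowed.
	fmt.Println(forbiddenChanges([]string{"template"})) // []
}
```

A non-empty result is exactly the situation where the delete-and-recreate approach proposed in this issue would be needed.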

It would be great if dragonfly-operator could delete the sts with cascade=orphan and then re-create it with the current config to ensure the desired state.

I was expecting that if I removed this sts manually, the operator would re-create it to ensure the desired state. That was not the case either, and I would like the operator to ensure the desired state by re-creating the sts with the desired configuration as well.
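The two behaviours requested here (re-create a missing sts, and delete with cascade=orphan plus re-create when a patch would be rejected) fit into a single decision step of a reconcile loop. This is a minimal sketch with hypothetical names (`StatefulSetState`, `decide`, the `Action` constants); it uses no operator or controller-runtime types and leaves out how the drift flags are computed.

```go
package main

import "fmt"

// Action is what the reconciler should do with the StatefulSet on this pass.
type Action string

const (
	ActionNone     Action = "none"
	ActionCreate   Action = "create"
	ActionPatch    Action = "patch"
	ActionRecreate Action = "orphan-delete-and-recreate"
)

// StatefulSetState is a hypothetical, heavily simplified summary of what the
// reconciler observed; it is not a type from the operator or client-go.
type StatefulSetState struct {
	Exists         bool // false if the sts was not found (e.g. deleted manually)
	MutableDrift   bool // replicas, template, updateStrategy, ... differ
	ImmutableDrift bool // any other spec field (e.g. volumeClaimTemplates) differs
}

// decide picks an action that restores the desired state in every case:
// a missing sts is created, immutable drift is handled by deleting with
// cascade=orphan and re-creating, and mutable-only drift is patched.
func decide(s StatefulSetState) Action {
	switch {
	case !s.Exists:
		return ActionCreate
	case s.ImmutableDrift:
		return ActionRecreate
	case s.MutableDrift:
		return ActionPatch
	default:
		return ActionNone
	}
}

func main() {
	fmt.Println(decide(StatefulSetState{Exists: false}))                                      // create
	fmt.Println(decide(StatefulSetState{Exists: true, MutableDrift: true, ImmutableDrift: true})) // orphan-delete-and-recreate
	fmt.Println(decide(StatefulSetState{Exists: true, MutableDrift: true}))                   // patch
}
```

Checking immutable drift before mutable drift is deliberate: a recreate applies the full desired spec anyway, whereas a patch that also touches an immutable field fails with the error quoted above.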

applike-ss, Mar 14 '24 07:03

> Due to config adjustments, the operator tries to patch the sts (StatefulSet) in an incompatible way.

Could you share the config that led to this behaviour? Was it caused by updating the Dragonfly CR? I'd like to know the root cause.

> It would be great if dragonfly-operator could delete the sts with cascade=orphan and then re-create it with the current config to ensure the desired state.

Recreating the StatefulSet wouldn't solve the underlying issue (i.e. why the operator is trying to update the StatefulSet like that).

> I was expecting that if I removed this sts manually, the operator would re-create it to ensure the desired state. That was not the case either, and I would like the operator to ensure the desired state by re-creating the sts with the desired configuration as well.

Yep, that would indeed be nice to have.

Abhra303, Mar 20 '24 13:03

I have the same problem. I deploy the CRD with ArgoCD, but the operator does not update anything and no rollout is triggered. Afterwards I can see this in the logs:

/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:235
2024-03-20T16:29:41Z ERROR Reconciler error {"controller": "dragonfly", "controllerGroup": "dragonflydb.io", "controllerKind": "Dragonfly", "Dragonfly": {"name":"dragonfly-test","namespace":"test"}, "namespace": "test", "name": "dragonfly-test", "reconcileID": "4261fe51-f541-437b-9a2e-cf6b64b253db", "error": "StatefulSet.apps \"dragonfly-test\" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'ordinals', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:274
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:235

SoGooDFR, Mar 20 '24 16:03

> Could you share the config that led to this behaviour? Was it caused by updating the Dragonfly CR? I'd like to know the root cause.

It was indeed updating the CR: I was changing spec.snapshot.persistentVolumeClaimSpec. That leads to an update of the sts' spec.volumeClaimTemplates, which Kubernetes does not allow on an existing StatefulSet. So my suggestion is to allow this change by deleting the sts with the cascade=false (i.e. orphan) option and re-creating it.
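The delete-and-recreate step suggested here could look roughly like the following sketch. The `StatefulSetClient` interface, `recreate` and the in-memory `fakeClient` are hypothetical stand-ins invented for illustration, not client-go or operator APIs; `DeleteOrphan` represents a delete with cascade=orphan, so the pods are left running for the new StatefulSet to adopt.

```go
package main

import (
	"errors"
	"fmt"
)

// StatefulSetClient is a hypothetical, minimal stand-in for the Kubernetes
// client calls a reconciler would make; it is not client-go's API.
type StatefulSetClient interface {
	Get(name string) (spec string, found bool)
	DeleteOrphan(name string) error // delete with cascade=orphan: pods keep running
	Create(name, spec string) error
}

// recreate replaces a StatefulSet whose immutable fields changed: delete the
// object while orphaning its pods, then create it again from the desired spec.
func recreate(c StatefulSetClient, name, desired string) error {
	if _, found := c.Get(name); found {
		if err := c.DeleteOrphan(name); err != nil {
			return fmt.Errorf("orphan-delete %s: %w", name, err)
		}
	}
	if err := c.Create(name, desired); err != nil {
		return fmt.Errorf("create %s: %w", name, err)
	}
	return nil
}

// fakeClient keeps StatefulSets in a map and records calls, for demonstration only.
type fakeClient struct {
	objects map[string]string
	calls   []string
}

func (f *fakeClient) Get(name string) (string, bool) {
	s, ok := f.objects[name]
	return s, ok
}

func (f *fakeClient) DeleteOrphan(name string) error {
	f.calls = append(f.calls, "delete(orphan) "+name)
	delete(f.objects, name)
	return nil
}

func (f *fakeClient) Create(name, spec string) error {
	f.calls = append(f.calls, "create "+name)
	if _, exists := f.objects[name]; exists {
		return errors.New("already exists")
	}
	f.objects[name] = spec
	return nil
}

func main() {
	c := &fakeClient{objects: map[string]string{"dragonfly-app": "storage=1Gi"}}
	if err := recreate(c, "dragonfly-app", "storage=2Gi"); err != nil {
		panic(err)
	}
	fmt.Println(c.calls, c.objects["dragonfly-app"])
	// [delete(orphan) dragonfly-app create dragonfly-app] storage=2Gi
}
```

A real implementation would have to handle two things the fake hides: deletion is asynchronous, so the create can fail with "already exists" until the old object is gone (the reconciler would then requeue), and volumeClaimTemplates only apply to newly created PVCs, so the recreated sts does not resize existing PVCs.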

> Recreating the StatefulSet wouldn't solve the underlying issue (i.e. why the operator is trying to update the StatefulSet like that).

That is true: the underlying issue is updating parts of the CR that map to StatefulSet fields which cannot be updated on the sts side.

Here's a demo CR:

apiVersion: dragonflydb.io/v1alpha1
kind: Dragonfly
metadata:
  name: dragonfly-app
spec:
  image: ghcr.io/dragonflydb/dragonfly-weekly:e8650ed2b4ebd550c966751dd33ebb1ac4f82b1f-ubuntu
  args:
    - '--cache_mode'
    - '--primary_port_http_enabled=true'
    - '--cluster_mode=emulated'
  snapshot:
    cron: '*/5 * * * *'
    persistentVolumeClaimSpec:
      resources:
        requests:
          storage: 1Gi
      accessModes:
        - ReadWriteOnce
  resources:
    limits:
      cpu: 100m
      memory: 320Mi
    requests:
      cpu: 100m
      memory: 320Mi
  replicas: 3

Updating it to the following (the only change is the snapshot storage request, from 1Gi to 2Gi) shows the issue:

apiVersion: dragonflydb.io/v1alpha1
kind: Dragonfly
metadata:
  name: dragonfly-app
spec:
  image: ghcr.io/dragonflydb/dragonfly-weekly:e8650ed2b4ebd550c966751dd33ebb1ac4f82b1f-ubuntu
  args:
    - '--cache_mode'
    - '--primary_port_http_enabled=true'
    - '--cluster_mode=emulated'
  snapshot:
    cron: '*/5 * * * *'
    persistentVolumeClaimSpec:
      resources:
        requests:
          storage: 2Gi
      accessModes:
        - ReadWriteOnce
  resources:
    limits:
      cpu: 100m
      memory: 320Mi
    requests:
      cpu: 100m
      memory: 320Mi
  replicas: 3

applike-ss, Mar 21 '24 07:03

It would be awesome if you could get this one working, @Abhra303 (https://github.com/dragonflydb/dragonfly-operator/pull/222).

applike-ss, Sep 18 '24 11:09

Yep, I'm busy with other stuff currently, but will fix the PR soon!

Abhra303, Sep 19 '24 04:09

Same for resources. For example, if you change the memory resources in the CR YAML from 2Gi to 4Gi and apply the changes, the StatefulSet is not updated with the new memory settings. You need to edit or patch the sts manually.

clusters:
  - name: redis
    resources:
      requests:
        memory: 4Gi
        cpu: 200m
      limits:
        memory: 4Gi
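Unlike the storage case, container resources live under the StatefulSet's `template`, one of the fields the error message lists as updatable, so this kind of drift should be fixable with an ordinary update rather than a recreate. A minimal sketch of the comparison, using a hypothetical simplified `Resources` type (quantities as plain strings) and a hypothetical `syncResources` helper instead of the Kubernetes types:

```go
package main

import "fmt"

// Resources is a simplified stand-in for a container's resource requirements:
// resource name -> quantity string, e.g. "memory" -> "4Gi".
type Resources struct {
	Requests map[string]string
	Limits   map[string]string
}

// equalMaps reports whether two resource lists contain the same entries.
func equalMaps(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if w, ok := b[k]; !ok || w != v {
			return false
		}
	}
	return true
}

// syncResources returns the resources the sts pod template should carry and
// whether an update is needed. Quantities are compared as literal strings.
func syncResources(fromCR, inTemplate Resources) (Resources, bool) {
	if equalMaps(fromCR.Requests, inTemplate.Requests) && equalMaps(fromCR.Limits, inTemplate.Limits) {
		return inTemplate, false
	}
	return fromCR, true
}

func main() {
	cr := Resources{
		Requests: map[string]string{"memory": "4Gi", "cpu": "200m"},
		Limits:   map[string]string{"memory": "4Gi"},
	}
	sts := Resources{
		Requests: map[string]string{"memory": "2Gi", "cpu": "200m"},
		Limits:   map[string]string{"memory": "2Gi"},
	}
	want, patch := syncResources(cr, sts)
	fmt.Println(patch, want.Requests["memory"], want.Limits["memory"]) // true 4Gi 4Gi
}
```

Comparing quantity strings literally treats `4Gi` and `4096Mi` as different; real code would compare parsed quantities instead.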

jurim76, Nov 28 '24 20:11