
feat(hpa): Add horizontal pod autoscaling for DragonFly instances

Open smunukutla-mycarrier opened this issue 5 months ago • 11 comments

This pull request resolves #320. It adds autoscaling support to the Dragonfly operator, allowing the Kubernetes Horizontal Pod Autoscaler (HPA) to be used with Dragonfly instances.

Autoscaling (HPA) support

  • Added AutoscalerSpec to DragonflySpec in dragonfly_types.go, allowing users to configure HPA settings such as enabling autoscaling, min/max replicas, target CPU/memory utilization, scaling behavior, and metrics.
  • Implemented logic in dragonfly_instance.go to create, update, or delete HPA resources based on the AutoscalerSpec, including cleanup of HPA when autoscaling is disabled and preservation of replica counts during transitions.
  • Updated RBAC rules and controller setup to manage HPA resources, including new permissions in role.yaml and controller ownership of HorizontalPodAutoscaler objects.
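The new RBAC rule for HPA objects would look roughly like the following excerpt from role.yaml. The exact verb list granted by the PR is an assumption; this is the typical set a controller needs in order to create, watch, and clean up HorizontalPodAutoscaler resources:

```yaml
# Hypothetical excerpt from config/rbac/role.yaml; the verbs shown
# are the usual set for a controller that owns HPA objects.
- apiGroups:
    - autoscaling
  resources:
    - horizontalpodautoscalers
  verbs:
    - create
    - delete
    - get
    - list
    - patch
    - update
    - watch
```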

API and code generation

  • Added imports and deepcopy methods for HPA types in dragonfly_types.go and zz_generated.deepcopy.go to support the new autoscaler configuration.

Documentation and examples

  • Updated README.md to advertise HPA support as a main feature.
  • Added a sample manifest v1alpha1_dragonfly_autoscaler.yaml demonstrating how to configure autoscaling for a Dragonfly instance.
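A manifest along these lines illustrates the idea; the field names under `autoscaler` are assumptions inferred from the AutoscalerSpec description above, not the PR's exact schema (which lives in dragonfly_types.go):

```yaml
apiVersion: dragonflydb.io/v1alpha1
kind: Dragonfly
metadata:
  name: dragonfly-autoscaler-sample
spec:
  # Illustrative AutoscalerSpec fields; see the PR's
  # dragonfly_types.go for the authoritative schema.
  autoscaler:
    enabled: true
    minReplicas: 2
    maxReplicas: 6
    targetCPUUtilizationPercentage: 70
    targetMemoryUtilizationPercentage: 80
```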

E2E tests and minor improvements

  • Improved secret cleanup in e2e tests and added ImagePullPolicy to several test cases for consistency.
  • Minor log formatting improvement in cmd/main.go.

List of E2E tests for autoscaler

  • Should create Dragonfly instance with autoscaler enabled
  • Should create HPA resource
  • Should create StatefulSet with correct initial replica count
  • Should wait for all pods to be ready and have correct roles
  • Should preserve HPA-modified replica count
  • Should configure new pod as replica
  • Should handle master failover when scaled
  • Should handle replica deletion and recreation
  • Should handle HPA scaling down to minimum replicas
  • Should handle HPA scaling up to maximum replicas
  • Should preserve HPA scaling during operator reconciliation
  • Should handle rapid scaling events
  • Should update HPA when autoscaler spec changes
  • Should disable autoscaler and remove HPA
  • Should support custom metrics configuration
  • Should handle multiple concurrent pod deletions
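For reference, the HPA object the operator would create from such a spec is a standard autoscaling/v2 resource targeting the instance's StatefulSet, along these lines (names and thresholds are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: dragonfly-sample
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: dragonfly-sample
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```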

smunukutla-mycarrier avatar Aug 14 '25 21:08 smunukutla-mycarrier

Hi @smunukutla-mycarrier, thanks for the PR! Can you please resolve the conflicts?

Abhra303 avatar Aug 18 '25 06:08 Abhra303

@Abhra303 done. Thanks!

smunukutla-mycarrier avatar Aug 18 '25 12:08 smunukutla-mycarrier

@Abhra303 I would really appreciate it if we could merge this soon. We're looking forward to rolling out autoscaling for Dragonfly. Please let me know if there are any issues/concerns. Thanks! :)

smunukutla-mycarrier avatar Aug 22 '25 14:08 smunukutla-mycarrier

@smunukutla-mycarrier, when do you think we can merge this PR? Where I am working, this is a feature we want to implement.

ldiego73 avatar Aug 28 '25 14:08 ldiego73

@ldiego73 I'm not a maintainer on this repo. I'm also waiting for a review from the maintainers, when they get a chance.

smunukutla-mycarrier avatar Aug 29 '25 02:08 smunukutla-mycarrier

bumping this, please merge.

myc-jhicks avatar Sep 02 '25 15:09 myc-jhicks

Adding my support for this feature. Please merge.

bcarlock-mycarrier avatar Sep 02 '25 16:09 bcarlock-mycarrier

Hi @smunukutla-mycarrier, the test is not fixed yet. Also, can you elaborate on the reason for supporting HPA? Dragonfly doesn't support multiple masters, so this can only scale reads. Is that why you want HPA support?

Abhra303 avatar Sep 10 '25 07:09 Abhra303

@Abhra303 thanks for the feedback. Yes, we need to be able to (horizontally) scale secondary nodes automatically based on traffic/utilization. I'll push a fix for the tests this weekend, hopefully.

smunukutla-mycarrier avatar Sep 12 '25 02:09 smunukutla-mycarrier

@Abhra303 I’ve pushed a fix for the issue: CustomResourceDefinition "dragonflies.dragonflydb.io" is invalid: metadata.annotations: Too long: must have at most 262144 bytes. The root cause was an overly long description in the CRD. I’ve set crd:maxDescLen to 0 to prevent hitting the annotation length limit again, especially since it may grow further over time as we keep adding features.
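For context, controller-gen's `crd:maxDescLen` option truncates generated field descriptions (with 0, it drops them entirely). The Makefile change would look roughly like this; the surrounding arguments are assumptions based on the standard kubebuilder `manifests` target:

```makefile
# Hypothetical excerpt from the Makefile. maxDescLen=0 strips field
# descriptions from the generated CRD, keeping the manifest (and the
# kubectl.kubernetes.io/last-applied-configuration annotation written
# by kubectl apply) under the 262144-byte annotation limit.
manifests: controller-gen
	$(CONTROLLER_GEN) rbac:roleName=manager-role crd:maxDescLen=0 webhook \
		paths="./..." output:crd:artifacts:config=config/crd/bases
```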

I also merged the latest from the main branch and resolved conflicts. Please run the workflow again when you have some time.

smunukutla-mycarrier avatar Oct 01 '25 16:10 smunukutla-mycarrier

When is this being merged? It will be a great addition. Autoscaling is a great way to have peace of mind: I don't want to worry about someone needing to scale the pods manually when traffic increases.

Mwogi avatar Oct 07 '25 21:10 Mwogi