
HTTP Endpoint Migration

Open aweis89 opened this issue 1 year ago • 2 comments

Summary

The objective is to enable controlled transitions of specific HTTP endpoint(s) from one service (legacy) to another service (new), both implementing the same HTTP API. This proposal aims to leverage Argo Rollouts' Analysis, Experiments, Rollbacks, and Traffic Routing features. The primary focus is to use the Argo Rollouts framework for traffic shifting, enabling migrations to be conducted safely and incrementally—either in a canary-like fashion or all at once, in a blue/green manner.

Unlike a typical Argo Rollout, this proposal is tailored to HTTP endpoint migration scenarios where traditional Pod lifecycle management, such as pod creation and termination, is not the goal. Both the new and legacy services must be managed by separate Rollouts or Deployments, as they are not intended to succeed one another once the migration is complete. Only the specified HTTP endpoints (as designated in the VirtualService or TrafficSplit) will be fully transitioned, with the new service ending up at 100% of the traffic weight for those endpoints.
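
To make the scope concrete, here is a rough sketch of an Istio VirtualService partway through such a migration. Only the `/orders` route is split between the two services; everything else still goes to the legacy service. The host, route names, path, and service names (`legacy-service`, `new-service`) are made up for illustration, and the weights are what a controller might have written at a 30% step.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-virtual-service          # hypothetical name
spec:
  hosts:
  - api.example.com                 # hypothetical host
  http:
  # The endpoint being migrated: weights here are what would be shifted.
  - name: orders-migration
    match:
    - uri:
        prefix: /orders
    route:
    - destination:
        host: legacy-service
      weight: 70
    - destination:
        host: new-service
      weight: 30
  # All other endpoints are untouched and stay on the legacy service.
  - name: default
    route:
    - destination:
        host: legacy-service
```

At the end of the migration the `orders-migration` route would sit at 0/100 permanently, while the `default` route never changes.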

Use Cases

When transitioning from a monolithic to a microservices architecture, it becomes essential to incrementally move HTTP endpoints from the legacy service to the new service. Both services are expected to implement the same HTTP API. The ability to perform analyses and experiments ensures a controlled transition with automated rollback capabilities.

Open questions

API and Controller Design: A new Custom Resource (CR) could be used to implement this functionality. The new CR would have its own spec and controller logic, focusing solely on traffic routing and associated analyses, without managing Pod lifecycles. Alternatively, this can be overloaded into the existing Rollout API by adding a field like disablePodLifecycleManagement.

Setting replicas: The implementation and API of this might need to change to account for both services being managed by separate Deployments/Rollouts, and possibly by separate HPAs as well. For example, we might need a reference to the Deployment/Rollout and/or its HPA in order to manipulate replicas.
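
For comparison, the "overload the existing Rollout" alternative from the first question might look roughly like the sketch below. `disablePodLifecycleManagement` is the hypothetical field named above and does not exist in the Rollout API today. Mapping `stableService` to the legacy service and `canaryService` to the new one, and reusing `workloadRef` to point at the new service's Deployment, are assumptions about how this could be wired, not current behavior.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: orders-endpoint-migration      # hypothetical name
spec:
  # Hypothetical field: route traffic and run analysis only,
  # never create or terminate Pods.
  disablePodLifecycleManagement: true
  # Existing field, reused here (as an assumption) so the controller
  # knows which workload backs the new service.
  workloadRef:
    apiVersion: apps/v1
    kind: Deployment
    name: new-service
  strategy:
    canary:
      stableService: legacy-service    # assumption: "stable" = legacy
      canaryService: new-service       # assumption: "canary" = new
      trafficRouting:
        istio:
          virtualService:
            name: my-virtual-service   # hypothetical name
            routes:
            - orders-migration         # only this route gets reweighted
      steps:
      - setWeight: 1
      - pause: {duration: 1h}
      - setWeight: 100
```

The downside of this shape is that stable/canary semantics get stretched to mean legacy/new, which is part of why a separate CR may be cleaner.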

Other than that, I think much of the API could remain the same:

apiVersion: argoproj.io/v1alpha1
kind: HttpEndpointRollout
metadata:
  name: my-http-endpoint-rollout
spec:
  newEndpointScalingRef: # used for setting replica count
    kind: Rollout  # perhaps could also be Deployment or HPA
    name: new-service-rollout
  strategy:
    canary:
      steps:
      - setWeight: 1
      - pause: {duration: 1h}
      - setWeight: 30
      - pause: {duration: 3h}
      - setWeight: 100
      # keep background analysis running for 1 week and roll back in the event of failure during that period
      - pause: {duration: 168h}  # 1 week; Rollout pause durations currently accept s, m, h units
  analysis:
    templates:
    - templateName: success-rate
  trafficRouting:
    virtualService:  # contains the routing rules that this would be limited to and where weights would be applied
      name: my-traffic-split
    trafficSplit:  # or for linkerd/SMI
      name: my-traffic-split
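
The `success-rate` template referenced under `analysis` is not defined in this issue. A minimal sketch of what it might look like, modeled on the Prometheus-based examples in the Argo Rollouts analysis docs, is below. The Prometheus address, the Istio metric and labels in the query, the default service name, and the 95% threshold are all assumptions to be adjusted per environment.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
  - name: service-name
    value: new-service.default.svc.cluster.local   # assumed default
  metrics:
  - name: success-rate
    interval: 5m
    # Fail the analysis (and trigger rollback) after 3 bad measurements.
    failureLimit: 3
    successCondition: result[0] >= 0.95
    provider:
      prometheus:
        address: http://prometheus.example.com:9090   # assumed address
        query: |
          sum(irate(istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}",response_code!~"5.*"}[5m])) /
          sum(irate(istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}"}[5m]))
```

For endpoint-level migrations the query would probably also need to filter on the request path, if the metrics in use expose one, so that only the migrated endpoints are measured.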



aweis89, Sep 17 '23 20:09

https://github.com/argoproj/argo-rollouts/issues/2186 and https://github.com/argoproj/argo-rollouts/issues/2779 and https://cloud-native.slack.com/archives/C01U781DW2E/p1689755530884959

zachaller, Sep 19 '23 18:09

This issue is stale because it has been open 60 days with no activity.

github-actions[bot], Nov 19 '23 02:11