HTTP Endpoint Migration
Summary
The objective is to enable controlled transitions of specific HTTP endpoint(s) from one service (legacy) to another service (new), both implementing the same HTTP API. This proposal aims to leverage Argo Rollouts' Analysis, Experiments, Rollbacks, and Traffic Routing features. The primary focus is to use the Argo Rollouts framework for traffic shifting, so that migrations can be conducted safely and incrementally, either in a canary-like fashion or all at once in a blue/green manner.
Unlike a typical Argo Rollout, this proposal is tailored to HTTP endpoint migration scenarios where traditional Pod lifecycle management, such as Pod creation and termination, is not the goal. Both the new and legacy services must be managed by separate Rollouts or Deployments, because neither is intended to replace the other once the migration is complete. Only the specified HTTP endpoints (as designated in the VirtualService or TrafficSplit) will be fully transitioned to the new service, which ends up receiving 100% of the traffic weight for those endpoints.
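To make the scoping concrete, here is a minimal sketch of an Istio VirtualService in which only one endpoint is split between the two services. The resource name, the host, the `/api/orders` path, and the service names `legacy-service` and `new-service` are assumptions for illustration and are not part of the proposal.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-virtual-service        # hypothetical name
spec:
  hosts:
  - api.example.com               # hypothetical host
  http:
  - name: orders-migration        # the only route whose weights would be shifted
    match:
    - uri:
        prefix: /api/orders       # hypothetical endpoint being migrated
    route:
    - destination:
        host: legacy-service      # hypothetical legacy service
      weight: 70
    - destination:
        host: new-service         # hypothetical new service
      weight: 30
  - name: default                 # every other endpoint stays on the legacy service
    route:
    - destination:
        host: legacy-service
```

In this sketch a migration would move the `orders-migration` weights from 100/0 to 0/100 while the `default` route is left untouched.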
Use Cases
When transitioning from a monolithic to a microservices architecture, it becomes essential to incrementally move HTTP endpoints from the legacy service to the new service. Both services are expected to implement the same HTTP API. The ability to perform analyses and experiments ensures a controlled transition with automated rollback capabilities.
Open questions
API and Controller Design: A new Custom Resource (CR) could be used to implement this functionality. The new CR would have its own spec and controller logic, focusing solely on traffic routing and associated analyses, without managing Pod lifecycles. Alternatively, this could be overloaded into the existing Rollout API by adding a field like disablePodLifecycleManagement.
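For comparison, the second option might look roughly like the sketch below. The `disablePodLifecycleManagement` field is the hypothetical field proposed here and does not exist in the Rollout API today. Mapping the legacy and new services onto the existing `stableService` and `canaryService` fields is likewise an assumption, as are all resource names.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-http-endpoint-rollout
spec:
  # Hypothetical field from this proposal; not part of the current Rollout API.
  disablePodLifecycleManagement: true
  strategy:
    canary:
      stableService: legacy-service    # assumption: legacy service plays the "stable" role
      canaryService: new-service       # assumption: new service plays the "canary" role
      steps:
      - setWeight: 30
      - pause: {duration: 1h}
      - setWeight: 100
      trafficRouting:
        istio:
          virtualService:
            name: my-virtual-service   # hypothetical name
```

Whether a Rollout without a Pod template or `workloadRef` could be made valid this way is part of the open question.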
Setting replicas: The implementation and API of this might need to change to take into account that both services are managed by separate Deployments/Rollouts, and possibly by separate HPAs as well. For example, we might need a reference to the Deployment/Rollout and/or its HPA in order to manipulate replicas.
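As an illustration of the separate-HPA case, the new service could already be autoscaled by an HPA such as the one sketched below, which targets the Rollout named `new-service-rollout`. The HPA name, replica bounds, and CPU target are assumptions. How the controller would adjust replicas through such a reference (for example by changing `minReplicas`) is left open.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: new-service-hpa             # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: argoproj.io/v1alpha1
    kind: Rollout
    name: new-service-rollout       # the Rollout that manages the new service
  minReplicas: 2                    # assumed bounds, for illustration only
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```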
Other than that, I think much of the API could remain the same:
    apiVersion: argoproj.io/v1alpha1
    kind: HttpEndpointRollout
    metadata:
      name: my-http-endpoint-rollout
    spec:
      newEndpointScalingRef: # used for setting replica count
        kind: Rollout # perhaps could also be Deployment or HPA
        name: new-service-rollout
      strategy:
        canary:
          steps:
          - setWeight: 1
          - pause: {duration: 1h}
          - setWeight: 30
          - pause: {duration: 3h}
          - setWeight: 100
          # keep background analysis running for 1 week and roll back in the event of failure during that period
          - pause: {duration: 168h} # 1 week, expressed in hours
          analysis:
            templates:
            - templateName: success-rate
          trafficRouting:
            virtualService: # contains the routing rules that this would be limited to and where weights would be applied
              name: my-traffic-split
            trafficSplit: # or for linkerd/SMI
              name: my-traffic-split
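The `success-rate` template referenced above is not defined in this proposal. A minimal sketch of what it could look like, using the existing AnalysisTemplate resource with a Prometheus provider, follows. The Prometheus address, the Istio metric query, the destination service name, and the 95% threshold are all assumptions.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  metrics:
  - name: success-rate
    interval: 5m
    # Assumed threshold: at least 95% of requests must not be 5xx.
    successCondition: result[0] >= 0.95
    failureLimit: 3
    provider:
      prometheus:
        address: http://prometheus.example.com:9090   # hypothetical address
        query: |
          sum(irate(istio_requests_total{destination_service="new-service.default.svc.cluster.local",response_code!~"5.*"}[5m])) /
          sum(irate(istio_requests_total{destination_service="new-service.default.svc.cluster.local"}[5m]))
```

Run as background analysis, a template like this would keep measuring the new service during the final one-week pause and trigger a rollback if the failure limit is exceeded.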
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.
https://github.com/argoproj/argo-rollouts/issues/2186 and https://github.com/argoproj/argo-rollouts/issues/2779 and https://cloud-native.slack.com/archives/C01U781DW2E/p1689755530884959