karpenter-provider-aws
Multi region support in same cluster
Description
What problem are you trying to solve?
We are running a custom Karpenter implementation with k3s.
We would like to extend this so that a single Karpenter handles multiple regions within one Kubernetes cluster. I can see that subnetSelector assumes the current region.
Would we need to deploy multiple Karpenters to handle this scenario?
Running two separate clusters is an option, but for this use case we're OK with a geo-extended cluster.
We don't currently have multi-region support. The biggest reason for this is the expected communication latency between the worker nodes and the control plane, which I'm assuming has to live in a single region, even in k3s, due to the etcd leader being tied to a single region.
Is there a way that you overcome that latency?
@jonathan-innis For our use case it's not an issue. 140ms is totally fine for what these worker nodes are going to be doing. I've done worker node tests without Karpenter and it's fine.
Is it possible to run multiple Karpenters in the same cluster with AWS_REGION set differently and monitoring different provisioners / node templates? Or will they trip over each other?
We're likely going to pursue another route anyway but I just want to make sure I've exhausted everything on this front.
Or will they trip over each other?
Yeah, they'd definitely trip over each other. There's a possibility that, sometime deep in the future, we might support running two at a time in the same cluster with some kind of global lock/lease hand-off mechanism, but right now they run with the assumption that they are a singleton in the cluster.
Thank you, @jonathan-innis. Appreciate the discussion and engagement.
Appreciate the discussion and engagement
No problem 👍 Definitely think that this could be a neat feature down the line, whether we support multi-region Karpenter natively or allow multiple Karpenters to run in the same cluster.
Jumping in here with the same request. Our use case involves running large batch inference jobs where AWS can run out of GPUs in a single region. We want to have our single cluster be able to provision nodes from multiple regions.
@dcarrion87 Could you please provide information on how to configure a multi-region node group in an EKS cluster?
Design Proposal: Multi-Region Support for Karpenter
Overview
This proposal presents a design that enables node provisioning across multiple AWS regions with a single Karpenter instance.
It dynamically detects regions from subnetSelectorTerms in EC2NodeClass, eliminating the need for environment variable configuration.
Current Support Status
Karpenter already supports the topology.kubernetes.io/region key and allows specifying multiple regions in NodePool requirements:
requirements:
  - key: topology.kubernetes.io/region
    operator: In
    values: ["us-east-1", "ap-northeast-1"]
Proposed Design Approach
1. Dynamic Region Detection from EC2NodeClass
Leveraging the existing subnetSelectorTerms mechanism to automatically detect region lists from karpenter.sh/region tags.
2. Multi-Region Support with Single EC2NodeClass
Example of listing multiple regions in one EC2NodeClass:
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: multi-region-nodeclass
spec:
  amiFamily: AL2
  subnetSelectorTerms:
    # us-east-1 subnet selection
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
        karpenter.sh/region: "us-east-1"
        environment: "production"
    # ap-northeast-1 subnet selection
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
        karpenter.sh/region: "ap-northeast-1"
        environment: "production"
    # eu-west-1 subnet selection
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
        karpenter.sh/region: "eu-west-1"
        environment: "production"
  securityGroupSelectorTerms:
    # Security groups for each region
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
        karpenter.sh/region: "us-east-1"
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
        karpenter.sh/region: "ap-northeast-1"
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
        karpenter.sh/region: "eu-west-1"
  role: "KarpenterNodeRole-${CLUSTER_NAME}"
3. Region-Specific EC2NodeClass Configuration Example
When creating an individual EC2NodeClass per region:
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: ap-northeast-1-nodeclass
spec:
  amiFamily: AL2
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
        karpenter.sh/region: "ap-northeast-1"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
  role: "KarpenterNodeRole-${CLUSTER_NAME}"
Implementation Architecture
1. Region Detection Logic
The Karpenter controller dynamically detects regions through the following steps:
- Monitor all EC2NodeClass resources
- Aggregate karpenter.sh/region tag values from subnetSelectorTerms
- Initialize EC2 API clients from the detected region list (see the sketch below)
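A minimal sketch of these steps in Go (the types and helper names here are hypothetical stand-ins; the real controller would watch the generated EC2NodeClass API types through an informer):

package multiregion

import (
	"context"

	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/ec2"
)

// SubnetSelectorTerm mirrors the tag-based selector used above (a hypothetical
// stand-in for the generated EC2NodeClass API type).
type SubnetSelectorTerm struct {
	Tags map[string]string
}

// DetectRegions aggregates karpenter.sh/region tag values across the
// subnetSelectorTerms of all EC2NodeClasses in the cluster, de-duplicated.
func DetectRegions(terms []SubnetSelectorTerm) []string {
	seen := map[string]struct{}{}
	var regions []string
	for _, t := range terms {
		if r, ok := t.Tags["karpenter.sh/region"]; ok {
			if _, dup := seen[r]; !dup {
				seen[r] = struct{}{}
				regions = append(regions, r)
			}
		}
	}
	return regions
}

// NewRegionalClients initializes one EC2 API client per detected region.
func NewRegionalClients(ctx context.Context, regions []string) (map[string]*ec2.Client, error) {
	clients := map[string]*ec2.Client{}
	for _, region := range regions {
		cfg, err := config.LoadDefaultConfig(ctx, config.WithRegion(region))
		if err != nil {
			return nil, err
		}
		clients[region] = ec2.NewFromConfig(cfg)
	}
	return clients, nil
}

A production implementation would also need to rebuild this client set whenever an EC2NodeClass is added, changed, or removed.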
2. Provisioning Flow
sequenceDiagram
    participant KC as "Karpenter Controller"
    participant NP as "NodePool"
    participant EC2NC as "EC2NodeClass"
    participant AWS as "AWS APIs (Multi-Region)"
    participant SQS as "SQS Queues (Multi-Region)"
    KC->>EC2NC: Dynamically detect region list
    EC2NC-->>KC: Aggregate from karpenter.sh/region tags
    KC->>KC: Initialize EC2 clients for each region
    KC->>SQS: Start monitoring SQS queues for each region
    KC->>NP: Check topology.kubernetes.io/region requirements
    KC->>AWS: Create instance in appropriate region
    AWS-->>KC: Return instance information
    SQS-->>KC: Receive Spot interruption/maintenance events
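A small sketch of the routing step in this flow, reusing the per-region clients from the previous sketch (PickRegion is a hypothetical helper, and a real implementation would weigh price and capacity rather than take the first match):

package multiregion

import (
	"fmt"

	"github.com/aws/aws-sdk-go-v2/service/ec2"
)

// PickRegion picks the first region that both satisfies the NodePool's
// topology.kubernetes.io/region requirement and has a detected client.
func PickRegion(required []string, clients map[string]*ec2.Client) (string, *ec2.Client, error) {
	for _, region := range required {
		if client, ok := clients[region]; ok {
			return region, client, nil
		}
	}
	return "", nil, fmt.Errorf("no detected region satisfies requirement %v", required)
}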
3. Integration with NodePool
Specifying multiple regions in NodePool:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: multi-region-nodepool
spec:
  template:
    metadata:
      labels:
        app: multi-region-app
    spec:
      requirements:
        - key: topology.kubernetes.io/region
          operator: In
          values: ["us-east-1", "ap-northeast-1", "eu-west-1"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: multi-region-nodeclass
SQS Interruption Queue Support
Multi-Region SQS Configuration
Karpenter currently processes interruption events through a single SQS queue. Multi-region support requires an individual SQS queue in each region.
Environment Variable Configuration Format
INTERRUPTION_QUEUES=us-east-1:my-cluster,ap-northeast-1:my-cluster,eu-west-1:my-cluster
Advantages of this format:
- Clear structure: the mapping between regions and queue names is explicit
- Simplified parsing: two-step processing (comma split, then colon split), as sketched below
- Backward compatibility: an entry without a colon falls back to the traditional single-queue processing
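A minimal sketch of that two-step parse in Go (ParseInterruptionQueues is a hypothetical name; the empty-string key stands in for the controller's current region):

package multiregion

import "strings"

// ParseInterruptionQueues turns "us-east-1:my-cluster,ap-northeast-1:my-cluster"
// into a region -> queue-name map. An entry without a colon keeps the legacy
// single-queue behavior and is keyed by "" (the controller's current region).
func ParseInterruptionQueues(v string) map[string]string {
	queues := map[string]string{}
	for _, entry := range strings.Split(v, ",") {
		entry = strings.TrimSpace(entry)
		if entry == "" {
			continue
		}
		if region, name, ok := strings.Cut(entry, ":"); ok {
			queues[region] = name
		} else {
			queues[""] = entry // legacy: single queue in the current region
		}
	}
	return queues
}

For example, ParseInterruptionQueues("us-east-1:my-cluster,ap-northeast-1:my-cluster") yields {"us-east-1": "my-cluster", "ap-northeast-1": "my-cluster"}, while ParseInterruptionQueues("my-cluster") yields the legacy {"": "my-cluster"}.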
Interruption Controller Extension
Extend the current interruption controller to monitor SQS queues from multiple regions in parallel:
- Manage an SQS API client per region
- Process messages from each queue in parallel (see the sketch after this list)
- Route messages with their region information attached
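A sketch of that fan-out, assuming the aws-sdk-go-v2 SQS client and hypothetical wiring: one long-polling goroutine per regional queue, with every message tagged with its source region for downstream routing.

package multiregion

import (
	"context"
	"sync"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/sqs"
)

// RegionalMessage tags an interruption message with the region it came from
// so downstream handling can locate the matching NodeClaim.
type RegionalMessage struct {
	Region string
	Body   string
}

// PollAll starts one long-polling loop per regional queue and funnels all
// messages into a single channel.
func PollAll(ctx context.Context, clients map[string]*sqs.Client, queueURLs map[string]string, out chan<- RegionalMessage) {
	var wg sync.WaitGroup
	for region, client := range clients {
		wg.Add(1)
		go func(region string, client *sqs.Client) {
			defer wg.Done()
			for ctx.Err() == nil {
				resp, err := client.ReceiveMessage(ctx, &sqs.ReceiveMessageInput{
					QueueUrl:            aws.String(queueURLs[region]),
					MaxNumberOfMessages: 10,
					WaitTimeSeconds:     20, // long polling
				})
				if err != nil {
					continue // a real controller would back off and emit metrics here
				}
				for _, m := range resp.Messages {
					out <- RegionalMessage{Region: region, Body: aws.ToString(m.Body)}
				}
			}
		}(region, client)
	}
	wg.Wait()
}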
CloudFormation Configuration Example
Queue configuration in each region:
# us-east-1
KarpenterInterruptionQueueUsEast1:
  Type: AWS::SQS::Queue
  Properties:
    QueueName: my-cluster
    MessageRetentionPeriod: 300
    SqsManagedSseEnabled: true

# ap-northeast-1
KarpenterInterruptionQueueApNortheast1:
  Type: AWS::SQS::Queue
  Properties:
    QueueName: my-cluster
    MessageRetentionPeriod: 300
    SqsManagedSseEnabled: true
Technical Considerations
1. Network Configuration
- Proper tagging of VPCs, subnets, and security groups in each region
- Consider inter-region communication (if necessary)
2. IAM Permissions
- EC2 operation permissions across multiple regions
- Access permissions to SQS queues in each region
- Region-specific resource access permissions
3. Pricing Information Integration
Integrate pricing information from multiple regions to achieve cost-optimized instance selection; see the sketch below.
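As a sketch of the selection step (CheapestRegion is a hypothetical helper; where the per-region prices come from, e.g. a cache fed by the pricing data each regional client can already fetch, is left out):

package multiregion

// CheapestRegion picks the lowest-priced region for a candidate instance
// type among the regions allowed by the NodePool requirement. Returns ""
// when no allowed region has a known price.
func CheapestRegion(prices map[string]float64, allowed []string) (string, float64) {
	best, bestPrice := "", -1.0
	for _, region := range allowed {
		if p, ok := prices[region]; ok && (bestPrice < 0 || p < bestPrice) {
			best, bestPrice = region, p
		}
	}
	return best, bestPrice
}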
4. Interruption Handling
- EventBridge Rules configuration in each region (Spot Interruption, Maintenance Events)
- Mapping between region-specific instance IDs and NodeClaims (see the sketch after this list)
- Message processing concurrency and error handling
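For that mapping, a hedged sketch that assumes the usual aws:///<availability-zone>/<instance-id> provider-ID shape and the common convention that a region is its AZ minus the trailing letter (local zones and other exceptions would need extra handling):

package multiregion

import (
	"fmt"
	"strings"
)

// RegionAndInstanceFromProviderID maps a node's provider ID back to its
// region and instance ID, e.g. "aws:///ap-northeast-1a/i-0123456789abcdef0"
// -> ("ap-northeast-1", "i-0123456789abcdef0").
func RegionAndInstanceFromProviderID(providerID string) (region, instanceID string, err error) {
	rest, ok := strings.CutPrefix(providerID, "aws:///")
	if !ok {
		return "", "", fmt.Errorf("unexpected provider ID %q", providerID)
	}
	az, id, ok := strings.Cut(rest, "/")
	if !ok {
		return "", "", fmt.Errorf("unexpected provider ID %q", providerID)
	}
	// Drop the trailing AZ letter to get the region, e.g. "ap-northeast-1a".
	return strings.TrimRight(az, "abcdefghijklmnopqrstuvwxyz"), id, nil
}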
Benefits
- Simplified configuration: purely declarative, with no environment variables required (except the SQS configuration)
- Dynamic scaling: Automatic region support expansion when adding new EC2NodeClass
- Reduced operational costs: Multiple region management with single Karpenter instance
- Improved availability: Automatic failover during regional failures
- Architectural consistency: Consistency with existing subnetSelectorTerms patterns
- Complete Spot support: Spot interruption event handling in each region
Constraints
- Network latency: Latency due to inter-region communication
- Configuration complexity: Managing configurations for multiple regions
- Debugging complexity: Difficulty in identifying root causes during failures
- SQS configuration complexity: Individual queue configuration required in each region
Instance types such as GPU or Spot capacity may not be available in a single region. Even though EKS Auto Mode is supported, adding new EKS clusters is still not efficient in terms of management and cost. If we could use this feature, it would be a good way for us to use EKS clusters more efficiently.