
Multi region support in same cluster

Open dcarrion87 opened this issue 2 years ago • 8 comments

Description

What problem are you trying to solve?

We are running a custom Karpenter implementation with k3s.

We would like to extend this so that one Karpenter handles multi-region support in a single Kubernetes cluster. I can see that subnetSelector assumes the current region.

Would we need to deploy multiple Karpenters to handle this scenario?

Having two separate clusters is an option, but for this use case we're OK with a geo-extended cluster.

dcarrion87 avatar Nov 06 '23 02:11 dcarrion87

We don't currently have multi-region support. The biggest reason for this is the expected communication latency between the worker nodes and the control plane, which I'm assuming has to be in a single region, even in k3s, due to the etcd leader being tied to a single region.

Is there a way that you can overcome that latency?

jonathan-innis avatar Nov 06 '23 02:11 jonathan-innis

We don't currently have multi-region support. The biggest reason for this is the expected communication latency between the worker nodes and the control plane, which I'm assuming has to be in a single region, even in k3s, due to the etcd leader being tied to a single region.

Is there a way that you can overcome that latency?

@jonathan-innis For our use case it's not an issue. 140ms is totally fine for what these worker nodes are going to be doing. I've done worker node tests without Karpenter and it's fine.

Is it possible to run multiple Karpenters in the same cluster with AWS_REGION set differently and monitoring different provisioners / node templates? Or will they trip over each other?

We're likely going to pursue another route anyway but I just want to make sure I've exhausted everything on this front.

dcarrion87 avatar Nov 06 '23 03:11 dcarrion87

Or will they trip over each other

Yeah, they'd definitely trip over each other. There's a possibility that sometime in the future we might support running two at once in the same cluster with some kind of global lock/lease hand-off mechanism, but right now they run with the assumption that they are a singleton in the cluster.

jonathan-innis avatar Nov 06 '23 03:11 jonathan-innis

Thank you, @jonathan-innis. Appreciate the discussion and engagement.

dcarrion87 avatar Nov 06 '23 03:11 dcarrion87

Appreciate the discussion and engagement

No problem 👍 I definitely think this could be a neat feature down the line, whether we support multi-region Karpenter natively or allow multiple Karpenters to run in the same cluster.

jonathan-innis avatar Nov 06 '23 16:11 jonathan-innis

Jumping in here with the same request. Our use case involves running large batch inference jobs where AWS can run out of GPUs in a single region. We want our single cluster to be able to provision nodes from multiple regions.

montanaflynn avatar Jun 13 '24 14:06 montanaflynn

We don't currently have multi-region support. The biggest reason for this is the expected communication latency between the worker nodes and the control plane, which I'm assuming has to be in a single region, even in k3s, due to the etcd leader being tied to a single region. Is there a way that you can overcome that latency?

@jonathan-innis For our use case it's not an issue. 140ms is totally fine for what these worker nodes are going to be doing. I've done worker node tests without Karpenter and it's fine.

Is it possible to run multiple Karpenters in the same cluster with AWS_REGION set differently and monitoring different provisioners / node templates? Or will they trip over each other?

We're likely going to pursue another route anyway but I just want to make sure I've exhausted everything on this front.

@dcarrion87 Could you please provide information on how to configure a multi-region node group in an EKS cluster?

FRABUCHI avatar Sep 11 '24 09:09 FRABUCHI

Multi-Region Support Karpenter Integration Design Proposal

Overview

This proposal presents a design that enables node provisioning across multiple AWS regions with a single Karpenter instance. It dynamically detects regions from subnetSelectorTerms in EC2NodeClass, eliminating the need for environment variable configuration.

Current Support Status

Karpenter already supports the topology.kubernetes.io/region key and allows specifying multiple regions in NodePool requirements:

requirements:
  - key: topology.kubernetes.io/region
    operator: In
    values: ["us-east-1", "ap-northeast-1"]

Proposed Design Approach

1. Dynamic Region Detection from EC2NodeClass

Leveraging the existing subnetSelectorTerms mechanism to automatically detect region lists from karpenter.sh/region tags.

2. Multi-Region Support with Single EC2NodeClass

Example of listing multiple regions in one EC2NodeClass:

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: multi-region-nodeclass
spec:
  amiFamily: AL2
  subnetSelectorTerms:
    # us-east-1 subnet selection
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
        karpenter.sh/region: "us-east-1"
        environment: "production"
    # ap-northeast-1 subnet selection
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
        karpenter.sh/region: "ap-northeast-1"
        environment: "production"
    # eu-west-1 subnet selection
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
        karpenter.sh/region: "eu-west-1"
        environment: "production"
  securityGroupSelectorTerms:
    # Security groups for each region
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
        karpenter.sh/region: "us-east-1"
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
        karpenter.sh/region: "ap-northeast-1"
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
        karpenter.sh/region: "eu-west-1"
  role: "KarpenterNodeRole-${CLUSTER_NAME}"

3. Region-Specific EC2NodeClass Configuration Example

When creating individual EC2NodeClass per region:

---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: ap-northeast-1-nodeclass
spec:
  amiFamily: AL2
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
        karpenter.sh/region: "ap-northeast-1"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
  role: "KarpenterNodeRole-${CLUSTER_NAME}"

Implementation Architecture

1. Region Detection Logic

The Karpenter controller dynamically detects regions through the following steps (sketched in code after the list):

  1. Monitor all EC2NodeClass resources
  2. Aggregate karpenter.sh/region tag values from subnetSelectorTerms
  3. Initialize EC2 API clients from the detected region list
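
As an illustration of steps 2 and 3, here is a minimal Go sketch of how the controller could aggregate regions and build per-region EC2 clients. The SubnetSelectorTerm and EC2NodeClass types below are simplified stand-ins for the real CRD types, and detectRegions / newRegionalEC2Clients are hypothetical names rather than existing Karpenter functions.

package multiregion

import (
    "context"
    "fmt"

    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/service/ec2"
)

// SubnetSelectorTerm and EC2NodeClass stand in for the real CRD types.
type SubnetSelectorTerm struct {
    Tags map[string]string
}

type EC2NodeClass struct {
    SubnetSelectorTerms []SubnetSelectorTerm
}

// detectRegions aggregates karpenter.sh/region tag values from all
// EC2NodeClass subnetSelectorTerms into a deduplicated region list.
func detectRegions(nodeClasses []EC2NodeClass) []string {
    seen := map[string]struct{}{}
    var regions []string
    for _, nc := range nodeClasses {
        for _, term := range nc.SubnetSelectorTerms {
            region, ok := term.Tags["karpenter.sh/region"]
            if !ok {
                continue
            }
            if _, dup := seen[region]; !dup {
                seen[region] = struct{}{}
                regions = append(regions, region)
            }
        }
    }
    return regions
}

// newRegionalEC2Clients initializes one EC2 API client per detected region.
func newRegionalEC2Clients(ctx context.Context, regions []string) (map[string]*ec2.Client, error) {
    clients := map[string]*ec2.Client{}
    for _, region := range regions {
        cfg, err := config.LoadDefaultConfig(ctx, config.WithRegion(region))
        if err != nil {
            return nil, fmt.Errorf("loading AWS config for %s: %w", region, err)
        }
        clients[region] = ec2.NewFromConfig(cfg)
    }
    return clients, nil
}

A real implementation would re-run this detection from the EC2NodeClass watch so that the client set is refreshed whenever a node class is created, updated, or deleted.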

2. Provisioning Flow

sequenceDiagram
    participant KC as "Karpenter Controller"
    participant NP as "NodePool"
    participant EC2NC as "EC2NodeClass"
    participant AWS as "AWS APIs (Multi-Region)"
    participant SQS as "SQS Queues (Multi-Region)"

    KC->>EC2NC: Dynamically detect region list
    EC2NC-->>KC: Aggregate from karpenter.sh/region tags
    KC->>KC: Initialize EC2 clients for each region
    KC->>SQS: Start monitoring SQS queues for each region
    KC->>NP: Check topology.kubernetes.io/region requirements
    KC->>AWS: Create instance in appropriate region
    AWS-->>KC: Return instance information
    SQS-->>KC: Receive Spot interruption/maintenance events

3. Integration with NodePool

Specifying multiple regions in NodePool:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: multi-region-nodepool
spec:
  template:
    metadata:
      labels:
        app: multi-region-app
    spec:
      requirements:
        - key: topology.kubernetes.io/region
          operator: In
          values: ["us-east-1", "ap-northeast-1", "eu-west-1"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1
        kind: EC2NodeClass
        name: multi-region-nodeclass

SQS Interruption Queue Support

Multi-Region SQS Configuration

Karpenter currently processes interruption events with a single SQS queue. Multi-region support requires an individual SQS queue in each region.

Environment Variable Configuration Format

INTERRUPTION_QUEUES=us-east-1:my-cluster,ap-northeast-1:my-cluster,eu-west-1:my-cluster

Advantages of this format:

  • Clear structure: Explicit mapping between regions and queue names
  • Simplified parsing: Two-step processing with a comma split followed by a colon split (see the sketch after this list)
  • Backward compatibility: Falls back to traditional single-queue processing when no colon is present
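
A minimal Go sketch of that two-step parse, assuming the INTERRUPTION_QUEUES format shown above; parseInterruptionQueues is a hypothetical helper, and the colon-free fallback implements the backward-compatibility bullet.

package multiregion

import "strings"

// parseInterruptionQueues turns "us-east-1:my-cluster,ap-northeast-1:my-cluster"
// into a map of region -> queue name. An entry with no colon is treated as a
// single queue name for the current region (backward compatibility).
func parseInterruptionQueues(value, currentRegion string) map[string]string {
    queues := map[string]string{}
    for _, entry := range strings.Split(value, ",") {
        entry = strings.TrimSpace(entry)
        if entry == "" {
            continue
        }
        region, queue, found := strings.Cut(entry, ":")
        if !found {
            // Legacy single-queue format: no region prefix.
            queues[currentRegion] = entry
            continue
        }
        queues[region] = queue
    }
    return queues
}

For example, parseInterruptionQueues("us-east-1:my-cluster,ap-northeast-1:my-cluster", "us-east-1") yields both regional entries, while parseInterruptionQueues("my-cluster", "us-east-1") preserves today's single-queue behavior.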

Interruption Controller Extension

Extend the current interruption controller to monitor SQS queues from multiple regions in parallel (a sketch follows the list):

  • Manage SQS API clients per region
  • Process messages from each queue in parallel
  • Message routing including region information
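
The following Go sketch illustrates the per-region polling idea, assuming the region-to-queue-URL map has already been resolved; pollInterruptionQueues and the handle callback are hypothetical, and message deletion, visibility handling, and retry backoff are deliberately elided.

package multiregion

import (
    "context"
    "log"
    "sync"

    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/service/sqs"
)

// pollInterruptionQueues starts one goroutine per region, each long-polling its
// regional interruption queue and handing messages (tagged with their region)
// to the handle callback.
func pollInterruptionQueues(ctx context.Context, queueURLs map[string]string, handle func(region, body string)) error {
    var wg sync.WaitGroup
    for region, queueURL := range queueURLs {
        cfg, err := config.LoadDefaultConfig(ctx, config.WithRegion(region))
        if err != nil {
            return err
        }
        client := sqs.NewFromConfig(cfg)
        wg.Add(1)
        go func(region, queueURL string, client *sqs.Client) {
            defer wg.Done()
            for ctx.Err() == nil {
                out, err := client.ReceiveMessage(ctx, &sqs.ReceiveMessageInput{
                    QueueUrl:            &queueURL,
                    MaxNumberOfMessages: 10,
                    WaitTimeSeconds:     20, // long polling
                })
                if err != nil {
                    log.Printf("receiving from %s interruption queue: %v", region, err)
                    continue
                }
                for _, msg := range out.Messages {
                    handle(region, *msg.Body)
                }
            }
        }(region, queueURL, client)
    }
    wg.Wait()
    return nil
}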

CloudFormation Configuration Example

Queue configuration in each region:

# us-east-1
KarpenterInterruptionQueue-us-east-1:
  Type: AWS::SQS::Queue
  Properties:
    QueueName: my-cluster
    MessageRetentionPeriod: 300
    SqsManagedSseEnabled: true

# ap-northeast-1  
KarpenterInterruptionQueue-ap-northeast-1:
  Type: AWS::SQS::Queue
  Properties:
    QueueName: my-cluster
    MessageRetentionPeriod: 300
    SqsManagedSseEnabled: true

Technical Considerations

1. Network Configuration

  • Proper tagging of VPCs, subnets, and security groups in each region
  • Consider inter-region communication (if necessary)

2. IAM Permissions

  • EC2 operation permissions across multiple regions
  • Access permissions to SQS queues in each region
  • Region-specific resource access permissions

3. Pricing Information Integration

Integrate pricing information from multiple regions to achieve cost-optimized instance selection.
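
As a rough illustration, once per-region price caches exist, the cross-region selection step could be as simple as the hypothetical helper below; the nested map layout is an assumption for the sketch, not the structure of Karpenter's actual pricing provider.

package multiregion

// cheapestRegion returns the region offering the lowest cached on-demand price
// for the given instance type, and whether any region had a price for it.
func cheapestRegion(prices map[string]map[string]float64, instanceType string) (string, float64, bool) {
    bestRegion, bestPrice, found := "", 0.0, false
    for region, byType := range prices {
        price, ok := byType[instanceType]
        if !ok {
            continue
        }
        if !found || price < bestPrice {
            bestRegion, bestPrice, found = region, price, true
        }
    }
    return bestRegion, bestPrice, found
}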

4. Interruption Handling

  • EventBridge Rules configuration in each region (Spot Interruption, Maintenance Events)
  • Mapping between region-specific instance IDs and NodeClaims (see the sketch after this list)
  • Message processing concurrency and error handling
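
For the NodeClaim mapping, one option is to derive the region and instance ID from the node's AWS provider ID ("aws:///<zone>/<instance-id>"). The helper below is a hypothetical Go sketch; it assumes standard zone names where the region is the zone minus its trailing letter, so local and wavelength zones would need separate handling.

package multiregion

import (
    "fmt"
    "strings"
)

// parseProviderID extracts the region and instance ID from a provider ID such
// as "aws:///ap-northeast-1a/i-0123456789abcdef0".
func parseProviderID(providerID string) (region, instanceID string, err error) {
    rest, ok := strings.CutPrefix(providerID, "aws:///")
    if !ok {
        return "", "", fmt.Errorf("unexpected provider ID %q", providerID)
    }
    zone, instanceID, ok := strings.Cut(rest, "/")
    if !ok {
        return "", "", fmt.Errorf("unexpected provider ID %q", providerID)
    }
    // Standard zones are "<region><letter>", e.g. "ap-northeast-1a".
    region = strings.TrimRight(zone, "abcdefghijklmnopqrstuvwxyz")
    return region, instanceID, nil
}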

Benefits

  1. Simplified configuration: Declarative configuration only without environment variables (except SQS configuration)
  2. Dynamic scaling: Automatic region support expansion when adding new EC2NodeClass
  3. Reduced operational costs: Multiple region management with single Karpenter instance
  4. Improved availability: Automatic failover during regional failures
  5. Architectural consistency: Consistency with existing subnetSelectorTerms patterns
  6. Complete Spot support: Spot interruption event handling in each region

Constraints

  1. Network latency: Latency due to inter-region communication
  2. Configuration complexity: Managing configurations for multiple regions
  3. Debugging complexity: Difficulty in identifying root causes during failures
  4. SQS configuration complexity: Individual queue configuration required in each region

kahirokunn avatar May 27 '25 07:05 kahirokunn

Instances such as GPU or Spot capacity may not be available in a single region. Even though EKS Auto Mode is supported, adding new EKS clusters is still not efficient in terms of management and cost. If we could use this feature, it would be a good way for us to use EKS clusters more efficiently.

jukops avatar Sep 03 '25 12:09 jukops