spring-cloud-kubernetes

Zone-aware load balancing doesn't work with Spring Cloud Kubernetes Discovery

Open deathcoder opened this issue 2 months ago • 3 comments

Environment

  • Spring Boot: 3.2.0
  • Spring Cloud: 2023.0.0
  • Spring Cloud Kubernetes: 3.1.0
  • Kubernetes: v1.27+ (tested with Kind)
  • Load Balancer Mode: POD

Summary

The built-in ZonePreferenceServiceInstanceListSupplier does not work with Spring Cloud Kubernetes Discovery because zone information stored in DefaultKubernetesServiceInstance.podMetadata() is not accessible through the ServiceInstance.getMetadata() interface that the zone preference logic uses.

Expected Behavior

When using Spring Cloud Kubernetes Discovery with zone-aware load balancing configuration:

spring:
  cloud:
    kubernetes:
      loadbalancer:
        enabled: true
        mode: POD
        zone-preference-enabled: true
    loadbalancer:
      zone: ${ZONE}
      configurations: zone-preference

And building a ServiceInstanceListSupplier with:

ServiceInstanceListSupplier.builder()
    .withDiscoveryClient()
    .withZonePreference()
    .build(context);

Expected: Requests should be routed only to service instances in the same availability zone as the client.

Actual Behavior

Requests are distributed randomly across all zones (approximately 50/50 split between zones), indicating that zone filtering is not working.

Reproduction

A complete reproduction repository is available with:

  • Working Kind cluster setup
  • Sample services with zone labels
  • Three client implementations showing the problem and workarounds
  • Test scripts demonstrating the issue
  • link: https://github.com/deathcoder/kubernetes-client-loadbalancer-example

More details are available in the project's Markdown files and in the Details section.

Questions

  1. Is there a configuration option we're missing that would expose pod labels in getMetadata() for use with the built-in ZonePreferenceServiceInstanceListSupplier?

  2. Is this an architectural limitation where ServiceInstance.getMetadata() is intentionally separate from pod-specific metadata?

  3. Should Spring Cloud Kubernetes automatically populate zone information from pod labels into getMetadata() to work with the standard Spring Cloud LoadBalancer zone preference?

  4. Is this a bug that should be fixed, or is the recommended approach to use custom suppliers when zone-aware routing is needed with Kubernetes?

Thank you for your time! Any guidance on the recommended approach for zone-aware load balancing with Spring Cloud Kubernetes would be greatly appreciated.

Test Results

{
  "clientZone": "zone-a",
  "totalCalls": 20,
  "sameZoneCalls": 10,
  "crossZoneCalls": 10,
  "sameZonePercentage": "50.0%"
}

Root Cause Analysis

After extensive investigation, we found that:

  1. Zone information IS available in Spring Cloud Kubernetes Discovery - it's stored in pod labels
  2. But it's in the wrong place for the built-in zone preference logic to find it

Where zone information exists:

DefaultKubernetesServiceInstance instance = ...; // from discovery

// ✅ Zone IS available here:
Map<String, Map<String, String>> podMetadata = instance.podMetadata();
String podZone = podMetadata.get("labels").get("topology.kubernetes.io/zone");
// Returns: "zone-a"

// ❌ But NOT available here (where ZonePreferenceServiceInstanceListSupplier looks):
Map<String, String> metadata = instance.getMetadata();
String zone = metadata.get("zone"); // Returns: null
String topologyZone = metadata.get("topology.kubernetes.io/zone"); // Returns: null

Investigation details:

Available in getMetadata():

[app, port.http, k8s_namespace, type, kubectl.kubernetes.io/last-applied-configuration]

Available in podMetadata():

{
  "labels": {
    "app": "sample-service",
    "pod-template-hash": "6f74896b6d",
    "topology.kubernetes.io/zone": "zone-a",
    "zone": "zone-a"
  },
  "annotations": {
    "kubectl.kubernetes.io/restartedAt": "2025-10-14T14:53:02+02:00"
  }
}

Why ZonePreferenceServiceInstanceListSupplier doesn't work:

Looking at the Spring Cloud LoadBalancer source, ZonePreferenceServiceInstanceListSupplier uses:

private String getZone(ServiceInstance serviceInstance) {
    Map<String, String> metadata = serviceInstance.getMetadata();
    if (metadata != null) {
        return metadata.get(ZONE); // Looks for "zone" key
    }
    return null;
}

This method only checks getMetadata(), not podMetadata(), so it never finds the zone information.
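The mismatch can be illustrated without a cluster. The following is a minimal sketch that uses plain maps populated with some of the keys and labels listed above. The class `ZoneLookupSketch` and its method names are hypothetical, and no Spring classes are involved; it only mirrors the two lookups described in this section.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, self-contained model of the two lookups; not library code.
class ZoneLookupSketch {

    // Mirrors the lookup described above: only the flat metadata map is consulted.
    static String zoneFromMetadata(Map<String, String> metadata) {
        return metadata != null ? metadata.get("zone") : null;
    }

    // Reads the zone from the nested pod metadata ("labels" -> label map).
    static String zoneFromPodMetadata(Map<String, Map<String, String>> podMetadata) {
        if (podMetadata == null) {
            return null;
        }
        Map<String, String> labels = podMetadata.getOrDefault("labels", Map.of());
        String zone = labels.get("topology.kubernetes.io/zone");
        return zone != null ? zone : labels.get("zone");
    }

    public static void main(String[] args) {
        // Keys as observed in getMetadata(); the values are placeholders.
        Map<String, String> metadata = new HashMap<>();
        metadata.put("app", "sample-service");
        metadata.put("k8s_namespace", "default");

        Map<String, Map<String, String>> podMetadata = Map.of(
            "labels", Map.of("topology.kubernetes.io/zone", "zone-a", "zone", "zone-a"));

        System.out.println(zoneFromMetadata(metadata));       // null
        System.out.println(zoneFromPodMetadata(podMetadata)); // zone-a
    }
}
```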

Workarounds

We've identified three working approaches:

Workaround 1: Custom supplier accessing podMetadata()

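import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

import org.springframework.cloud.client.ServiceInstance;
import org.springframework.cloud.kubernetes.commons.discovery.DefaultKubernetesServiceInstance;
import org.springframework.cloud.loadbalancer.core.ServiceInstanceListSupplier;
import reactor.core.publisher.Flux;

public class PodMetadataZoneServiceInstanceListSupplier implements ServiceInstanceListSupplier {

    private final ServiceInstanceListSupplier delegate;
    private final String clientZone;

    public PodMetadataZoneServiceInstanceListSupplier(ServiceInstanceListSupplier delegate, String clientZone) {
        this.delegate = delegate;
        this.clientZone = clientZone;
    }

    @Override
    public String getServiceId() {
        return delegate.getServiceId();
    }

    @Override
    public Flux<List<ServiceInstance>> get() {
        return delegate.get().map(instances -> {
            // Without a known client zone there is nothing to filter on.
            if (clientZone == null || "unknown".equalsIgnoreCase(clientZone)) {
                return instances;
            }

            // Keep only instances whose pod labels report the client's zone.
            return instances.stream()
                .filter(instance -> clientZone.equalsIgnoreCase(getZoneFromPodMetadata(instance)))
                .collect(Collectors.toList());
        });
    }

    private String getZoneFromPodMetadata(ServiceInstance instance) {
        if (!(instance instanceof DefaultKubernetesServiceInstance)) {
            return null;
        }

        DefaultKubernetesServiceInstance k8sInstance = (DefaultKubernetesServiceInstance) instance;
        Map<String, Map<String, String>> podMetadata = k8sInstance.podMetadata();

        if (podMetadata != null && podMetadata.containsKey("labels")) {
            Map<String, String> labels = podMetadata.get("labels");
            // Prefer the standard Kubernetes label, then fall back to the plain "zone" label.
            String zone = labels.get("topology.kubernetes.io/zone");
            if (zone == null) {
                zone = labels.get("zone");
            }
            return zone;
        }

        return null;
    }
}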
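
One behavioural difference is worth noting: the filter in this workaround returns an empty list when no instance matches the client zone, whereas the built-in ZonePreferenceServiceInstanceListSupplier returns all instances in that case. A minimal sketch of that fallback, written against a generic element type so it runs without Spring on the classpath (`ZoneFallbackSketch`, `preferZone`, and the `zoneOf` function are hypothetical names):

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper: prefer same-zone instances, otherwise keep the full list.
class ZoneFallbackSketch {

    static <T> List<T> preferZone(List<T> instances, String clientZone,
                                  Function<? super T, String> zoneOf) {
        if (clientZone == null) {
            return instances;
        }
        List<T> sameZone = instances.stream()
            // equalsIgnoreCase(null) is false, so instances without a zone are skipped.
            .filter(instance -> clientZone.equalsIgnoreCase(zoneOf.apply(instance)))
            .collect(Collectors.toList());
        // Fall back to every instance rather than returning nothing.
        return sameZone.isEmpty() ? instances : sameZone;
    }
}
```

In the supplier above, `zoneOf` would correspond to `getZoneFromPodMetadata`.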

Workaround 2: Using Kubernetes EndpointSlices API

// Query EndpointSlices which have native zone support via endpoint.getZone()
EndpointSliceList slices = kubernetesClient.discovery().v1()
    .endpointSlices()
    .inNamespace(namespace)
    .withLabel("kubernetes.io/service-name", serviceId)
    .list();

// Build IP to zone mapping
for (EndpointSlice slice : slices.getItems()) {
    for (Endpoint endpoint : slice.getEndpoints()) {
        String zone = endpoint.getZone(); // Native zone support!
        for (String ip : endpoint.getAddresses()) {
            ipToZoneCache.put(ip, zone);
        }
    }
}
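
Once the IP-to-zone map is built, discovered instances can be filtered by host. The sketch below assumes that each instance's host is the pod IP, which is an assumption about POD mode rather than something verified here. `IpZoneFilterSketch` and its `Instance` record are hypothetical stand-ins, not library types.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class IpZoneFilterSketch {

    // Hypothetical stand-in for a discovered instance; only the host matters here.
    record Instance(String host, int port) {}

    // Keeps instances whose host IP maps to the client's zone.
    // Instances with an unknown IP are dropped because their zone cannot be determined.
    static List<Instance> inZone(List<Instance> instances,
                                 Map<String, String> ipToZone,
                                 String clientZone) {
        return instances.stream()
            .filter(instance -> clientZone.equals(ipToZone.get(instance.host())))
            .collect(Collectors.toList());
    }
}
```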

Workaround 3: Direct Kubernetes API queries for pod labels

// Query pods by IP to get their labels
List<Pod> pods = kubernetesClient.pods()
    .inNamespace(namespace)
    .list()
    .getItems();

Pod matchingPod = pods.stream()
    .filter(pod -> podIp.equals(pod.getStatus().getPodIP()))
    .findFirst()
    .orElse(null);

if (matchingPod != null) {
    String zone = matchingPod.getMetadata().getLabels()
        .get("topology.kubernetes.io/zone");
}

Reproduction

Configuration used:

Pod Labels:

labels:
  app: sample-service
  topology.kubernetes.io/zone: zone-a  # Standard Kubernetes zone label
  zone: zone-a                          # Alternative zone label

Discovery Configuration:

spring:
  cloud:
    kubernetes:
      discovery:
        enabled: true
        metadata:
          add-pod-labels: true
          add-pod-annotations: true
          labels-prefix: ""
          annotations-prefix: ""
      loadbalancer:
        enabled: true
        mode: POD
        zone-preference-enabled: true

LoadBalancer Configuration:

spring:
  cloud:
    loadbalancer:
      zone: ${ZONE}
      configurations: zone-preference

Additional Context

This issue is critical for production deployments where:

  • Services are deployed across multiple availability zones
  • Cross-zone traffic incurs additional latency and costs
  • Zone affinity is required for performance and resilience

The workarounds are functional but require custom code that should ideally be handled by the framework. Understanding whether this is expected behavior or a gap in the integration between Spring Cloud LoadBalancer and Spring Cloud Kubernetes would help the community implement zone-aware routing correctly.

Versions

<properties>
    <spring-boot.version>3.2.0</spring-boot.version>
    <spring-cloud.version>2023.0.0</spring-cloud.version>
    <spring-cloud-kubernetes.version>3.1.0</spring-cloud-kubernetes.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-kubernetes-client</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-kubernetes-client-loadbalancer</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-loadbalancer</artifactId>
    </dependency>
</dependencies>

deathcoder, Oct 14 '25 16:10

This looks interesting. I'll see if I can find some time this week to look at your sample. Thank you for the report.

wind57, Oct 14 '25 20:10

I know you probably haven't had time to look into it yet; just a heads-up that I have updated the linked project with our latest iteration of the custom solution. You'll find it in the wrapped-client-service folder. In this solution, instead of applying our custom logic in a custom ListSupplier, we wrap the discovery client to ensure that the metadata ZonePreferenceServiceInstanceListSupplier needs is actually there. This looks cleaner, although it still feels like the whole thing was designed to support this use case, so it still feels like a bug that it's not working as intended (or maybe we are just using the library wrong).
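
A minimal sketch of the enrichment step behind that idea, not taken from the linked repository: copy the zone label from the pod metadata into the flat metadata map under the "zone" key. `ZoneMetadataSketch` and `withZone` are hypothetical names, and plain maps stand in for the discovery types so the example runs on its own.

```java
import java.util.HashMap;
import java.util.Map;

class ZoneMetadataSketch {

    // Returns a copy of the metadata with a "zone" entry taken from the pod labels, if present.
    static Map<String, String> withZone(Map<String, String> metadata,
                                        Map<String, Map<String, String>> podMetadata) {
        Map<String, String> enriched = new HashMap<>(metadata);
        Map<String, String> labels =
            podMetadata == null ? Map.of() : podMetadata.getOrDefault("labels", Map.of());
        String zone = labels.get("topology.kubernetes.io/zone");
        if (zone != null) {
            // Do not overwrite a zone that is already set.
            enriched.putIfAbsent("zone", zone);
        }
        return enriched;
    }
}
```

A wrapping discovery client would apply this to each instance it returns, so that the flat metadata map carries the "zone" key that the zone preference lookup reads.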

deathcoder, Oct 23 '25 10:10

This is on my TODO list; I will get to it once I am free from some other priorities.

wind57, Oct 23 '25 13:10