nginx-gateway-fabric

NGINX Data Plane intermittently reports "no live upstreams" despite pods being healthy

Open • NicolasPires777 opened this issue 4 weeks ago • 8 comments

Description: We are seeing intermittent errors in the NGINX Data Plane logs reporting "no live upstreams while connecting to upstream", even though all pods are healthy and running.

Details:

  • The error appears suddenly for all connections routed through NGINX.
  • Pods are not new; the most recent pod has been running for 12 hours.
  • CPU and memory usage of both the pods and NGINX are well below requests/limits.
  • Port-forwarding to both the service and individual pods works fine, indicating the pods are indeed reachable.
  • No recent deployments or configuration changes occurred.

The issue is transient but affects all traffic to the NGINX service when it occurs.

Example log (IP addresses removed for privacy):

[error] no live upstreams while connecting to upstream, client: <removed>, server: ~^, request: "GET /apis/primetime/api/v1/ads/adrequest/criteria/...", upstream: "http://prd-apps_primetime_4000/api/v1/ads/adrequest/criteria/..."
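To check whether every occurrence hits the same upstream or several at once, the error lines can be tallied per upstream name. This is a minimal Python sketch. It assumes the log format shown above and takes the upstream name to be the host part of the `upstream:` URL. The function name `count_no_live_upstreams` is hypothetical.

```python
import re
from collections import Counter

# Matches the `upstream: "http://<name>/..."` field NGINX appends to proxy
# errors; the captured group is the host part, i.e. the upstream block name.
UPSTREAM_RE = re.compile(r'upstream: "https?://([^/"]+)')


def count_no_live_upstreams(lines):
    """Count "no live upstreams" errors per upstream name (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        if "no live upstreams" not in line:
            continue
        match = UPSTREAM_RE.search(line)
        # Lines without an upstream field are grouped under "<unknown>".
        counts[match.group(1) if match else "<unknown>"] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '[error] no live upstreams while connecting to upstream, client: <removed>, '
        'server: ~^, request: "GET /apis/primetime/api/v1/ads/adrequest/criteria/...", '
        'upstream: "http://prd-apps_primetime_4000/api/v1/ads/adrequest/criteria/..."',
    ]
    print(count_no_live_upstreams(sample))
```

In practice, the input lines would come from the data-plane container's output, for example the saved output of `kubectl logs` for the NGINX pod. If many unrelated upstreams fail together, that points away from the application pods themselves.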

Observed behavior:

NGINX behaves as if no pods are available, even when they are healthy.

The problem resolves on its own or after a restart of the NGINX Data Plane Deployment.

Expected behavior:

NGINX should consistently route requests to the available pods without reporting "no live upstreams".

Questions / Investigation:

Could this be related to NGINX upstream health checks?
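Some context for the health-check question. Open-source NGINX has only passive health checks, configured on each `server` through `max_fails` (default 1) and `fail_timeout` (default 10s). Per the documentation, once the number of unsuccessful attempts within `fail_timeout` reaches `max_fails`, the server is considered unavailable for `fail_timeout`. What counts as unsuccessful is governed by `proxy_next_upstream`, which defaults to errors and timeouts. A group with a single server is never marked unavailable. When every server in a group is unavailable at the same time, requests fail with "no live upstreams".

The sketch below is a simplified Python model of those documented semantics, not NGINX's round-robin code. Peer selection is reduced to "first available", and the class and function names are hypothetical. The sketch also assumes the defaults apply; this report does not show whether NGINX Gateway Fabric overrides them. The model shows how a short blip that fails one request to each pod could reject all traffic for about `fail_timeout` and then recover without intervention.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_FAILS = 1        # NGINX default for `server ... max_fails=`
FAIL_TIMEOUT = 10.0  # NGINX default for `server ... fail_timeout=` (seconds)


@dataclass
class Peer:
    addr: str
    fails: int = 0
    last_fail: float = float("-inf")

    def available(self, now: float) -> bool:
        # A peer that reached MAX_FAILS is skipped until FAIL_TIMEOUT elapses.
        return self.fails < MAX_FAILS or now - self.last_fail >= FAIL_TIMEOUT

    def record_failure(self, now: float) -> None:
        # Failures older than FAIL_TIMEOUT no longer count toward MAX_FAILS.
        if now - self.last_fail >= FAIL_TIMEOUT:
            self.fails = 0
        self.fails += 1
        self.last_fail = now


def pick_peer(peers: List[Peer], now: float) -> Optional[Peer]:
    """Return the first available peer, or None ("no live upstreams")."""
    if len(peers) == 1:
        # Documented behavior: a single-server group is never marked unavailable.
        return peers[0]
    for peer in peers:
        if peer.available(now):
            return peer
    return None


if __name__ == "__main__":
    pods = [Peer("10.0.0.1:4000"), Peer("10.0.0.2:4000")]
    # A brief network blip makes one request to each pod fail at t=0.
    for pod in pods:
        pod.record_failure(now=0.0)
    print(pick_peer(pods, now=5.0))   # None -> "no live upstreams"
    print(pick_peer(pods, now=10.0))  # the first pod is eligible again
```

If this model matches what is happening, the errors should begin shortly after connection errors or timeouts to the pods and clear about `fail_timeout` later. An outage that persists until a restart would suggest a different cause.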

Could there be an ephemeral network or connection tracking issue?
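For the connection-tracking question, one quick check on the affected node is how full the netfilter conntrack table is. When the table fills, the kernel drops packets for new connections and logs a message containing "table full, dropping packet". This is a minimal Python sketch that reads the standard `nf_conntrack_count` and `nf_conntrack_max` sysctls. The helper names and the 90% threshold are hypothetical. Read the values on the node itself, because a pod's own network namespace may not reflect node-wide usage.

```python
from pathlib import Path
from typing import Optional

# Standard Linux netfilter sysctls (present when nf_conntrack is loaded).
COUNT_PATH = Path("/proc/sys/net/netfilter/nf_conntrack_count")
MAX_PATH = Path("/proc/sys/net/netfilter/nf_conntrack_max")


def conntrack_usage(count: int, maximum: int) -> float:
    """Fraction of the conntrack table in use (hypothetical helper)."""
    if maximum <= 0:
        raise ValueError("nf_conntrack_max must be positive")
    return count / maximum


def read_conntrack_usage() -> Optional[float]:
    """Read usage on the current host; None if the sysctls are unreadable."""
    try:
        count = int(COUNT_PATH.read_text().strip())
        maximum = int(MAX_PATH.read_text().strip())
    except (OSError, ValueError):
        return None
    return conntrack_usage(count, maximum)


if __name__ == "__main__":
    usage = read_conntrack_usage()
    if usage is None:
        print("conntrack sysctls not readable here")
    else:
        # Hypothetical 90% alert threshold; adjust to your own policy.
        status = "NEAR FULL" if usage >= 0.9 else "ok"
        print(f"conntrack table {usage:.1%} used ({status})")
```

Sampling this during an incident, alongside the kernel log, would help confirm or rule out conntrack exhaustion as the trigger.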

NicolasPires777 • Oct 24 '25 22:10