nginx-gateway-fabric NGINX Data Plane intermittently reports "no live upstreams" despite pods being healthy

Open NicolasPires777 opened this issue 4 weeks ago • 8 comments

Description: We are seeing intermittent errors in the NGINX Data Plane logs reporting "no live upstreams while connecting to upstream", even though all pods are healthy and running.

Details:

The error appears suddenly for all connections routed through NGINX.
Pods are not new; the most recent pod has been running for 12 hours.
CPU and memory usage of both the pods and NGINX are well below requests/limits.
Port-forwarding to both the service and individual pods works fine, indicating the pods are indeed reachable.
No recent deployments or configuration changes occurred.

The issue is transient but affects all traffic to the NGINX service when it occurs.

Example log (IP addresses removed for privacy):

[error] no live upstreams while connecting to upstream, client: <removed>, server: ~^, request: "GET /apis/primetime/api/v1/ads/adrequest/criteria/...", upstream: "http://prd-apps_primetime_4000/api/v1/ads/adrequest/criteria/..."
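To check whether the error hits a single upstream group or all of them at once, the log lines can be tallied by upstream name. This is a minimal sketch, assuming the error-log lines follow the format shown above; the function name `count_no_live_upstreams` and the regex are illustrative and not part of NGINX or NGINX Gateway Fabric.

```python
import re
from collections import Counter

# Captures the upstream name from an NGINX error-log line such as:
#   ... no live upstreams while connecting to upstream, ... upstream: "http://<name>/..."
UPSTREAM_RE = re.compile(
    r'no live upstreams while connecting to upstream.*?upstream: "https?://([^/"]+)'
)

def count_no_live_upstreams(lines):
    """Return a Counter mapping upstream name -> number of 'no live upstreams' errors."""
    counts = Counter()
    for line in lines:
        match = UPSTREAM_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# The sample line from this report (the "..." elisions are kept as-is).
sample = [
    '[error] no live upstreams while connecting to upstream, client: <removed>, '
    'server: ~^, request: "GET /apis/primetime/api/v1/ads/adrequest/criteria/...", '
    'upstream: "http://prd-apps_primetime_4000/api/v1/ads/adrequest/criteria/..."'
]

for name, n in count_no_live_upstreams(sample).items():
    print(name, n)
```

Running this over the full data plane log for an incident window would show whether every upstream is affected at the same moment or only some.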

Observed behavior:

NGINX behaves as if no pods are available, even when they are healthy.

The problem resolves on its own or after restarting the NGINX Data Plane Deployment.

Expected behavior:

NGINX should consistently route requests to the available pods without reporting "no live upstreams".

Questions / Investigation:

Could this be related to NGINX upstream health checks?

Could there be an ephemeral network or connection tracking issue?

Oct 24 '25 22:10 NicolasPires777
