
Review router nginx retry behaviour

Open arelra opened this issue 6 months ago • 0 comments

We have observed spikes in requests to the Frontend stack without a corresponding increase in requests to router.

This leads us to believe there is retry behaviour in router (an nginx server).

This ticket is to investigate this behaviour and determine a root cause.

If we are able to propose a fix to the router nginx configuration, that can be done as a follow-up.

For example, on 29-07-2024 at 16:36 there was a spike in requests to facia-frontend but no spike in router.

facia-frontend:

Image

router:

Image

@jorgeazevedo has also been investigating this behaviour with more examples here.

Notes from @jorgeazevedo

Timeout configuration for router https://github.com/guardian/platform/blob/main/router/files/router.conf#L145-L158

Default behaviour on timeout and 5XX: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_next_upstream
"Specifies in which cases a request should be passed to the next server:"
"default: [...] error timeout;"

Behaviour when a DNS name resolves to multiple IP addresses: https://nginx.org/en/docs/http/ngx_http_upstream_module.html#server
"A domain name that resolves to several IP addresses defines multiple servers at once."
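
Taken together, the two notes above suggest a possible mechanism: if the upstream hostname resolves to several IP addresses, nginx treats them as several servers, and the default `proxy_next_upstream error timeout` lets a single router request be passed to another of them after an error or timeout. Per the nginx docs, 5XX responses only trigger this when the matching `http_5xx` values are listed explicitly. A minimal sketch of how the retry could be disabled or bounded follows. The upstream name, hostname, port, and timeout values are placeholders and are not taken from router.conf, and this is an untested idea for the follow-up rather than a confirmed fix.

```nginx
# Hypothetical fragment for discussion only; names and values are assumptions,
# not copied from guardian/platform router.conf.
upstream facia_frontend_example {
    # If this name resolves to several IPs (e.g. one per availability zone),
    # nginx registers each IP as a separate server, which gives it a
    # "next server" to retry against.
    server facia-frontend.example.internal:80;
}

server {
    listen 8080;

    location / {
        proxy_pass http://facia_frontend_example;

        # Placeholder timeouts; the real values live in router.conf.
        proxy_connect_timeout 5s;
        proxy_read_timeout 10s;

        # Option A: never pass a failed request to another server.
        # (The nginx default is "error timeout".)
        proxy_next_upstream off;

        # Option B: keep retries but bound them (use instead of Option A).
        # proxy_next_upstream error timeout;
        # proxy_next_upstream_tries 2;
        # proxy_next_upstream_timeout 10s;
    }
}
```

Option A trades resilience for predictability: a timeout on one upstream address surfaces to the client instead of being retried. Option B keeps some resilience while capping how many upstream requests a single router request can generate.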

Definition of availability zones (subnets) for upstream Load Balancers https://github.com/guardian/platform/blob/81c89a366d04b5c7d3e9b3447dd7cf5713c48969/provisioning/cloudformation/frontend-service.yaml#L319

References from the internet:
https://news.ycombinator.com/item?id=11217477
https://serverfault.com/questions/528653/how-can-i-stop-nginx-from-retrying-put-or-post-requests-on-upstream-server-timeo

arelra, Jul 30 '24 12:07