gateway: ability to set response write timeout

Open lidel opened this issue 1 year ago • 1 comments

Problem

At Shipyard we ran some A/B tests on the public gateway, lowering the nginx timeout from 5m to 30s. This produced better UX and also raised the number of 200s while lowering the number of 504s.

Right now, the boxo/gateway library does not have any timeout, aside from this failsafe 1h one, so we set the timeout in nginx, which sits in front of rainbow.

This is an extra step that most people running gateways do not take, so they waste some resources looking for content that is not provided correctly (the nginx default timeout is 60s, and it could be lowered).

What is really unfortunate is that IPFS Desktop users hit the gateway directly, so they never hit any timeout other than the one enforced by their user agent (browser).

Proposed feature

We should introduce a feature similar to nginx's proxy_read_timeout directly in the boxo/gateway library, make it configurable, and also give it an implicit default (e.g. 30s).

It should not depend on any internal gateway logic, but solely count the time between two successful writes from the server to the client.

This way everyone using boxo will save resources, Desktop users will get a meaningful error page sooner, and we will not regress.

Implementation ideas

Details TBD, but the broad-strokes idea for the boxo/gateway library is to wrap the existing handler in a generic response-writer timeout handler:

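func main() {
    // config and backend are set up as in any existing boxo/gateway deployment.
    gwHandler := gateway.NewHandler(config, backend) // current boxo/gateway handler
    // WithResponseWriteTimeout does not exist yet; the future handler will act like this.
    timeoutHandler := WithResponseWriteTimeout(gwHandler, 30*time.Second)
    http.ListenAndServe(":8080", timeoutHandler)
}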
  • The WithResponseWriteTimeout middleware creates a timeoutResponseWriter and starts a timer.
    • The timeoutResponseWriter wraps the original ResponseWriter and tracks the last successful write.
    • Every time data is written successfully, the timer is reset.
    • If no data is written for the specified duration, the timer expires, and a 504 Gateway Timeout status is sent to the client.
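
The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the eventual boxo implementation: WithResponseWriteTimeout and timeoutResponseWriter are the hypothetical names from this issue, and demoStatus is a made-up helper that runs a stalling handler against an in-memory recorder. The sketch keeps a handler-owned header map so the timer goroutine never touches the same map as the handler, and it can only send 504 while no header has been written yet; after that it just refuses further writes and cancels the request context.

```go
package main

import (
	"context"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// timeoutResponseWriter (hypothetical) re-arms a timer after every
// successful non-empty write to the wrapped ResponseWriter.
type timeoutResponseWriter struct {
	mu          sync.Mutex
	rw          http.ResponseWriter
	header      http.Header // handler-owned map, copied to rw on first write
	timer       *time.Timer
	timeout     time.Duration
	wroteHeader bool
	closed      bool // set on timeout or once the handler has returned
}

func (w *timeoutResponseWriter) Header() http.Header { return w.header }

func (w *timeoutResponseWriter) WriteHeader(code int) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.writeHeaderLocked(code)
}

func (w *timeoutResponseWriter) writeHeaderLocked(code int) {
	if w.closed || w.wroteHeader {
		return
	}
	w.wroteHeader = true
	for k, v := range w.header {
		w.rw.Header()[k] = v
	}
	w.rw.WriteHeader(code)
}

func (w *timeoutResponseWriter) Write(p []byte) (int, error) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.closed {
		return 0, http.ErrHandlerTimeout
	}
	w.writeHeaderLocked(http.StatusOK)
	n, err := w.rw.Write(p)
	if err == nil && n > 0 {
		w.timer.Reset(w.timeout) // progress was made: restart the countdown
	}
	return n, err
}

// close marks the writer as finished. If the timer fired before any header
// was sent, the client gets a 504 Gateway Timeout.
func (w *timeoutResponseWriter) close(timedOut bool) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.closed {
		return
	}
	w.closed = true
	w.timer.Stop()
	if timedOut && !w.wroteHeader {
		w.rw.WriteHeader(http.StatusGatewayTimeout)
	}
}

// WithResponseWriteTimeout (hypothetical) wraps next so that a gap longer
// than d between successful writes ends the response.
func WithResponseWriteTimeout(next http.Handler, d time.Duration) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithCancel(r.Context())
		defer cancel()
		tw := &timeoutResponseWriter{rw: w, header: make(http.Header), timeout: d}
		tw.mu.Lock() // the timer callback must not observe a nil tw.timer
		tw.timer = time.AfterFunc(d, func() {
			tw.close(true)
			cancel() // ask the stalled handler to stop
		})
		tw.mu.Unlock()
		defer tw.close(false)
		next.ServeHTTP(tw, r.WithContext(ctx))
	})
}

// demoStatus runs a handler that stalls for stallMs before its first write
// and returns the status code recorded for the client.
func demoStatus(stallMs, timeoutMs int) int {
	slow := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-time.After(time.Duration(stallMs) * time.Millisecond):
			w.Write([]byte("ok"))
		case <-r.Context().Done(): // cancelled by the timeout wrapper
		}
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/ipfs/example", nil)
	h := WithResponseWriteTimeout(slow, time.Duration(timeoutMs)*time.Millisecond)
	h.ServeHTTP(rec, req)
	return rec.Code
}
```

Two limitations of this sketch are worth keeping in mind: it relies on the wrapped handler returning once its context is cancelled, and the wrapper does not expose http.Flusher, which a streaming gateway handler would likely need.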

Configuration-wise, the Config struct would get a time.Duration field similar to the block timeout in the backend here, and NewHandler(config, backend) would set an implicit default if none is provided in the config.
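
A minimal sketch of that defaulting logic, assuming a field called ResponseWriteTimeout and a helper called effectiveTimeout (both names are made up for illustration; the Config type here is a stand-in, not the real boxo/gateway struct):

```go
package main

import "time"

// DefaultResponseWriteTimeout mirrors the 30s default suggested in this issue.
const DefaultResponseWriteTimeout = 30 * time.Second

// Config is a stand-in for the gateway configuration; only the
// hypothetical new field is shown.
type Config struct {
	// ResponseWriteTimeout is an assumed field name. The zero value
	// means "not set, use the implicit default".
	ResponseWriteTimeout time.Duration
}

// effectiveTimeout is a hypothetical helper showing what a constructor
// such as NewHandler could do when the field is left unset.
func effectiveTimeout(c Config) time.Duration {
	if c.ResponseWriteTimeout == 0 {
		return DefaultResponseWriteTimeout
	}
	return c.ResponseWriteTimeout
}
```

Using the zero value for "unset" keeps existing configs working unchanged, but it leaves open how an operator would disable the timeout entirely; that would need its own convention.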

  • We should check the context for the list of things the node tried, and print a useful error to the user (e.g. "I tried routing and found no peers", or "found 4 peers, but all were offline").

lidel avatar Oct 01 '24 13:10 lidel

> Every time data is written successfully, the timer is reset.

I think that a single timeout should cover all writes in the response. Otherwise, a trickle of data, byte-by-byte, could take a very long time.

> If no data is written for the specified duration, the timer expires, and a 504 Gateway Timeout status is sent to the client.

If this were a timeout waiting to read a response, then returning http.StatusGatewayTimeout would be appropriate. Since this is a write timeout, I think http.StatusServiceUnavailable is preferable. This is the behavior of http.TimeoutHandler.
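
For comparison, here is a small sketch of the single-deadline behavior using the standard library's http.TimeoutHandler. The trickle handler and the demoTrickleStatus helper are made up for the example: a handler that keeps writing one byte at a time is still cut off once the overall deadline passes, and the client sees 503 Service Unavailable.

```go
package main

import (
	"net/http"
	"net/http/httptest"
	"time"
)

// trickle writes n single-byte chunks, pausing for interval before each one,
// and stops early if the request context is cancelled.
func trickle(n int, interval time.Duration) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		for i := 0; i < n; i++ {
			select {
			case <-r.Context().Done():
				return
			case <-time.After(interval):
				w.Write([]byte("."))
			}
		}
	})
}

// demoTrickleStatus wraps trickle in http.TimeoutHandler with one deadline
// for the whole response and returns the status code the client would see.
func demoTrickleStatus(n, intervalMs, totalMs int) int {
	h := http.TimeoutHandler(
		trickle(n, time.Duration(intervalMs)*time.Millisecond),
		time.Duration(totalMs)*time.Millisecond,
		"response timed out",
	)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Code
}
```

One caveat: http.TimeoutHandler buffers the handler's output until the handler returns, so it bounds total time but does not stream, which may not suit large gateway responses.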

gammazero avatar Jan 28 '25 07:01 gammazero

I think a better name for what we need is "Retrieval Timeout" that covers both:

  • Time to first byte: If the gateway cannot start writing the response within this duration (e.g., stuck searching for providers), a 504 Gateway Timeout is returned.
  • Time between non-empty writes: After the first byte, the timeout resets each time new bytes are written to the client. If the gateway cannot write additional data within this duration after the last successful write, the response is terminated.

I took a stab at implementing/documenting this approach in https://github.com/ipfs/boxo/pull/994

lidel avatar Aug 11 '25 03:08 lidel