feat(gateway): add configurable response write timeout

Open gitsrc opened this issue 11 months ago • 4 comments

This commit introduces a configurable response write timeout for the IPFS gateway. The timeout can be set via the ResponseWriteTimeout field in the gateway configuration. If not set, a default timeout of 30 seconds is applied.

The implementation includes:

  • A new timeoutResponseWriter struct that wraps the standard http.ResponseWriter and enforces the timeout.
  • A middleware function WithResponseWriteTimeout that applies the timeout logic to the HTTP handler chain.
  • Comprehensive unit tests to verify the timeout behavior under various scenarios.

The timeout ensures that slow or unresponsive clients do not indefinitely hold server resources, improving the overall reliability and stability of the gateway.
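
For readers who want a concrete picture of the wrapper-plus-middleware shape described above, here is a minimal sketch. It is not the code from this PR: the names `timeoutResponseWriter` and `WithResponseWriteTimeout` and the 30-second default come from the description, while the signatures and bodies are assumptions. Instead of a timer goroutine, the sketch leans on `http.ResponseController.SetWriteDeadline` (Go 1.20+) to push the write deadline forward before each write.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// timeoutResponseWriter is a hypothetical wrapper (not the PR's code) that
// extends the connection's write deadline before every Write.
type timeoutResponseWriter struct {
	http.ResponseWriter
	rc      *http.ResponseController
	timeout time.Duration
}

func (w *timeoutResponseWriter) Write(p []byte) (int, error) {
	// The error is ignored on purpose: writers without deadline support,
	// such as httptest.ResponseRecorder, report http.ErrNotSupported.
	_ = w.rc.SetWriteDeadline(time.Now().Add(w.timeout))
	return w.ResponseWriter.Write(p)
}

// Unwrap lets http.ResponseController reach the wrapped writer.
func (w *timeoutResponseWriter) Unwrap() http.ResponseWriter { return w.ResponseWriter }

// WithResponseWriteTimeout wraps next; the signature is an assumption.
// A non-positive timeout falls back to the 30s default from the description.
func WithResponseWriteTimeout(next http.Handler, timeout time.Duration) http.Handler {
	if timeout <= 0 {
		timeout = 30 * time.Second
	}
	return http.HandlerFunc(func(rw http.ResponseWriter, r *http.Request) {
		next.ServeHTTP(&timeoutResponseWriter{
			ResponseWriter: rw,
			rc:             http.NewResponseController(rw),
			timeout:        timeout,
		}, r)
	})
}

// demo runs a trivial handler through the middleware using an in-memory
// recorder, so no deadline is actually armed here.
func demo() (int, string) {
	h := WithResponseWriteTimeout(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		_, _ = io.WriteString(w, "hello")
	}), 0)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Code, rec.Body.String()
}

func main() {
	fmt.Println(demo())
}
```

On a real `net/http` server connection, a client that stops reading would cause a later `Write` to fail once the deadline passes, which is the moment the handler can give up and release its resources.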

This change also attempts to address the issue described in https://github.com/ipfs/boxo/issues/679 by providing a mechanism to handle slow or stuck HTTP responses more gracefully.

gitsrc avatar Jan 23 '25 14:01 gitsrc

OK, I understand, it would be cleaner to set a uniform timeout.

This addresses what was asked for in #679, but I think that if we can live without the ask, "Every time data is written successfully, the timer is reset", then things can be much simpler. For example: #818
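
To illustrate how small a uniform (non-resetting) timeout can be in Go, here is a sketch built on the standard library's `http.TimeoutHandler`. It is not taken from #818; the helper name `withUniformTimeout` and the error message are made up for illustration. Note that `http.TimeoutHandler` replies with 503 and buffers the handler's output until the handler returns, so it is not a drop-in fit for large streamed responses.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// withUniformTimeout is a hypothetical helper: one fixed deadline for the
// whole handler, with no per-write reset.
func withUniformTimeout(next http.Handler, d time.Duration) http.Handler {
	return http.TimeoutHandler(next, d, "gateway timed out")
}

// demo runs a handler that stays silent well past the deadline and returns
// the status and body the client would see.
func demo() (int, string) {
	stuck := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(200 * time.Millisecond) // much longer than the 20ms limit below
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	withUniformTimeout(stuck, 20*time.Millisecond).ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	fmt.Println(demo())
}
```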

gitsrc avatar Feb 01 '25 08:02 gitsrc

Codecov Report

:x: Patch coverage is 94.00000% with 3 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 60.44%. Comparing base (9ea9632) to head (23dddd8).
:warning: Report is 306 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| gateway/handler.go | 93.75% | 2 Missing and 1 partial :warning: |

@@            Coverage Diff             @@
##             main     #812      +/-   ##
==========================================
- Coverage   60.48%   60.44%   -0.04%     
==========================================
  Files         244      243       -1     
  Lines       31121    31147      +26     
==========================================
+ Hits        18822    18827       +5     
- Misses      10623    10639      +16     
- Partials     1676     1681       +5     

| Files with missing lines | Coverage Δ |
|---|---|
| examples/gateway/common/handler.go | 95.50% <100.00%> (+0.10%) :arrow_up: |
| gateway/gateway.go | 83.54% <ø> (ø) |
| gateway/handler.go | 77.48% <93.75%> (+1.20%) :arrow_up: |

... and 11 files with indirect coverage changes

codecov[bot] avatar Feb 05 '25 16:02 codecov[bot]

Triage note:

  • likely we want to focus on "timeout between byte flushes to client" rather than generic timeout (https://github.com/ipfs/boxo/pull/818)
  • we will look into this before 0.34

lidel avatar Feb 11 '25 15:02 lidel

Triage notes: moving to ~0.36~ 0.37

lidel avatar May 20 '25 14:05 lidel

I will be looking at this as part of v0.37. I will likely cherry-pick useful commits from this PR and https://github.com/ipfs/boxo/pull/887, and introduce both settings in a single PR to avoid duplicated orchestration.

As for this PR, quick notes for self (things to check):

  • Race condition: Timer reset in Write() method creates concurrency hazards?
  • Double response risk: 504 status may conflict with handler's already-written response
  • Goroutine leak: Missing(?) cleanup mechanism for timeout monitor goroutine
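
One possible way to sidestep all three hazards, sketched under assumptions (this is neither this PR's code nor the eventual fix, and `guardedWriter` and `withIdleTimeout` are hypothetical names): guard all writer state with a mutex, send 504 only if no status has reached the client, and use `time.AfterFunc` so there is no monitor goroutine to clean up. A known limitation: the lock is held during `Write`, so a write that is already blocked on a stuck client is not interrupted.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// guardedWriter is a hypothetical wrapper; every field is protected by mu.
type guardedWriter struct {
	http.ResponseWriter
	mu          sync.Mutex
	timer       *time.Timer
	timeout     time.Duration
	wroteHeader bool // a status was already sent (by the handler or the timeout)
	timedOut    bool // the idle timer fired
	done        bool // the handler returned; the writer must not be touched
}

func (w *guardedWriter) WriteHeader(code int) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.timedOut || w.wroteHeader {
		return
	}
	w.wroteHeader = true
	w.ResponseWriter.WriteHeader(code)
}

func (w *guardedWriter) Write(p []byte) (int, error) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.timedOut {
		return 0, http.ErrHandlerTimeout
	}
	w.wroteHeader = true // Write implies a status even if none was set
	n, err := w.ResponseWriter.Write(p)
	if err == nil {
		w.timer.Reset(w.timeout) // restart the idle window after a successful write
	}
	return n, err
}

// onTimeout runs on the timer's goroutine.
func (w *guardedWriter) onTimeout() {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.done {
		return
	}
	w.timedOut = true
	if !w.wroteHeader { // send 504 only if nothing reached the client yet
		w.wroteHeader = true
		w.ResponseWriter.WriteHeader(http.StatusGatewayTimeout)
	}
}

func withIdleTimeout(next http.Handler, d time.Duration) http.Handler {
	return http.HandlerFunc(func(rw http.ResponseWriter, r *http.Request) {
		w := &guardedWriter{ResponseWriter: rw, timeout: d}
		w.mu.Lock()
		w.timer = time.AfterFunc(d, w.onTimeout)
		w.mu.Unlock()

		next.ServeHTTP(w, r)

		w.mu.Lock()
		w.done = true
		w.timer.Stop() // nothing else to clean up
		w.mu.Unlock()
	})
}

// demo exercises a silent handler (expects 504) and a prompt one (expects 200).
func demo() (slowCode int, slowErr error, fastCode int, fastBody string) {
	slow := withIdleTimeout(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(200 * time.Millisecond) // stay silent past the idle window
		_, slowErr = w.Write([]byte("late"))
	}), 10*time.Millisecond)
	rec := httptest.NewRecorder()
	slow.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	slowCode = rec.Code

	fast := withIdleTimeout(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		_, _ = w.Write([]byte("ok"))
	}), time.Second)
	rec = httptest.NewRecorder()
	fast.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return slowCode, slowErr, rec.Code, rec.Body.String()
}

func main() {
	fmt.Println(demo())
}
```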

lidel avatar Aug 08 '25 02:08 lidel

Let's go with https://github.com/ipfs/boxo/pull/994 which consolidates both types of limits + adds metrics and more tests.

lidel avatar Aug 11 '25 03:08 lidel