feat(gateway): add configurable response write timeout
This commit introduces a configurable response write timeout for the IPFS gateway.
The timeout can be set via the ResponseWriteTimeout field in the gateway configuration.
If not set, a default timeout of 30 seconds is applied.
The implementation includes:
- A new `timeoutResponseWriter` struct that wraps the standard `http.ResponseWriter` and enforces the timeout.
- A middleware function `WithResponseWriteTimeout` that applies the timeout logic to the HTTP handler chain.
- Comprehensive unit tests to verify the timeout behavior under various scenarios.
The timeout ensures that slow or unresponsive clients do not indefinitely hold server resources, improving the overall reliability and stability of the gateway.
This change also attempts to address the issue described in https://github.com/ipfs/boxo/issues/679 by providing a mechanism to handle slow or stuck HTTP responses more gracefully.
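For readers without the diff at hand, the wrapper and middleware described above could look roughly like the sketch below. It is a reconstruction under assumptions, not the code from this PR: the struct fields, the `WithResponseWriteTimeout(next, timeout)` signature, the 504 status on timeout, and the `exercise` helper are all hypothetical. It resets the timer after each successful write and serializes the timer callback against handler writes with a mutex.

```go
package main

import (
	"context"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// timeoutResponseWriter is a hypothetical reconstruction, not the PR's code.
type timeoutResponseWriter struct {
	http.ResponseWriter
	mu          sync.Mutex // guards the fields below and the wrapped writer
	timer       *time.Timer
	timeout     time.Duration
	cancel      context.CancelFunc
	wroteHeader bool
	timedOut    bool
	done        bool
}

func (w *timeoutResponseWriter) WriteHeader(code int) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.timedOut || w.wroteHeader {
		return // never emit a second status line
	}
	w.wroteHeader = true
	w.ResponseWriter.WriteHeader(code)
}

func (w *timeoutResponseWriter) Write(p []byte) (int, error) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.timedOut {
		return 0, http.ErrHandlerTimeout
	}
	w.wroteHeader = true // Write implies a 200 if no status was sent yet
	n, err := w.ResponseWriter.Write(p)
	if err == nil {
		w.timer.Reset(w.timeout) // a successful write restarts the countdown
	}
	return n, err
}

// onTimeout runs on the timer's goroutine when no write succeeded in time.
func (w *timeoutResponseWriter) onTimeout() {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.done || w.timedOut {
		return
	}
	w.timedOut = true
	if !w.wroteHeader { // only send 504 if the handler has sent nothing
		w.wroteHeader = true
		w.ResponseWriter.WriteHeader(http.StatusGatewayTimeout)
	}
	w.cancel() // lets a context-aware handler stop early
}

// WithResponseWriteTimeout wraps next; a non-positive timeout means 30s.
func WithResponseWriteTimeout(next http.Handler, timeout time.Duration) http.Handler {
	if timeout <= 0 {
		timeout = 30 * time.Second
	}
	return http.HandlerFunc(func(rw http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithCancel(r.Context())
		defer cancel()
		w := &timeoutResponseWriter{ResponseWriter: rw, timeout: timeout, cancel: cancel}
		w.mu.Lock()
		w.timer = time.AfterFunc(timeout, w.onTimeout)
		w.mu.Unlock()
		next.ServeHTTP(w, r.WithContext(ctx))
		w.mu.Lock()
		w.done = true // a late timer callback must not touch rw after return
		w.timer.Stop()
		w.mu.Unlock()
	})
}

// exercise runs the middleware against an in-memory recorder. With stall set,
// the handler never writes and returns only once its context is cancelled.
func exercise(stall bool) (int, string) {
	h := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if stall {
			<-r.Context().Done()
			return
		}
		w.Write([]byte("ok"))
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/ipfs/example", nil)
	WithResponseWriteTimeout(h, 50*time.Millisecond).ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}
```

Two limitations of this sketch are worth keeping in mind. The promoted `Header()` method is not synchronized, so a handler that mutates headers while the timer fires would still race, and a production version would need to guard that as well. Embedding the interface also hides optional interfaces such as `http.Flusher` from the handler.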
OK, I understand, it would be cleaner to set a uniform timeout.
This addresses what was asked for in #679, but I think that if we can live without the requirement that "every time data is written successfully, the timer is reset", then things can be much simpler. For example: #818
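To illustrate how small a uniform timeout can be, here is a minimal sketch that uses only the standard library's `http.TimeoutHandler`. It is not necessarily what #818 does; the helper names `withUniformTimeout` and `exerciseUniform` and the error message are made up for the example.

```go
package main

import (
	"net/http"
	"net/http/httptest"
	"time"
)

// withUniformTimeout caps the total handler time with http.TimeoutHandler.
// The helper name and the message text are illustrative only.
func withUniformTimeout(next http.Handler, d time.Duration) http.Handler {
	return http.TimeoutHandler(next, d, "gateway response timed out")
}

// exerciseUniform runs the wrapped handler against an in-memory recorder.
// With stall set, the handler blocks until its context is cancelled.
func exerciseUniform(stall bool) (int, string) {
	h := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if stall {
			<-r.Context().Done() // TimeoutHandler cancels the request context
			return
		}
		w.Write([]byte("ok"))
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/ipfs/example", nil)
	withUniformTimeout(h, 50*time.Millisecond).ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}
```

On timeout `http.TimeoutHandler` replies with 503 Service Unavailable and the given message. It also buffers all handler writes in memory and does not support `http.Flusher`, so it may be a poor fit for large streamed gateway responses.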
Codecov Report
:x: Patch coverage is 94.00000% with 3 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 60.44%. Comparing base (9ea9632) to head (23dddd8).
:warning: Report is 306 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| gateway/handler.go | 93.75% | 2 Missing and 1 partial :warning: |
@@ Coverage Diff @@
## main #812 +/- ##
==========================================
- Coverage 60.48% 60.44% -0.04%
==========================================
Files 244 243 -1
Lines 31121 31147 +26
==========================================
+ Hits 18822 18827 +5
- Misses 10623 10639 +16
- Partials 1676 1681 +5
| Files with missing lines | Coverage Δ |
|---|---|
| examples/gateway/common/handler.go | 95.50% <100.00%> (+0.10%) :arrow_up: |
| gateway/gateway.go | 83.54% <ø> (ø) |
| gateway/handler.go | 77.48% <93.75%> (+1.20%) :arrow_up: |

... and 11 files with indirect coverage changes
Triage note:
- likely we want to focus on "timeout between byte flushes to client" rather than generic timeout (https://github.com/ipfs/boxo/pull/818)
- we will look into this before 0.34
Triage notes: moving to ~0.36~ 0.37
I will be looking at this as part of v0.37. I will likely cherry-pick useful commits from this PR and https://github.com/ipfs/boxo/pull/887 and introduce both settings in a single PR to avoid duplicated orchestration.
As for this PR, quick notes for self (things to check):
- Race condition: Timer reset in Write() method creates concurrency hazards?
- Double response risk: 504 status may conflict with handler's already-written response
- Goroutine leak: Missing(?) cleanup mechanism for timeout monitor goroutine
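One possible way to sidestep all three concerns is to drop the timer and monitor goroutine and instead push the connection's write deadline forward before every write, using `http.ResponseController` (Go 1.20 or later). The sketch below is an assumption about how that might look, not code from this PR or from any follow-up; `deadlineWriter`, `withWriteDeadline`, and `exerciseDeadline` are hypothetical names.

```go
package main

import (
	"errors"
	"net/http"
	"net/http/httptest"
	"time"
)

// deadlineWriter extends the write deadline before each write, so no timer
// or background goroutine is needed. Type and field names are hypothetical.
type deadlineWriter struct {
	http.ResponseWriter
	rc      *http.ResponseController
	timeout time.Duration
}

func (w *deadlineWriter) Write(p []byte) (int, error) {
	err := w.rc.SetWriteDeadline(time.Now().Add(w.timeout))
	// Writers without deadline support (such as the test recorder) are
	// allowed through unchanged; any other error aborts the write.
	if err != nil && !errors.Is(err, http.ErrNotSupported) {
		return 0, err
	}
	return w.ResponseWriter.Write(p)
}

// Unwrap lets http.ResponseController reach the underlying writer.
func (w *deadlineWriter) Unwrap() http.ResponseWriter { return w.ResponseWriter }

func withWriteDeadline(next http.Handler, timeout time.Duration) http.Handler {
	return http.HandlerFunc(func(rw http.ResponseWriter, r *http.Request) {
		dw := &deadlineWriter{
			ResponseWriter: rw,
			rc:             http.NewResponseController(rw),
			timeout:        timeout,
		}
		next.ServeHTTP(dw, r)
	})
}

// exerciseDeadline checks that writes pass through the wrapper unchanged.
// The recorder has no real connection, so no deadline is actually enforced.
func exerciseDeadline() (int, string) {
	h := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("chunk-1 "))
		w.Write([]byte("chunk-2"))
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/ipfs/example", nil)
	withWriteDeadline(h, time.Second).ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}
```

The trade-off is that a write deadline only bounds how long a single write may block on a slow client. It does not cover time the handler spends waiting for data before it writes, and once the deadline trips the write simply fails, so no 504 can be sent.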
Let's go with https://github.com/ipfs/boxo/pull/994 which consolidates both types of limits + adds metrics and more tests.