Implement dead hosts component to skip consistently failing hosts
This PR implements a dead hosts detection system that tracks hosts which consistently deny connections and automatically skips them to avoid wasted network timeouts during crawls.
Overview
The dead hosts component addresses a common crawling inefficiency where Zeno would repeatedly attempt to connect to hosts that are permanently unreachable (dead hosts). This results in unnecessary network timeouts and slower crawls. The implementation is inspired by similar functionality in warcprox and the WBM live web checker.
Key Features
- Automatic Detection: Tracks network-level failures that indicate dead hosts:
  - Connection refused errors
  - DNS lookup failures (no such host, server misbehaving)
  - Network unreachable errors
  - Timeout errors
  - Host unreachable errors
- Smart Skipping: Once a host is marked as dead (after a configurable number of failures), all subsequent requests to that host are skipped immediately
- Recovery Mechanism: The dead hosts cache is periodically cleaned up to allow hosts to recover
- Disabled by Default: Following the suggestion in the issue comments, the feature is disabled by default to avoid potential side effects
- Robust Error Detection: Uses proper Go error type checking with errors.As() for net.OpError, net.DNSError, and net.Error types, falling back to string matching only when necessary
- Clean Architecture: Refactored function signatures using an ArchiverDependencies struct to group related parameters and improve maintainability
Configuration Options
# Enable dead hosts detection
--dead-hosts-detection
# Number of connection failures before marking a host as dead (default: 15)
--dead-hosts-max-failures 10
# How often to clean up the cache (default: 30m)
--dead-hosts-refresh-period 1h
# How long to keep hosts marked as dead (default: 6h)
--dead-hosts-max-age 12h
Implementation Details
- Architecture: New deadhosts package in internal/pkg/archiver/deadhosts/
- Integration: Seamlessly integrated with existing archiver workflow using ArchiverDependencies struct
- Thread-Safe: Uses sync.RWMutex for efficient concurrent access (chosen over sync.Map due to need for complex cleanup operations)
- Resource Management: Proper cleanup and goroutine lifecycle management
- Testing: Comprehensive unit tests and integration tests
Example Usage
# Basic crawl with dead hosts detection enabled
./Zeno get url https://example.com --dead-hosts-detection
# More aggressive settings for faster dead host detection
./Zeno get url https://example.com \
--dead-hosts-detection \
--dead-hosts-max-failures 5 \
--dead-hosts-refresh-period 15m
The component integrates with Zeno's existing rate limiter and follows the same architectural patterns. When enabled, it will log when hosts are being skipped due to being marked as dead, helping operators understand crawl behavior.
Code Quality Improvements
Based on code review feedback, the implementation includes:
- Improved Default Values: More conservative defaults (15 failures, 6h max-age) to balance efficiency with accuracy
- Enhanced Error Detection: Type-safe error checking reduces false positives compared to string-only matching
- Cleaner Function Signatures: Reduced parameter count and improved maintainability through dependency injection pattern
- Better Documentation: Clear explanations of design choices and architectural decisions
Fixes #361.
Codecov Report
:x: Patch coverage is 94.87179% with 8 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 57.01%. Comparing base (1f0d58f) to head (56a127a).
| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/pkg/archiver/general/archiver.go | 61.53% | 4 Missing and 1 partial :warning: |
| internal/pkg/archiver/worker.go | 81.25% | 2 Missing and 1 partial :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #499 +/- ##
==========================================
+ Coverage 56.39% 57.01% +0.61%
==========================================
Files 130 131 +1
Lines 8120 8272 +152
==========================================
+ Hits 4579 4716 +137
- Misses 3172 3184 +12
- Partials 369 372 +3
| Flag | Coverage Δ | |
|---|---|---|
| e2etests | 40.39% <30.76%> (-0.26%) | :arrow_down: |
| unittests | 30.20% <76.92%> (+0.87%) | :arrow_up: |