Implement dead hosts component to skip consistently failing hosts

Open Copilot opened this issue 5 months ago • 1 comment

This PR implements a dead hosts detection system that tracks hosts which consistently deny connections and automatically skips them to avoid wasted network timeouts during crawls.

Overview

The dead hosts component addresses a common crawling inefficiency: Zeno repeatedly attempts to connect to hosts that are permanently unreachable (dead hosts), which causes unnecessary network timeouts and slows crawls. The implementation is inspired by similar functionality in warcprox and the WBM live web checker.

Key Features

  • Automatic Detection: Tracks network-level failures that indicate dead hosts:

    • Connection refused errors
    • DNS lookup failures (no such host, server misbehaving)
    • Network unreachable errors
    • Timeout errors
    • Host unreachable errors
  • Smart Skipping: Once a host is marked as dead (after a configurable number of failures), all subsequent requests to that host are skipped immediately

  • Recovery Mechanism: The dead hosts cache is cleaned up periodically, so hosts that were marked as dead get a chance to recover

  • Disabled by Default: Following the suggestion in the issue comments, the feature is disabled by default to avoid potential side effects

  • Robust Error Detection: Uses proper Go error type checking with errors.As() for net.OpError, net.DNSError, and net.Error types, falling back to string matching only when necessary

  • Clean Architecture: Refactored function signatures using ArchiverDependencies struct to group related parameters and improve maintainability

Configuration Options

# Enable dead hosts detection
--dead-hosts-detection

# Number of connection failures before marking a host as dead (default: 15)
--dead-hosts-max-failures 10

# How often to clean up the cache (default: 30m)
--dead-hosts-refresh-period 1h

# How long to keep hosts marked as dead (default: 6h)
--dead-hosts-max-age 12h

Implementation Details

  • Architecture: New deadhosts package in internal/pkg/archiver/deadhosts/
  • Integration: Seamlessly integrated with existing archiver workflow using ArchiverDependencies struct
  • Thread-Safe: Uses sync.RWMutex for efficient concurrent access (chosen over sync.Map due to need for complex cleanup operations)
  • Resource Management: Proper cleanup and goroutine lifecycle management
  • Testing: Comprehensive unit tests and integration tests

Example Usage

# Basic crawl with dead hosts detection enabled
./Zeno get url https://example.com --dead-hosts-detection

# More aggressive settings for faster dead host detection
./Zeno get url https://example.com \
  --dead-hosts-detection \
  --dead-hosts-max-failures 5 \
  --dead-hosts-refresh-period 15m

The component integrates with Zeno's existing rate limiter and follows the same architectural patterns. When enabled, it logs each time a host is skipped because it is marked as dead, which helps operators understand crawl behavior.

Code Quality Improvements

Based on code review feedback, the implementation includes:

  • Improved Default Values: More conservative defaults (15 failures, 6h max-age) to balance efficiency with accuracy
  • Enhanced Error Detection: Type-safe error checking reduces false positives compared to string-only matching
  • Cleaner Function Signatures: Reduced parameter count and improved maintainability through dependency injection pattern
  • Better Documentation: Clear explanations of design choices and architectural decisions

Fixes #361.


Copilot avatar Sep 23 '25 08:09 Copilot

Codecov Report

:x: Patch coverage is 94.87179% with 8 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 57.01%. Comparing base (1f0d58f) to head (56a127a).

Files with missing lines                     Patch %   Lines
internal/pkg/archiver/general/archiver.go    61.53%    4 Missing and 1 partial :warning:
internal/pkg/archiver/worker.go              81.25%    2 Missing and 1 partial :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #499      +/-   ##
==========================================
+ Coverage   56.39%   57.01%   +0.61%     
==========================================
  Files         130      131       +1     
  Lines        8120     8272     +152     
==========================================
+ Hits         4579     4716     +137     
- Misses       3172     3184      +12     
- Partials      369      372       +3     
Flag         Coverage Δ
e2etests     40.39% <30.76%> (-0.26%) :arrow_down:
unittests    30.20% <76.92%> (+0.87%) :arrow_up:

codecov-commenter avatar Sep 23 '25 08:09 codecov-commenter