
Implement runtime filter for asset archiving

Open Copilot opened this issue 5 months ago • 0 comments

This PR implements runtime filters for asset archiving as requested, allowing users to limit asset archiving by number, file type, and time spent per page.

New Features

1. Maximum Assets Limit (--max-assets)

Limits the number of assets archived per page:

./Zeno get url https://example.com --max-assets 5

2. File Type Filtering

Control which asset types to archive:

# Only archive stylesheets and scripts
./Zeno get url https://example.com --assets-allowed-file-types css,js

# Exclude video files
./Zeno get url https://example.com --assets-disallowed-file-types mp4,avi,mov

3. Time-Based Filtering (--assets-archiving-timeout)

Stop archiving assets after a specified time per page:

./Zeno get url https://example.com --assets-archiving-timeout 30s
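
The PR describes the timeout as Go context cancellation in the archiver. A minimal sketch of that pattern follows, not Zeno's actual code: `archiveAssets`, `archiveWithTimeout`, `fetchFunc` and `slowFetch` are hypothetical names, and treating a non-positive timeout as "flag not set" is an assumption.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// fetchFunc is a hypothetical stand-in for the per-asset archiving work.
type fetchFunc func(ctx context.Context, asset string) error

// archiveAssets archives assets in order and stops once ctx is done.
// It returns how many assets completed before the stop.
func archiveAssets(ctx context.Context, assets []string, fetch fetchFunc) (int, error) {
	archived := 0
	for _, a := range assets {
		// Do not start a new asset after the deadline has passed.
		if err := ctx.Err(); err != nil {
			return archived, err
		}
		if err := fetch(ctx, a); err != nil {
			return archived, err
		}
		archived++
	}
	return archived, nil
}

// archiveWithTimeout wraps archiveAssets with a per-page deadline.
// Assumption: a timeout <= 0 means the flag was not set, so no deadline applies.
func archiveWithTimeout(assets []string, timeout time.Duration, fetch fetchFunc) (int, error) {
	ctx := context.Background()
	if timeout > 0 {
		var cancel context.CancelFunc
		ctx, cancel = context.WithTimeout(ctx, timeout)
		defer cancel()
	}
	return archiveAssets(ctx, assets, fetch)
}

// slowFetch simulates a 20ms download that aborts when ctx is cancelled.
func slowFetch(ctx context.Context, _ string) error {
	select {
	case <-time.After(20 * time.Millisecond):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	assets := []string{"a.css", "b.js", "c.png", "d.mp4", "e.gif"}
	n, err := archiveWithTimeout(assets, 50*time.Millisecond, slowFetch)
	fmt.Printf("archived %d of %d assets, err: %v\n", n, len(assets), err)
}
```

Passing the context into each fetch means an in-flight download is interrupted as well, so the page does not wait for the current asset to finish before stopping.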

Implementation Details

  • Asset filtering is applied in the postprocessor during extraction using file extension matching
  • Timeout handling uses Go context cancellation in the archiver to cleanly stop asset archiving
  • Precedence rules: Allowed file types take precedence over disallowed types when both are specified
  • Default behavior preserved: When no filtering flags are specified, all assets are archived (existing behavior)
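
The filtering rules above can be sketched in Go as follows. This is an illustration under stated assumptions, not the PR's code: `FilterConfig`, `extOf` and `filterAssets` are hypothetical names, "takes precedence" is read as "the disallowed list is ignored when an allowed list is set", and the count limit is assumed to apply after type filtering.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// FilterConfig is a hypothetical holder for the flag values;
// Zeno's real configuration type may differ.
type FilterConfig struct {
	MaxAssets  int      // 0 means no limit
	Allowed    []string // e.g. {"css", "js"}
	Disallowed []string // e.g. {"mp4", "avi", "mov"}
}

// extOf returns the lowercase extension of the URL path without the dot,
// or "" if the URL does not parse or has no extension.
func extOf(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return ""
	}
	return strings.ToLower(strings.TrimPrefix(path.Ext(u.Path), "."))
}

// contains reports whether list holds s, ignoring case.
func contains(list []string, s string) bool {
	for _, v := range list {
		if strings.EqualFold(v, s) {
			return true
		}
	}
	return false
}

// filterAssets applies the type filters first, then the count limit.
func filterAssets(assets []string, cfg FilterConfig) []string {
	kept := []string{}
	for _, a := range assets {
		ext := extOf(a)
		switch {
		case len(cfg.Allowed) > 0:
			// Assumed precedence: with an allowed list, the disallowed list is not consulted.
			if !contains(cfg.Allowed, ext) {
				continue
			}
		case contains(cfg.Disallowed, ext):
			continue
		}
		if cfg.MaxAssets > 0 && len(kept) >= cfg.MaxAssets {
			break
		}
		kept = append(kept, a)
	}
	return kept
}

func main() {
	assets := []string{
		"https://example.com/site.css?v=3",
		"https://example.com/intro.mp4",
		"https://example.com/app.js",
	}
	cfg := FilterConfig{MaxAssets: 10, Allowed: []string{"css", "js"}}
	fmt.Println(filterAssets(assets, cfg))
}
```

With an empty `FilterConfig`, every asset passes through unchanged, which matches the default behavior described above.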

Combined Usage

All filters can be combined for fine-grained control:

./Zeno get url https://example.com \
  --max-assets 10 \
  --assets-allowed-file-types css,js \
  --assets-archiving-timeout 1m

Testing

The implementation is covered by the following tests:

  • Unit tests for individual filter functionality
  • Integration tests for combined filtering scenarios
  • Edge case handling (nil assets, invalid URLs, configuration conflicts)
  • Validation that existing behavior is preserved

Fixes #388.

[!WARNING]

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • example.com
    • Triggering command: ./Zeno get url REDACTED --max-assets 2 --assets-allowed-file-types css,js --log-level debug --workers 1 (dns block)

If you need me to access, download, or install something from one of these locations, you can either:



Copilot, Sep 23 '25 08:09