Implement runtime filter for asset archiving
This PR implements runtime filters for asset archiving as requested, allowing users to limit archiving per page by asset count, file type, and time spent.
New Features
1. Maximum Assets Limit (--max-assets)
Limits the number of assets archived per page:
./Zeno get url https://example.com --max-assets 5
2. File Type Filtering (--assets-allowed-file-types, --assets-disallowed-file-types)
Control which asset types to archive:
# Only archive stylesheets and scripts
./Zeno get url https://example.com --assets-allowed-file-types css,js
# Exclude video files
./Zeno get url https://example.com --assets-disallowed-file-types mp4,avi,mov
3. Time-Based Filtering (--assets-archiving-timeout)
Stop archiving assets after a specified time per page:
./Zeno get url https://example.com --assets-archiving-timeout 30s
Implementation Details
- Asset filtering is applied in the postprocessor during extraction using file extension matching
- Timeout handling uses Go context cancellation in the archiver to cleanly stop asset archiving
- Precedence rules: Allowed file types take precedence over disallowed types when both are specified
- Default behavior preserved: When no filtering flags are specified, all assets are archived (existing behavior)
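The rules above can be sketched roughly as follows. This is a simplified illustration, not the postprocessor code from this PR: the `AssetFilter` type, its field names, and the choice to apply the `--max-assets` cap after type filtering are assumptions made for the example.

```go
package main

import (
	"net/url"
	"path"
	"strings"
)

// AssetFilter is a hypothetical config struct mirroring the CLI flags.
type AssetFilter struct {
	MaxAssets  int      // assumption: 0 means no limit
	Allowed    []string // extensions such as "css", "js"
	Disallowed []string // ignored when Allowed is non-empty
}

// extOf returns the lowercased extension of the URL path without the dot,
// or "" when the URL cannot be parsed or has no extension.
func extOf(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return ""
	}
	return strings.ToLower(strings.TrimPrefix(path.Ext(u.Path), "."))
}

// contains reports whether ext is in list, ignoring case and a leading dot.
func contains(list []string, ext string) bool {
	for _, e := range list {
		if strings.EqualFold(strings.TrimPrefix(e, "."), ext) {
			return true
		}
	}
	return false
}

// Apply returns the assets that pass the filter, in their original order.
func (f AssetFilter) Apply(assets []string) []string {
	kept := make([]string, 0, len(assets))
	for _, a := range assets {
		if f.MaxAssets > 0 && len(kept) >= f.MaxAssets {
			break // cap reached
		}
		ext := extOf(a)
		switch {
		case len(f.Allowed) > 0:
			// The allowed list takes precedence; the disallowed list is not consulted.
			if !contains(f.Allowed, ext) {
				continue
			}
		case len(f.Disallowed) > 0:
			if contains(f.Disallowed, ext) {
				continue
			}
		}
		kept = append(kept, a)
	}
	return kept
}
```

With a zero-value `AssetFilter{}`, `Apply` returns every asset unchanged, which corresponds to the preserved default behavior.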
Combined Usage
All filters can be combined for fine-grained control:
./Zeno get url https://example.com \
--max-assets 10 \
--assets-allowed-file-types css,js \
--assets-archiving-timeout 1m
Testing
The implementation includes tests covering:
- Unit tests for individual filter functionality
- Integration tests for combined filtering scenarios
- Edge case handling (nil assets, invalid URLs, configuration conflicts)
- Validation that existing behavior is preserved
Fixes #388.