Copilot
Arachnado currently crawls entire domains regardless of the path in the start URL. Providing `example.com/docs/api` crawls all of `example.com` instead of just the `/docs/api` subtree. ## Changes - **Extract and...
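
The subtree restriction described above can be illustrated with a small path-prefix check. This is only a sketch under stated assumptions, not the code from the PR: the helper names `_split` and `in_subtree` are hypothetical, and scheme-less start URLs such as `example.com/docs/api` are assumed to mean `http://`.

```python
from urllib.parse import urlparse


def _split(url):
    """Return (host, path) for a URL, tolerating a missing scheme."""
    if "://" not in url:
        # Assumption: a bare "example.com/docs/api" means http://
        url = "http://" + url
    parts = urlparse(url)
    return parts.netloc.lower(), parts.path.rstrip("/")


def in_subtree(start_url, candidate_url):
    """True if candidate_url lies on the start host and under the start path."""
    start_host, start_path = _split(start_url)
    host, path = _split(candidate_url)
    if host != start_host:
        return False
    if not start_path:
        # No path in the start URL: the whole domain is in scope.
        return True
    # Compare on a path-segment boundary so /docs/apiary does not match /docs/api.
    return path == start_path or path.startswith(start_path + "/")
```

Matching on a segment boundary rather than a raw string prefix is a design choice of this sketch; whether the PR does the same is not stated in the description.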
The issue "How to run custom spider?" lacked documentation. While Arachnado supports custom Scrapy spiders via `spider_packages` config and `spider://` URL format, this was undocumented. ## Changes ### Documentation (`docs/custom-spiders.rst`)...
Arachnado crashes on Windows with `AttributeError: 'Process' object has no attribute 'num_fds'`. The psutil library provides `num_fds()` only on Unix/Linux/macOS, while Windows uses `num_handles()`. ## Changes - **arachnado/process_stats.py**: Replace direct...
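
One way to make the handle count portable is to probe for whichever method the process object exposes. The sketch below is an assumption about the shape of such a fix, not the actual change to `arachnado/process_stats.py`; the function name `count_open_handles` is hypothetical. It is duck-typed so it can run without psutil installed.

```python
def count_open_handles(proc):
    """Return the open descriptor/handle count for a psutil-like process.

    Tries num_fds() first (Unix-like systems), then num_handles() (Windows).
    Returns None if the object offers neither method.
    """
    for name in ("num_fds", "num_handles"):
        method = getattr(proc, name, None)
        if callable(method):
            return method()
    return None
```

With psutil available, the call would look like `count_open_handles(psutil.Process())`.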
UI becomes unresponsive when many transfers are active due to excessive WebSocket traffic and React re-renders. ## Backend: Throttle jobs:state updates (monitor.py) - Limit WebSocket updates to 2/sec (500ms throttle)...
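
The 500 ms throttle mentioned above can be sketched as a small rate gate. This is a minimal illustration, not the code in `monitor.py`: the class name `Throttle` and its `ready()` method are hypothetical, and the clock is injectable only so the behaviour can be checked deterministically.

```python
import time


class Throttle:
    """Allow an action at most once per `interval` seconds."""

    def __init__(self, interval=0.5, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self._last = None  # time of the last allowed action

    def ready(self):
        """Return True (and record the time) if enough time has passed."""
        now = self.clock()
        if self._last is None or now - self._last >= self.interval:
            self._last = now
            return True
        return False
```

A sender would call `ready()` before each push and skip the update when it returns `False`. Note that this simple form drops intermediate updates; a real implementation would probably also flush the latest state once the interval elapses.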
Arachnado currently auto-resumes all jobs with status "shutdown" or "running" on server restart. This adds a configuration option to control that behavior. ## Changes - **Config option**: Added `resume_on_start` (default:...
An issue raised five questions about Arachnado's architecture: custom signals, WebSocket usage, Scrapy middleware compatibility, Splash support, and autologin/FormRequest functionality. ## Changes - **Created `docs/faq.rst`** with detailed answers: - **Custom Signals**:...
`site_checker.py` imports `bot_detector.detector.Detector` but the dependency is not documented in `requirements.txt` or `setup.py`. The package is not available on PyPI and the code already handles its absence gracefully via try-except....
The `scrapy.xlib` compatibility module was removed in Scrapy 2.0, causing `ModuleNotFoundError` on modern Scrapy versions. ### Changes - Import `ResponseFailed` directly from `twisted.web.client` instead of deprecated `scrapy.xlib.tx` ```python # Before...
`MongoExportPipeline` fails silently when `MONGO_EXPORT_ITEMS_URI` and `MONGO_EXPORT_JOBS_URI` are undefined, causing scraped items to not be stored. These settings were only set by `__main__.py` at server startup, but had no defaults...
`SafeConfigParser` was removed in Python 3.12, causing `ImportError` when importing `arachnado.config`. ## Changes - **arachnado/config.py**: Replace `SafeConfigParser` with `ConfigParser` - In Python 3.2+, `ConfigParser` is safe by default (equivalent to...
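
For reference, the replacement is a drop-in change because `ConfigParser` in Python 3.2+ performs the same `%(name)s` interpolation that `SafeConfigParser` did. The sketch below uses only the standard library; the `[example]` section and its option names are hypothetical and are not taken from Arachnado's config files.

```python
from configparser import ConfigParser

# Hypothetical config text, for illustration only.
SAMPLE = """
[example]
host = localhost
port = 8888
debug = yes
url = http://%(host)s:%(port)s
"""


def load_config(text):
    """Parse INI-style text with the modern ConfigParser."""
    parser = ConfigParser()
    parser.read_string(text)
    return parser


config = load_config(SAMPLE)
port = config.getint("example", "port")        # typed accessor
debug = config.getboolean("example", "debug")  # "yes" parses as True
url = config.get("example", "url")             # interpolated from host and port
```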