Add FAQ documentation for architecture and design questions
The issue raised five questions about Arachnado's architecture: custom signals, WebSocket usage, Scrapy middleware compatibility, Splash support, and autologin/FormRequest functionality.
Changes
- Created `docs/faq.rst` with detailed answers:
  - Custom Signals: Documents 6 custom signals (`spider_closing`, `engine_paused`, `engine_resumed`, `engine_tick`, `downloader_enqueued`, `downloader_dequeued`) required for real-time UI updates, pause/resume control, and multi-crawler signal aggregation
  - WebSocket vs HTTP API: Clarifies the dual-API design: HTTP for crawler control (start/stop/pause/resume), WebSockets for real-time data streaming to avoid polling
  - Middleware Compatibility: Confirms full Scrapy middleware support (cookies, robots.txt, redirects, etc.) and documents custom middleware
  - Splash Integration: Optional browser rendering support via `scrapy-splash`, with a configuration example
  - Autologin/FormRequest: Built-in `autologin-middleware` integration and standard Scrapy FormRequest support
- Updated `docs/index.rst` to include the FAQ in the documentation index
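The custom-signal idea described in the list above can be illustrated with a small standard-library sketch. Scrapy defines its built-in signals as plain sentinel objects, and the sketch follows that pattern; everything else here (`SignalHub`, `PausableEngine`, the crawl names) is hypothetical and is not Arachnado's or Scrapy's actual implementation. It only shows how several engines could report pause/resume events to one shared listener.

```python
from collections import defaultdict

# Signals are plain sentinel objects, the same pattern scrapy.signals uses.
engine_paused = object()
engine_resumed = object()


class SignalHub:
    """Hypothetical stand-in for a signal manager (not Scrapy's SignalManager)."""

    def __init__(self):
        self._receivers = defaultdict(list)

    def connect(self, receiver, signal):
        # Register a callable to be invoked whenever `signal` is sent.
        self._receivers[signal].append(receiver)

    def send(self, signal, **kwargs):
        # Call every receiver registered for `signal` and collect the results.
        return [receiver(**kwargs) for receiver in list(self._receivers[signal])]


class PausableEngine:
    """Hypothetical engine that announces pause/resume through a shared hub."""

    def __init__(self, name, hub):
        self.name = name
        self.hub = hub
        self.paused = False

    def pause(self):
        if not self.paused:
            self.paused = True
            self.hub.send(engine_paused, engine=self.name)

    def resume(self):
        if self.paused:
            self.paused = False
            self.hub.send(engine_resumed, engine=self.name)


def demo():
    # One hub aggregates events from several crawls, as a UI backend might.
    hub = SignalHub()
    events = []
    hub.connect(lambda engine: events.append(("paused", engine)), engine_paused)
    hub.connect(lambda engine: events.append(("resumed", engine)), engine_resumed)

    first = PausableEngine("crawl-1", hub)
    second = PausableEngine("crawl-2", hub)
    first.pause()
    second.pause()
    first.resume()
    return events
```

Because each event carries the engine name, a single listener can tell the crawls apart, which is the kind of aggregation a real-time UI would need.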
Example Configuration
Starting a crawl with autologin enabled:
POST /crawler/start
{
  "domain": "https://example.com/login",
  "settings": {
    "AUTOLOGIN_ENABLED": true,
    "AUTOLOGIN_URL": "http://localhost:8089"
  }
}
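For readers who prefer to script this call, the request above could be built in Python roughly as follows. This is a sketch under assumptions: the base URL `http://localhost:8888` and the helper name `build_start_request` are illustrative, and sending the body as JSON with a JSON content type is inferred from the example rather than confirmed against the server code.

```python
import json
import urllib.request


def build_start_request(base_url, domain, settings):
    """Build (but do not send) a POST request for the /crawler/start endpoint."""
    payload = {"domain": domain, "settings": settings}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/crawler/start",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


request = build_start_request(
    "http://localhost:8888",  # assumed address of the Arachnado server
    "https://example.com/login",
    {
        "AUTOLOGIN_ENABLED": True,
        "AUTOLOGIN_URL": "http://localhost:8089",
    },
)

# To actually start the crawl against a running server:
# urllib.request.urlopen(request)
```

Building the request separately from sending it makes the payload easy to inspect before any crawl is started.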
Documentation Preview
The FAQ provides technical depth with code examples and cross-references to existing documentation.
Original prompt
This section details the original issue you should resolve
<issue_title>Signals</issue_title> <issue_description>I am looking at the code and see all the signals being re-mapped and hence Scrapy ExecutionEngine and Downloader being subclassed.
1- What is the need of these custom signals and re-implementing signals?
2- There seems to be the use to web sockets for client communication, why isn't http api sufficient?
3- Does this thing obey regular scrapy middleware i.e. cookies, robots.txt etc middleware?
4- Does this uses splash as browser?
5- Are the auto login or FormRequest works with this?</issue_description>
Comments on the Issue (you are @copilot in this section)
- Fixes TeamHG-Memex/arachnado#34