
Add FAQ documentation for architecture and design questions


The issue raised five questions about Arachnado's architecture: custom signals, WebSocket usage, Scrapy middleware compatibility, Splash support, and autologin/FormRequest functionality.

Changes

  • Created docs/faq.rst with detailed answers:

    • Custom Signals: Documents the six custom signals (spider_closing, engine_paused, engine_resumed, engine_tick, downloader_enqueued, downloader_dequeued) required for real-time UI updates, pause/resume control, and multi-crawler signal aggregation; a connection sketch follows this list
    • WebSocket vs HTTP API: Explains the dual-API design: HTTP endpoints control crawlers (start/stop/pause/resume), while WebSockets stream real-time data to clients, avoiding polling; a client sketch follows this list
    • Middleware Compatibility: Confirms full support for standard Scrapy middleware (cookies, robots.txt, redirects, etc.) and documents Arachnado's custom middleware
    • Splash Integration: Optional browser rendering via scrapy-splash, with a configuration example; the standard scrapy-splash settings are reproduced after this list
    • Autologin/FormRequest: Built-in autologin-middleware integration and standard Scrapy FormRequest support; see the spider example under Example Configuration
  • Updated docs/index.rst to include the FAQ in the documentation index
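
A minimal sketch of consuming one of the custom signals from a Scrapy extension. The arachnado.signals import path is an assumption (check where the signal objects are actually defined in the source); the connection API itself is standard Scrapy:

from arachnado.signals import engine_paused  # hypothetical import path

class PauseLogger:
    """Extension that logs whenever Arachnado pauses the crawl engine."""

    @classmethod
    def from_crawler(cls, crawler):
        ext = cls()
        # Custom signals connect exactly like Scrapy's built-in ones.
        crawler.signals.connect(ext.on_engine_paused, signal=engine_paused)
        return ext

    def on_engine_paused(self, **kwargs):
        # The arguments depend on what the sender dispatches with the signal.
        print("engine paused:", kwargs)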
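
To illustrate the streaming side, a sketch of a client subscribing to real-time updates with the third-party websockets package. The endpoint path and port are assumptions, not Arachnado's documented protocol; check the Tornado handlers for the actual WebSocket route:

import asyncio
import websockets  # pip install websockets

async def watch_updates():
    # Hypothetical endpoint; the real route is defined in Arachnado's handlers.
    async with websockets.connect("ws://localhost:8888/ws-updates") as ws:
        while True:
            message = await ws.recv()  # the server pushes updates; no polling
            print("update:", message)

asyncio.run(watch_updates())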
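
For reference, the standard scrapy-splash configuration that the Splash answer builds on. These settings come from scrapy-splash's own documentation; adjust SPLASH_URL to wherever your Splash instance listens:

SPLASH_URL = 'http://localhost:8050'

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'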

Example Configuration

Starting a crawl with autologin enabled:

POST /crawler/start
{
    "domain": "https://example.com/login",
    "settings": {
        "AUTOLOGIN_ENABLED": true,
        "AUTOLOGIN_URL": "http://localhost:8089"
    }
}
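
For the FormRequest side, the standard Scrapy login pattern works unchanged inside a spider. This uses only standard Scrapy API; the URL, form field names, and success check are placeholders:

import scrapy

class LoginSpider(scrapy.Spider):
    name = 'login_example'
    start_urls = ['https://example.com/login']

    def parse(self, response):
        # Fill and submit the login form found on the page.
        return scrapy.FormRequest.from_response(
            response,
            formdata={'username': 'user', 'password': 'secret'},  # placeholders
            callback=self.after_login,
        )

    def after_login(self, response):
        # Site-specific success check; adapt to the target page.
        if b'Welcome' in response.body:
            self.logger.info('login succeeded')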

Documentation Preview

(Screenshot: rendered FAQ documentation page)

The FAQ provides technical depth with code examples and cross-references to existing documentation.

Original prompt

This section details the original issue to resolve.

<issue_title>Signals</issue_title>
<issue_description>I am looking at the code and see all the signals being re-mapped, and hence Scrapy's ExecutionEngine and Downloader being subclassed.

1. What is the need for these custom signals and re-implementing of signals?
2. There seems to be use of WebSockets for client communication; why isn't an HTTP API sufficient?
3. Does this obey regular Scrapy middleware, i.e. the cookies, robots.txt, etc. middleware?
4. Does this use Splash as a browser?
5. Do autologin or FormRequest work with this?</issue_description>

Comments on the Issue (you are @copilot in this section)

  • Fixes TeamHG-Memex/arachnado#34

