
Add FAQ documentation for architecture and design questions


The issue raised five questions about Arachnado's architecture: custom signals, WebSocket usage, Scrapy middleware compatibility, Splash support, and autologin/FormRequest functionality.

Changes

  • Created docs/faq.rst with detailed answers:

    • Custom Signals: Documents the six custom signals (spider_closing, engine_paused, engine_resumed, engine_tick, downloader_enqueued, downloader_dequeued) required for real-time UI updates, pause/resume control, and multi-crawler signal aggregation; a connection sketch follows this list
    • WebSocket vs HTTP API: Explains the dual-API design: HTTP endpoints control crawlers (start/stop/pause/resume), while WebSockets stream real-time data to clients, avoiding polling; a client sketch follows this list
    • Middleware Compatibility: Confirms full support for standard Scrapy middleware (cookies, robots.txt, redirects, etc.) and documents Arachnado's custom middleware
    • Splash Integration: Optional browser rendering via scrapy-splash, with a configuration example; the standard scrapy-splash settings are reproduced after this list
    • Autologin/FormRequest: Built-in autologin-middleware integration and standard Scrapy FormRequest support; see the spider example under Example Configuration
  • Updated docs/index.rst to include the FAQ in the documentation index
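
A minimal sketch of consuming one of the custom signals from a Scrapy extension. The arachnado.signals import path is an assumption (check where the signal objects are actually defined in the source); the connection API itself is standard Scrapy:

from arachnado.signals import engine_paused  # hypothetical import path

class PauseLogger:
    """Extension that logs whenever Arachnado pauses the crawl engine."""

    @classmethod
    def from_crawler(cls, crawler):
        ext = cls()
        # Custom signals connect exactly like Scrapy's built-in ones.
        crawler.signals.connect(ext.on_engine_paused, signal=engine_paused)
        return ext

    def on_engine_paused(self, **kwargs):
        # The arguments depend on what the sender dispatches with the signal.
        print("engine paused:", kwargs)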
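
To illustrate the streaming side, a sketch of a client subscribing to real-time updates with the third-party websockets package. The endpoint path and port are assumptions, not Arachnado's documented protocol; check the Tornado handlers for the actual WebSocket route:

import asyncio
import websockets  # pip install websockets

async def watch_updates():
    # Hypothetical endpoint; the real route is defined in Arachnado's handlers.
    async with websockets.connect("ws://localhost:8888/ws-updates") as ws:
        while True:
            message = await ws.recv()  # the server pushes updates; no polling
            print("update:", message)

asyncio.run(watch_updates())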
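
For reference, the standard scrapy-splash configuration that the Splash answer builds on. These settings come from scrapy-splash's own documentation; adjust SPLASH_URL to wherever your Splash instance listens:

SPLASH_URL = 'http://localhost:8050'

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'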

Example Configuration

Starting a crawl with autologin enabled:

POST /crawler/start
{
    "domain": "https://example.com/login",
    "settings": {
        "AUTOLOGIN_ENABLED": true,
        "AUTOLOGIN_URL": "http://localhost:8089"
    }
}
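
For the FormRequest side, the standard Scrapy login pattern works unchanged inside a spider. This uses only standard Scrapy API; the URL, form field names, and success check are placeholders:

import scrapy

class LoginSpider(scrapy.Spider):
    name = 'login_example'
    start_urls = ['https://example.com/login']

    def parse(self, response):
        # Fill and submit the login form found on the page.
        return scrapy.FormRequest.from_response(
            response,
            formdata={'username': 'user', 'password': 'secret'},  # placeholders
            callback=self.after_login,
        )

    def after_login(self, response):
        # Site-specific success check; adapt to the target page.
        if b'Welcome' in response.body:
            self.logger.info('login succeeded')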

Documentation Preview

(Screenshot: rendered FAQ documentation page)

The FAQ provides technical depth with code examples and cross-references to existing documentation.

Original prompt

This section details the original issue to resolve.

<issue_title>Signals</issue_title>
<issue_description>I am looking at the code and see all the signals being re-mapped, and hence Scrapy's ExecutionEngine and Downloader being subclassed.

1. What is the need for these custom signals and re-implementing of signals?
2. There seems to be use of WebSockets for client communication; why isn't an HTTP API sufficient?
3. Does this obey regular Scrapy middleware, i.e. the cookies, robots.txt, etc. middleware?
4. Does this use Splash as a browser?
5. Do autologin or FormRequest work with this?</issue_description>

Comments on the Issue (you are @copilot in this section)

  • Fixes TeamHG-Memex/arachnado#34

