Add project upload endpoint for running custom Scrapy spiders
Enables users to upload Scrapy project archives and run custom spiders through the WebUI, eliminating the need for manual spider configuration.
## Changes
**Backend:**

- `ProjectManager` class handles project extraction (zip/tar.gz), spider package discovery via `scrapy.cfg` parsing, and lifecycle management (sketched below)
- `/project/upload` endpoint validates project names (alphanumeric plus `_` and `-`), extracts archives, and registers spider packages
- `/project/list` endpoint returns the uploaded projects
- `DomainCrawlers` modified to dynamically load spider packages from uploaded projects on startup and on demand
- Configuration adds an `uploaded_projects_dir` option (opt-in)
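A minimal sketch of what the extraction and discovery steps might look like; the function names, layout, and error handling here are illustrative assumptions, not the actual `ProjectManager` implementation:

```python
import configparser
import pathlib
import re
import tarfile
import zipfile

# Project names are restricted to alphanumerics plus "_" and "-".
PROJECT_NAME_RE = re.compile(r"^[A-Za-z0-9_-]+$")


def extract_archive(archive_path, dest_dir):
    """Extract a .zip or .tar.gz project archive into dest_dir."""
    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as zf:
            zf.extractall(dest_dir)
    elif tarfile.is_tarfile(archive_path):
        with tarfile.open(archive_path, "r:*") as tf:
            tf.extractall(dest_dir)  # real code should also guard against path traversal
    else:
        raise ValueError("unsupported archive format: %s" % archive_path)


def find_spider_package(project_dir):
    """Parse scrapy.cfg to find the package that contains the spiders.

    scrapy.cfg has a [settings] section whose "default" key points at the
    project's settings module (e.g. "myproject.settings"); the spider
    package name is everything before the last dot.
    """
    cfg_path = next(pathlib.Path(project_dir).rglob("scrapy.cfg"), None)
    if cfg_path is None:
        raise ValueError("no scrapy.cfg found in uploaded project")
    cfg = configparser.ConfigParser()
    cfg.read(str(cfg_path))
    settings_module = cfg.get("settings", "default")
    return settings_module.rsplit(".", 1)[0]
```

Parsing `scrapy.cfg` rather than guessing from directory names keeps discovery robust for archives that nest the project one level deep.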
**Frontend:**

- `ProjectUpload` React component provides a collapsible upload form with validation and status feedback
- Integrated into the main page between the crawl form and the jobs panel
**Documentation:**

- API endpoints documented in `docs/http-api.rst`
- User guide added to `README.rst` and the help page
## Usage
Upload a project via the WebUI or the API:

```sh
curl -X POST \
    -F "project_name=myproject" \
    -F "[email protected]" \
    http://localhost:8888/project/upload
```
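The same upload can be done from Python; this sketch assumes the multipart field names shown in the curl command above and that the endpoint responds with JSON:

```python
import requests

with open("project.zip", "rb") as f:
    resp = requests.post(
        "http://localhost:8888/project/upload",
        data={"project_name": "myproject"},  # must match ^[A-Za-z0-9_-]+$
        files={"archive": f},                # same field name as the curl example
    )
resp.raise_for_status()
print(resp.json())  # assumption: the endpoint returns a JSON status object
```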
Run uploaded spiders by entering `spider://spider_name` in the crawl form. Projects persist across restarts.
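Starting an uploaded spider over HTTP might then look like the following; this assumes that Arachnado's existing crawl-start endpoint (`/crawler/start` with a `domain` field) accepts the same `spider://` syntax, which is an assumption based on the description above rather than confirmed behavior:

```python
import requests

# Hypothetical: "spider://myspider" selects an uploaded spider by name
# instead of crawling a plain domain.
resp = requests.post(
    "http://localhost:8888/crawler/start",
    data={"domain": "spider://myspider"},
)
print(resp.status_code, resp.text)
```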
## Tests
Six unit tests cover archive extraction, spider discovery, project listing, and error handling. CodeQL analysis reports no vulnerabilities.
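The actual tests are not shown here, but one along the lines described might look like this pytest sketch, reusing the hypothetical `extract_archive` and `find_spider_package` helpers from the backend sketch above:

```python
import zipfile


def test_extract_and_discover(tmp_path):
    # Build a minimal, valid Scrapy project archive on the fly.
    archive = tmp_path / "project.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("scrapy.cfg", "[settings]\ndefault = myproject.settings\n")
        zf.writestr("myproject/__init__.py", "")
        zf.writestr("myproject/settings.py", "BOT_NAME = 'myproject'\n")

    dest = tmp_path / "extracted"
    extract_archive(str(archive), str(dest))
    assert find_spider_package(str(dest)) == "myproject"
```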
## Original prompt
This section details the original issue to resolve:

> **Issue title:** Need to add an option to upload the scrapy project and run the spider by arachnado
>
> **Issue description:** Hi all: it may be difficult for users to configure spiders they have written themselves. I think that if we add an option to simply upload a spider and run it through the WebUI, it will become a very useful tool.
Comments on the Issue (you are @copilot in this section)
- Fixes TeamHG-Memex/arachnado#13