crawling topic
crawlee
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and o...
rod
A DevTools driver for web automation and scraping
isp-data-pollution
ISP Data Pollution to Protect Private Browsing History with Obfuscation
newspaper
newspaper3k is a news, full-text, and article metadata extraction library for Python 3.
scrapyrt
HTTP API for Scrapy spiders
scrapy-selenium
Scrapy middleware to handle JavaScript pages using Selenium
second-order
Second-order subdomain takeover scanner
easy-scraping-tutorial
Simple but useful Python web scraping tutorial code.