web-crawler topic
doc_crawler.py
Explore a website recursively and download all the wanted documents (PDF, ODT…)
frequent
A utility for crawling websites and building frequency lists of words
Strong-Web-Crawler
An advanced web crawler built on C#.NET, PhantomJS, and Selenium. It can execute JavaScript code, trigger various events, and manipulate the page DOM structure.
awesome-web-scraper
A collection of awesome web scrapers and crawlers.
dyer
Dyer is designed for reliable, flexible and fast web crawling, providing some high-level, comprehensive features without compromising speed.
abot
Cross Platform C# web crawler framework built for speed and flexibility. Please star this project! +1.
crawlee
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and o...
crawlab
Distributed web crawler admin platform for managing spiders regardless of language or framework.
spider-flow
A next-generation crawler platform that defines crawler workflows graphically, so crawlers can be built without writing code.