web-crawler topic

List web-crawler repositories

doc_crawler.py

20 stars, 7 forks

Explore a website recursively and download all the wanted documents (PDF, ODT…)

evine

174 stars, 32 forks

Interactive CLI Web Crawler

frequent

26 stars, 12 forks

A utility for crawling websites and building frequency lists of words

Strong-Web-Crawler

279 stars, 153 forks

An advanced web crawler built on C#/.NET, PhantomJS, and Selenium. It can execute JavaScript code, trigger various events, and manipulate the page's DOM structure.

awesome-web-scraper

241 stars, 46 forks

A collection of awesome web scrapers and crawlers.

dyer

133 stars, 14 forks

Dyer is designed for reliable, flexible and fast web crawling, providing some high-level, comprehensive features without compromising speed.

abot

2.2k stars, 554 forks

Cross-platform C# web crawler framework built for speed and flexibility.

crawlee

14.3k stars, 597 forks, 99 watchers

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and o...

crawlab

10.9k stars, 1.7k forks

Distributed web crawler admin platform for managing spiders, regardless of language or framework.

spider-flow

9.2k stars, 1.8k forks

A next-generation crawler platform that defines crawl flows graphically, so crawlers can be built without writing code.