web-crawler topic

List web-crawler repositories

doc_crawler.py

20

Stars

7

Forks

Watchers

Explore a website recursively and download all the wanted documents (PDF, ODT…)

evine

174

Stars

32

Forks

Watchers

Interactive CLI Web Crawler

frequent

26

Stars

12

Forks

Watchers

A utility for crawling websites and building frequency lists of words

frequency-lists

web-crawler-python

Strong-Web-Crawler

279

Stars

153

Forks

Watchers

An advanced web crawler built on C#.NET, PhantomJS, and Selenium. It can execute JavaScript code, trigger various events, and manipulate the page's DOM structure.

awesome-web-scraper

241

Stars

46

Forks

Watchers

A collection of awesome web scrapers and crawlers.

dyer

133

Stars

14

Forks

Watchers

Dyer is designed for reliable, flexible and fast web crawling, providing some high-level, comprehensive features without compromising speed.

rust-programming-language

abot

2.2k

Stars

554

Forks

Watchers

Cross-platform C# web crawler framework built for speed and flexibility. Please star this project!

crawlee

14.3k

Stars

597

Forks

99

Watchers

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and o...

crawlab

10.9k

Stars

1.7k

Forks

Watchers

Distributed web crawler admin platform for managing spiders, regardless of language or framework.

spider-flow

9.2k

Stars

1.8k

Forks

Watchers

A next-generation crawler platform that defines crawl flows graphically, so crawlers can be built without writing code.