webcrawling topic

List webcrawling repositories

scrapyrt

817
Stars
160
Forks
Watchers

HTTP API for Scrapy spiders

opensearchserver

499
Stars
190
Forks
Watchers

Open-source Enterprise Grade Search Engine Software

seleniumcrawler

127
Stars
46
Forks
Watchers

An example that uses Selenium WebDriver for Python and the Scrapy framework to build a web scraper that crawls an ASP site

FinnewsHunter

1.3k
Stars
304
Forks
1.3k
Watchers

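FinnewsHunter: a multi-agent financial intelligence system built on AgenticX. It monitors financial news across the web in real time, performs in-depth interpretation and sentiment analysis, and surfaces investment alpha signals.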

DotnetCrawler

170
Stars
60
Forks
Watchers

DotnetCrawler is a straightforward, lightweight web crawling/scraping library with Entity Framework Core output, based on .NET Core. This library is designed like other strong crawler libraries like Web...

heritrix3

2.7k
Stars
755
Forks
Watchers

Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.

Raspagem-de-dados-para-iniciantes

131
Stars
20
Forks
Watchers

Data scraping for beginners using Scrapy and other basic libraries

ralger

152
Stars
14
Forks
Watchers

ralger makes it easy to scrape a website. Built on the shoulders of titans: rvest, xml2.

gotor

154
Stars
43
Forks
Watchers

This program provides efficient web scraping services for Tor and non-Tor sites. The program has both a CLI and REST API.

newspaperjs

71
Stars
19
Forks
Watchers

News extraction and scraping. Article Parsing