webcrawling topic

List webcrawling repositories

scrapyrt

817 Stars, 160 Forks

HTTP API for Scrapy spiders

opensearchserver

499 Stars, 190 Forks

Open-source Enterprise Grade Search Engine Software

seleniumcrawler

127 Stars, 46 Forks

An example that uses Selenium WebDriver for Python and the Scrapy framework to build a web scraper that crawls an ASP site

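Crawls historical news text about listed companies (individual stocks) from Sina Finance, National Business Daily, JRJ, China Securities Network, and Securities Times; analyzes the text and extracts feature sets; trains classifiers such as SVM and random forest; and finally runs classification predictions on the crawled news data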

DotnetCrawler

170 Stars, 60 Forks

DotnetCrawler is a straightforward, lightweight web crawling/scraping library for Entity Framework Core output, based on .NET Core. This library is designed like other strong crawler libraries like Web...

heritrix3

2.7k Stars, 755 Forks

Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.

Raspagem-de-dados-para-iniciantes

131 Stars, 20 Forks

Data scraping for beginners using Scrapy and other basic libraries

ralger

152 Stars, 14 Forks

ralger makes it easy to scrape a website. Built on the shoulders of titans: rvest, xml2.

gotor

154 Stars, 43 Forks

This program provides efficient web scraping services for Tor and non-Tor sites. The program has both a CLI and REST API.

newspaperjs

71 Stars, 19 Forks

News extraction, scraping, and article parsing