Scrapinghub

Results: 29 repositories owned by Scrapinghub

flatson

32
Stars
7
Forks

Tool to flatten a stream of JSON-like objects, configured via a schema

frontera

1.3k
Stars
217
Forks

A scalable frontier for web crawlers

kafka-scanner

19
Stars
5
Forks

High-level Kafka scanner

mdr

110
Stars
30
Forks

A Python library to detect and extract listing data from HTML pages.

number-parser

103
Stars
21
Forks

Parse numbers written in natural language

page_clustering

35
Stars
8
Forks

A simple algorithm for clustering web pages, suitable for crawlers

page_finder

30
Stars
10
Forks

Find which links on a web page are pagination links

andi

17
Stars
5
Forks

Library for annotation-based dependency injection

autopager

15
Stars
4
Forks

Detect and classify pagination links