data-extraction topic

List data-extraction repositories

hred

64
Stars
1
Fork
Watchers

Reduce HTML and XML to JSON from the command line, using an expressive query language inspired by CSS selectors.

Scrapegraph-ai

21.9k
Stars
1.9k
Forks
21.9k
Watchers

Python scraper based on AI

firecrawl

68.4k
Stars
5.3k
Forks
68.4k
Watchers

🔥 The Web Data API for AI - Turn entire websites into LLM-ready markdown or structured data

wildberries-parser-in-python

18
Stars
13
Forks
Watchers

WildBerries Parser is a Python script that extracts item information from Wildberries.ru and saves it in an Excel file. It supports parsing by directory or search keyword, collecting data like link, I...

youtube_data_engineering_project

15
Stars
3
Forks
Watchers

Data Engineering Project: Extracting music video metrics of Twice using YouTube API, AWS, and Tableau

Exif

15
Stars
6
Forks
Watchers

ExifTool is a powerful command-line tool that can be used to extract and edit metadata in a wide range of media files, including images, audio, and video. Metadata is information that is stored within...

scrappey-wrapper-python

22
Stars
0
Forks
22
Watchers

An API wrapper for Scrappey.com written in Python (Cloudflare, DataDome bypass & solver)

maxun

12.1k
Stars
932
Forks
Watchers

🔥 Open Source No Code Web Data Extraction Platform. Turn Websites To APIs & Spreadsheets With No-Code Robots In Minutes 🔥

Scrapling

8.2k
Stars
466
Forks
8.2k
Watchers

🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!

parsera

1.2k
Stars
69
Forks
1.2k
Watchers

Lightweight library for scraping websites with LLMs