data-extraction topic

Repositories tagged with the data-extraction topic

wiktionary-de-parser

24 stars · 8 forks

Extract data from German Wiktionary XML files.

hacker-news-digest

657 stars · 87 forks

📰 Let ChatGPT Summarize Hacker News for You

data_extractor

27 stars · 4 forks

Combine XPath, CSS selectors, and JSONPath for web data extraction.

jsonpath

37 stars · 3 forks

A query expression for extracting data from JSON.
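
A JSONPath expression such as `$.store.book[*].title` describes a route through nested objects and arrays. The Python sketch below evaluates a small subset of that syntax: `$`, `.name`, `['name']`, `[index]`, and the wildcards `.*` and `[*]`. It is a minimal illustration under those assumptions, not this repository's implementation. Filters, slices, and recursive descent (`..`) are deliberately left out, and the `query` function and sample data are hypothetical.

```python
import re
from typing import Any, List

# One path step: .name | .* | [123] | [*] | ['quoted name']
_TOKEN = re.compile(r"\.(\w+)|\.\*|\[(\d+)\]|\[\*\]|\['([^']*)'\]")


def query(data: Any, path: str) -> List[Any]:
    """Return every value matched by a (subset) JSONPath expression."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    matches = [data]
    pos = 1
    while pos < len(path):
        m = _TOKEN.match(path, pos)
        if m is None:
            raise ValueError(f"unsupported syntax at {path[pos:]!r}")
        key, index, quoted = m.group(1), m.group(2), m.group(3)
        nxt: List[Any] = []
        for node in matches:
            if key is not None or quoted is not None:
                name = key if key is not None else quoted
                if isinstance(node, dict) and name in node:
                    nxt.append(node[name])
            elif index is not None:
                i = int(index)
                if isinstance(node, list) and i < len(node):
                    nxt.append(node[i])
            else:  # wildcard step: .* or [*]
                if isinstance(node, dict):
                    nxt.extend(node.values())
                elif isinstance(node, list):
                    nxt.extend(node)
        matches = nxt
        pos = m.end()
    return matches


store = {"store": {"book": [{"title": "A", "price": 8}, {"title": "B", "price": 12}]}}
print(query(store, "$.store.book[*].title"))  # ['A', 'B']
```

Each step maps the current list of matches to the next one, so a wildcard naturally fans out into multiple results and a missing key simply yields an empty list.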

PDFLayoutTextStripper

1.5k stars · 204 forks

Converts a PDF file into a text file while preserving the layout of the original PDF. Useful, for instance, for extracting the contents of a table in a PDF file. This is a subclass of the PDFTextStripper class (f...
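
One common way to keep a PDF's layout in plain text is to map each positioned text fragment onto a fixed character grid: the horizontal position picks a column, and the vertical position picks a row. The Python sketch below illustrates that general idea. It does not use PDFBox or reproduce PDFLayoutTextStripper's code. The `Fragment` input format (top-down `y` in points), the 5-point character width, and the 12-point line height are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Fragment:
    x: float   # left edge in points (hypothetical input format)
    y: float   # vertical position in points, measured from the top of the page
    text: str


def render_layout(fragments: List[Fragment],
                  char_width: float = 5.0,
                  line_height: float = 12.0) -> str:
    """Place fragments on a character grid so columns and rows survive as plain text."""
    rows: Dict[int, List[str]] = {}
    for frag in sorted(fragments, key=lambda f: (f.y, f.x)):
        row = round(frag.y / line_height)
        col = round(frag.x / char_width)
        line = rows.setdefault(row, [])
        if line and len(line) >= col:
            col = len(line) + 1  # never overwrite earlier text; keep a one-space gap
        line.extend(" " * (col - len(line)))  # pad with spaces up to the target column
        line.extend(frag.text)
    if not rows:
        return ""
    # Emit every row between the first and last, so vertical gaps become blank lines.
    return "\n".join("".join(rows.get(r, [])).rstrip()
                     for r in range(min(rows), max(rows) + 1))


frags = [Fragment(0, 0, "Name"), Fragment(50, 0, "Qty"),
         Fragment(0, 12, "Apple"), Fragment(50, 12, "3")]
print(render_layout(frags))
```

Because table cells that share an `x` position land in the same text column, the output can be split into columns afterwards, which is what makes this approach useful for tables.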

optimus

1.4k stars · 234 forks

🚚 Agile data preparation workflows made easy with pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

amazoncaptcha

419 stars · 74 forks

Pure Python, lightweight, Pillow-based solver for Amazon's text captcha.

infoboxer

173 stars · 16 forks

Wikipedia information extraction library

npm-pdfreader

584 stars · 74 forks

🚜 Parse text and tables from PDF files.

flashtext

5.5k stars · 592 forks

Extract keywords from sentences or replace keywords in sentences.
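
Keyword extraction of this kind can be done in a single left-to-right pass by storing all keywords in a trie and, at each position, following the trie as far as the text allows while remembering the longest complete keyword. The Python sketch below shows that trie-based idea at word granularity. It is an illustration of the general technique, not flashtext's code or API. The `KeywordTrie` class, its methods, and the `\w+` tokenization are hypothetical choices for the example.

```python
import re
from typing import List, Optional

_END = object()  # sentinel key: marks a trie node where a complete keyword ends


class KeywordTrie:
    """Word-level trie that finds multi-word keywords in one pass (hypothetical helper)."""

    def __init__(self) -> None:
        self.root: dict = {}

    def add(self, keyword: str, clean_name: Optional[str] = None) -> None:
        node = self.root
        for word in re.findall(r"\w+", keyword.lower()):
            node = node.setdefault(word, {})
        # Store the name to report when this keyword is found.
        node[_END] = clean_name if clean_name is not None else keyword

    def extract(self, sentence: str) -> List[str]:
        words = re.findall(r"\w+", sentence.lower())
        found: List[str] = []
        i = 0
        while i < len(words):
            node, j = self.root, i
            match, match_end = None, i
            # Walk the trie as far as the words allow, remembering the longest match.
            while j < len(words) and words[j] in node:
                node = node[words[j]]
                j += 1
                if _END in node:
                    match, match_end = node[_END], j
            if match is not None:
                found.append(match)
                i = match_end  # resume after the matched keyword
            else:
                i += 1
        return found


kt = KeywordTrie()
kt.add("Big Apple", "New York")
kt.add("Bay Area")
print(kt.extract("I love Big Apple and Bay Area."))  # ['New York', 'Bay Area']
```

The cost of a scan depends on the length of the text rather than on the number of keywords, which is the main advantage over running one regular expression per keyword.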