data-extraction topic
wiktionary-de-parser
Extract data from German Wiktionary XML files.
hacker-news-digest
📰 Let ChatGPT Summarize Hacker News for You
data_extractor
Combine XPath, CSS selectors, and JSONPath for web data extraction.
jsonpath
A query expression for extracting data from JSON.
PDFLayoutTextStripper
Converts a PDF file into a text file while keeping the layout of the original PDF. Useful, for instance, for extracting the content of a table in a PDF file. This is a subclass of PDFTextStripper class (f...
optimus
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
amazoncaptcha
Pure Python, lightweight, Pillow-based solver for Amazon's text captcha.
infoboxer
Wikipedia information extraction library
npm-pdfreader
🚜 Parse text and tables from PDF files.
flashtext
Extract keywords from sentences or replace keywords in sentences.