text-extraction topic
wikipedia_ner
📖 Labeled examples from wiki dumps in Python
any-text
Get text content from any file
BoilerPy3
Python port of the Boilerpipe library
tokyo
tokyo is a REST API that, given any type of document 📄, identifies the MIME type 🧐, suggests an extension 🦔, and also extracts text 💪.
textextractor2.0
🔥 This web app extracts text from an image.
pdf-text-extraction-benchmark
A project for benchmarking and evaluating existing PDF extraction tools on their semantic ability to extract body text from PDF documents, especially scientific articles.
mobi
Python-based software to unpack KindleGen-generated ebooks
mirusan
A PDF collection reader with a built-in full-text search engine