content-extraction topic
content-extraction repositories
boilerpipe-ruby
40
Stars
5
Forks
Pure Ruby implementation of the Boilerpipe content extraction algorithm, tuned for online articles
readability2
107
Stars
15
Forks
Readability2 converts HTML to plain text.
extractnet
182
Stars
20
Forks
A fork of Dragnet that also extracts author, headline, date, and keywords from context, with built-in metadata extraction, all in one package
learnhtml
32
Stars
9
Forks
Web content extraction using machine learning
sumo
19
Stars
5
Forks
Tool that extracts the text from web article URLs and provides word frequencies, entity recognition, automatic summaries, and more
nextjs-pdf-parser
37
Stars
6
Forks
Next.js template for seamless PDF parsing using pdf2json and FilePond. Ideal for developers seeking a ready-to-use solution for PDF content extraction in Next.js projects.
pdfix_sdk_example_cpp
16
Stars
4
Forks
Make PDF Files Accessible, Extract Data from PDF, Convert PDF to HTML, Fill-in PDF Form, Stamp PDF and more...