extraction-engine topic

mlscraper

Stars: 1.2k, Forks: 83

🤖 Scrape data from HTML websites automatically by just providing examples

tabula-java

Stars: 1.8k, Forks: 409

Extract tables from PDF files

tabula-sharp

Stars: 142, Forks: 22

Extract tables from PDF files (port of tabula-java)

odinson

Stars: 66, Forks: 23

Odinson is a powerful and highly optimized open-source framework for rule-based information extraction. Odinson couples a simple, yet powerful pattern language that can operate over multiple represent...

camelot-sharp

Stars: 31, Forks: 5

A C# library to extract tabular data from PDFs (a port of the Python library Camelot, built on PdfPig).