tika topic

List tika repositories

tika

2.2k stars, 745 forks

The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).

memex-explorer

121 stars, 69 forks

Viewers for statistics and dashboarding of Domain Search Engine data

sparkler

410 stars, 142 forks

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

extract

235 stars, 30 forks

A cross-platform command line tool for parallelised content extraction and analysis.

tikaondotnet

193 stars, 74 forks

Use the Java Tika text extraction library on the .NET platform

fscrawler

1.3k stars, 293 forks

Elasticsearch File System Crawler (FS Crawler)

MLwithTensorFlow2ed

135 stars, 68 forks

Code for Machine Learning with TensorFlow, 2nd Edition, published by Manning Publications

rtika

54 stars, 8 forks

R Interface to Apache Tika

imagecat

94 stars, 40 forks

ImageCat is an Apache OODT RADIX application that uses Apache Solr, Apache Tika and Apache OODT to ingest tens of millions of files (images, but could be extended to other files) in place, and to extrac...

php-apache-tika

111 stars, 22 forks

Apache Tika bindings for PHP: extract text and metadata from documents, images and other formats