Chris Mattmann
tika-python
Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
MLwithTensorFlow2ed
Code for Machine Learning with TensorFlow, 2nd Edition, published by Manning Publications.
imagecat
ImageCat is an Apache OODT RADIX application that uses Apache Solr, Apache Tika and Apache OODT to ingest tens of millions of files (images, but could be extended to other files) in place, and to extrac...
tika-similarity
Tika-Similarity uses the Tika-Python package (Python port of Apache Tika) to compute file similarity based on Metadata features.
etllib
This is the ETL lib package. It provides an API to munge and prepare JSON, TSV and other data using Apache Tika and JSON parsing/loading for ETL via Apache OODT (or other libs) into Apache Solr.
lucene-geo-gazetteer
Uses Apache Lucene, OpenNLP and GeoNames to extract locations from text and geocode them.
nutch-python
Nutch-Python is a Python binding to the Apache Nutch™ REST services allowing Nutch to be called natively in the Python community.