behemoth
Behemoth is an open-source platform for large-scale document analysis based on Apache Hadoop.
It would be really helpful to use Apache Commons CLI for command-line processing, and then to standardize the names of the input/output arguments, etc.
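As a rough illustration of what this request could look like, here is a minimal sketch using Apache Commons CLI (1.3 or later, which must be on the classpath). The class name `CorpusToolCli`, the tool name `corpus-tool`, and the `-i/--input` and `-o/--output` option names are assumptions made for the example, not names taken from Behemoth.

```java
import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.DefaultParser;
import org.apache.commons.cli.HelpFormatter;
import org.apache.commons.cli.Option;
import org.apache.commons.cli.Options;
import org.apache.commons.cli.ParseException;

// Hypothetical tool: shows one shared set of input/output option names.
class CorpusToolCli {

    // Builds the options that every tool would reuse (names are assumptions).
    static Options buildOptions() {
        Options options = new Options();
        options.addOption(Option.builder("i").longOpt("input").hasArg()
                .argName("path").required().desc("input corpus path").build());
        options.addOption(Option.builder("o").longOpt("output").hasArg()
                .argName("path").required().desc("output corpus path").build());
        return options;
    }

    // Parses the arguments and returns a one-line summary, or an error message.
    static String describe(String[] args) {
        try {
            CommandLine cmd = new DefaultParser().parse(buildOptions(), args);
            return "input=" + cmd.getOptionValue("input")
                    + " output=" + cmd.getOptionValue("output");
        } catch (ParseException e) {
            return "error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        String summary = describe(args);
        if (summary.startsWith("error: ")) {
            System.err.println(summary);
            // Print generated usage text so every tool reports help the same way.
            new HelpFormatter().printHelp("corpus-tool", buildOptions());
            System.exit(1);
        }
        System.out.println(summary);
    }
}
```

Keeping the option definitions in one shared method is one way to make sure each tool exposes the same argument names; whether that fits the project's module layout is an open question.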
The UIMA and GATE annotation and type filters are configured using strings; by default, if the user specifies nothing, no annotations are produced in the output. Instead it...
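The default behaviour described above can be pictured with a small, self-contained sketch. The comma-separated filter syntax and the class and method names below are hypothetical, chosen only for illustration; they are not Behemoth's actual configuration format or API.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of a string-configured annotation type filter.
class AnnotationFilterSketch {

    // Parses a comma-separated list of type names; null or blank gives an empty set.
    static Set<String> parseTypes(String spec) {
        Set<String> types = new LinkedHashSet<>();
        if (spec == null) {
            return types;
        }
        for (String part : spec.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                types.add(name);
            }
        }
        return types;
    }

    // Keeps only the annotation types that are listed; an empty filter keeps nothing.
    static List<String> filter(List<String> annotationTypes, Set<String> allowed) {
        List<String> kept = new ArrayList<>();
        for (String type : annotationTypes) {
            if (allowed.contains(type)) {
                kept.add(type);
            }
        }
        return kept;
    }
}
```

With an empty filter string, `filter` returns an empty list, which mirrors the "nothing specified, nothing in the output" default that the issue describes.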
// Exception in thread "main" java.io.IOException: can't find class: com.digitalpebble.behemoth.tika.TextArrayWritable because com.digitalpebble.behemoth.tika.TextArrayWritable at org.apache.hadoop.io.AbstractMapWritable.readFields(AbstractMapWritable.java:204)