Tim Allison
In long-running parser processes, the reliance on deleteOnExit in MediaDataBox can overwhelm the tmp directory. See: https://issues.apache.org/jira/browse/TIKA-3203
I regret that I forked this project so that I could make some quick improvements for robustness for the next release of Apache Tika. Fuzzing found some infinite loops and...
We added -spawnChild mode to tika-server to defend against catastrophic failures -- OOM, infinite loops, etc. We should make this the default in tika-python. We'll need to make the Python...
I recently ran exiftool on a bunch of TIFFs that we have in our regression corpus on Apache Tika. I was interested to see that there can be text (OCR'd...
(Please include as much information as possible, and attach a sample image if possible.) I'm attaching two files, one with XMP and one with EXIF. They were both donated to...
Over on Apache Tika (https://issues.apache.org/jira/browse/TIKA-3412), we'd like to migrate our mp4 parsing to metadata-extractor. With the apparently no longer supported sannies parser (https://github.com/sannies/mp4parser), we're able to extract useful data from...
This is not the best format for this info. This isn't a bug report. I wanted to share the normalized stderrs and their counts from running pdfcpu's validate on 8...
Tika's legacy behavior was to concatenate the content of embedded documents into one handler and ignore metadata from embedded documents. This was probably driven by the desire to allow Tika...
Dependency upgrades with required updates for checkstyle and maven dependency plugin.
On Tika we've gathered two quines with their creators' permissions. One is a zip file that when unzipped is exactly the same file; the other is a gz file with...