TIKA-2820: Detection of Unix dump files (includes test files)
This adds support for detecting Unix dump files (magic adapted from the file(1) magic by Christos Zoulas). I also hand-crafted minimal test files for all variations; each contains only the magic bytes, with everything else zero-filled.
For this patch I created two new MIME types, x-tika-unix-dump-old and x-tika-unix-dump-new. Ideally, Tika would also report the sub-variations within those types (big-endian, little-endian, ufs2 and 16-bit), but I'm not sure what the best way to do this is. One option might be to add a ;version= parameter, but I don't think that is what it is meant for.
If anyone has suggestions on a better way to do this I'd be happy to adapt the patch accordingly!
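To make the variations easier to discuss, here is a minimal standalone sketch of the kind of check the magic performs. It is not the code in this patch. The class name `UnixDumpSniffer` and the "type (variant)" return format are hypothetical, and the offsets and magic values (32-bit values at offset 24, a 16-bit little-endian value at offset 18) are assumptions based on my reading of the file(1) magic entries, so please verify them against the magic database before relying on them.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper for illustration; not part of Tika or of this patch. */
final class UnixDumpSniffer {
    // Assumed magic values, as I understand them from the file(1) magic entries.
    static final int OLD_FS_MAGIC = 60011;      // old-fs dump
    static final int NEW_FS_MAGIC = 60012;      // new-fs dump
    static final int UFS2_MAGIC = 0x19540119;   // new-fs dump, ufs2

    /** Returns "type (variant)", or null if no dump magic is recognised. */
    static String detect(byte[] header) {
        // For simplicity, require enough bytes to cover the 32-bit field at offset 24.
        if (header == null || header.length < 28) {
            return null;
        }
        ByteBuffer buf = ByteBuffer.wrap(header);
        int be = buf.order(ByteOrder.BIG_ENDIAN).getInt(24);
        int le = buf.order(ByteOrder.LITTLE_ENDIAN).getInt(24);
        // The buffer is still little-endian here; mask to get an unsigned 16-bit value.
        int le16 = buf.getShort(18) & 0xFFFF;

        if (be == OLD_FS_MAGIC) return "x-tika-unix-dump-old (big-endian)";
        if (be == NEW_FS_MAGIC) return "x-tika-unix-dump-new (big-endian)";
        if (be == UFS2_MAGIC)   return "x-tika-unix-dump-new (ufs2, big-endian)";
        if (le == OLD_FS_MAGIC) return "x-tika-unix-dump-old (little-endian)";
        if (le == NEW_FS_MAGIC) return "x-tika-unix-dump-new (little-endian)";
        if (le == UFS2_MAGIC)   return "x-tika-unix-dump-new (ufs2, little-endian)";
        if (le16 == OLD_FS_MAGIC) return "x-tika-unix-dump-old (16-bit)";
        if (le16 == NEW_FS_MAGIC) return "x-tika-unix-dump-new (16-bit)";
        return null;
    }
}
```

The sketch only separates the type name from the variant label to show what information is available at detection time; how (or whether) Tika should expose the variant is exactly the open question above.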