
TIKA-2820: detection of Unix dump files (includes test files)

Open bitsgalore opened this issue 6 years ago • 0 comments

This adds support for detecting Unix dump files (magic adapted from the file(1) magic by Christos Zoulas). I also hand-crafted some minimal test files (the magic bytes only, with zero bytes everywhere else) for all variations.
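To illustrate the idea, here is a minimal sketch of the magic check and of how a magic-only test file can be built. It is not the patch itself. The magic numbers and offsets are what I believe file(1)'s dump magic uses and should be verified against the actual magic; the function names `classify_dump` and `minimal_test_file` are hypothetical.

```python
import struct

# Assumed values, taken from memory of file(1)'s "dump" magic; verify before use.
NFS_MAGIC = 60012        # new-fs dump, 32-bit value at offset 24
OFS_MAGIC = 60011        # old-fs dump, 32-bit at offset 24 or 16-bit at offset 18
UFS2_MAGIC = 0x19540119  # new-fs dump (ufs2), 32-bit value at offset 24


def classify_dump(header: bytes):
    """Return (family, variation) for a Unix dump header, or None if no match."""
    if len(header) >= 28:
        # Try the 32-bit magic at offset 24 in both byte orders.
        for fmt, endian in ((">I", "big-endian"), ("<I", "little-endian")):
            (magic,) = struct.unpack_from(fmt, header, 24)
            if magic == NFS_MAGIC:
                return ("new", endian)
            if magic == OFS_MAGIC:
                return ("old", endian)
            if magic == UFS2_MAGIC:
                return ("new", "ufs2, " + endian)
    if len(header) >= 20:
        # 16-bit variation: little-endian short at offset 18.
        (magic16,) = struct.unpack_from("<H", header, 18)
        if magic16 == OFS_MAGIC:
            return ("old", "16-bit")
    return None


def minimal_test_file(fmt: str, offset: int, magic: int, size: int = 32) -> bytes:
    """Build a zero-filled buffer that contains only the magic at `offset`."""
    buf = bytearray(size)
    struct.pack_into(fmt, buf, offset, magic)
    return bytes(buf)
```

For example, `classify_dump(minimal_test_file(">I", 24, NFS_MAGIC))` classifies the buffer as a new-style, big-endian dump, while an all-zero buffer yields `None`.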

For this patch I created two new MIME types, x-tika-unix-dump-old and x-tika-unix-dump-new. Ideally Tika would also report the sub-variations within those types (big-endian, little-endian, ufs2 and 16-bit), but I'm not sure what the best way to do this is. One option might be adding a ;version= parameter, but I don't think that's meant to be used like this.
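To make the parameter idea concrete, here is a rough sketch in plain Python (no Tika API involved) of carrying the sub-variation as a parameter on the type name while keeping the base type easy to recover. The parameter name `variant`, the variation labels, and both function names are assumptions for illustration only; the top-level type prefix is left out because it is not stated here.

```python
# The two type names from this patch, keyed by dump family.
SUBTYPES = {"old": "x-tika-unix-dump-old", "new": "x-tika-unix-dump-new"}


def with_variation(family: str, variation=None) -> str:
    """Append a hypothetical 'variant' parameter to the type name."""
    name = SUBTYPES[family]
    if variation is None:
        return name
    return f"{name}; variant={variation}"


def base_type(type_string: str) -> str:
    """Drop any parameters so callers can still match on the base type."""
    return type_string.split(";", 1)[0].strip()
```

With this shape, `with_variation("new", "ufs2-le")` gives `x-tika-unix-dump-new; variant=ufs2-le`, and `base_type` on that string returns `x-tika-unix-dump-new`, so code that only cares about old versus new is unaffected by the extra detail.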

If anyone has suggestions on a better way to do this I'd be happy to adapt the patch accordingly!

bitsgalore, Jan 25 '19 12:01