RegexType reads a line for every byte in mutableOffset.offset
I wanted to use my local machine’s /usr/share/file/magic/kml to detect KML and KMZ files, with this code:
ContentInfoUtil matcher =
    new ContentInfoUtil(new File("/usr/share/file/magic/kml"));
ContentInfo info = matcher.findMatch(new File(kmlFile));
But detection always fails, because this line in the magic file never matches:
>>&0 regex ['"]http://earth.google.com/kml Google KML document
It appears this is because RegexType reads an entire line for every unit of mutableOffset.offset, so the matching content is skipped entirely. In other words, if mutableOffset.offset is ten, the code skips ten lines instead of ten bytes.
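To illustrate the difference, here is a minimal standalone sketch. It does not use the library's classes. The class OffsetSkipDemo, its two helper methods, and the sample document are all hypothetical, and the text is assumed to be ASCII so that one char equals one byte. It only shows how treating an offset of ten as a line count can skip past content that a byte offset of ten would still reach.

```java
public class OffsetSkipDemo {

    // Hypothetical helper: position reached after skipping `offset` whole lines.
    public static int skipLines(String content, int offset) {
        int pos = 0;
        for (int i = 0; i < offset && pos < content.length(); i++) {
            int newline = content.indexOf('\n', pos);
            if (newline < 0) {
                // no more line terminators, so everything has been consumed
                return content.length();
            }
            pos = newline + 1;
        }
        return pos;
    }

    // Hypothetical helper: position reached after skipping `offset` bytes
    // (ASCII assumed, so chars and bytes coincide).
    public static int skipBytes(String content, int offset) {
        return Math.min(offset, content.length());
    }

    public static void main(String[] args) {
        // Made-up sample document, not taken from a real KML file.
        String kml = "<?xml version=\"1.0\"?>\n"
                + "<kml xmlns=\"http://earth.google.com/kml/2.0\">\n"
                + "<Document/>\n"
                + "</kml>\n";
        String needle = "http://earth.google.com/kml";
        int offset = 10;

        String afterLines = kml.substring(skipLines(kml, offset));
        String afterBytes = kml.substring(skipBytes(kml, offset));

        System.out.println("offset as lines finds needle: " + afterLines.contains(needle));
        System.out.println("offset as bytes finds needle: " + afterBytes.contains(needle));
    }
}
```

With this four-line sample, skipping ten lines consumes the whole document, while skipping ten bytes leaves the namespace URL available to the pattern.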
I was able to get KML files detected correctly by changing these lines in RegexType.java from this:
if (i < mutableOffset.offset) {
    bytesOffset += line.length() + 1;
}
to this:
if (i < mutableOffset.offset) {
    bytesOffset += line.length() + 1;
    i += line.length();
}
According to my reading of the docs, the regex matching type is supposed to match on lines and not bytes. Maybe the pattern is wrong?