simplemagic

RegexType reads a line for every byte in mutableOffset.offset

Open • craigpell opened this issue 7 years ago • 1 comment

I wanted to use my local machine’s /usr/share/file/magic/kml magic file to detect KML and KMZ files, using this code:

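import com.j256.simplemagic.ContentInfo;
import com.j256.simplemagic.ContentInfoUtil;
import java.io.File;

// Use the entries from the system's kml magic file instead of simplemagic's bundled defaults
ContentInfoUtil matcher = new ContentInfoUtil(new File("/usr/share/file/magic/kml"));
ContentInfo info = matcher.findMatch(new File(kmlFile));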

But detection always fails, because this line in the magic file never matches:

>>&0 regex ['"]http://earth.google.com/kml Google KML document

It appears that RegexType reads an entire line for every byte of mutableOffset.offset, so the content that should match is skipped entirely. In other words, if mutableOffset.offset is ten, the code skips ten lines instead of ten bytes.
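To make the difference between the two readings of the offset concrete, here is a minimal, self-contained Java sketch. It is not simplemagic code: skipLines and skipBytes are hypothetical helpers, the sample content is invented, and characters stand in for bytes because the input is plain ASCII. The regex from the magic line is applied after skipping 5 units of offset either way.

```java
import java.util.regex.Pattern;

public class OffsetDemo {

    // Hypothetical helper: treat the offset as a number of LINES to skip.
    static String skipLines(String content, int lines) {
        int pos = 0;
        for (int i = 0; i < lines; i++) {
            int nl = content.indexOf('\n', pos);
            if (nl < 0) {
                return ""; // ran out of lines before the offset was reached
            }
            pos = nl + 1;
        }
        return content.substring(pos);
    }

    // Hypothetical helper: treat the offset as a number of BYTES to skip
    // (characters here, which is equivalent for ASCII input).
    static String skipBytes(String content, int bytes) {
        return bytes >= content.length() ? "" : content.substring(bytes);
    }

    public static void main(String[] args) {
        String content = "<kml\n  xmlns=\"http://earth.google.com/kml/2.2\">\n</kml>\n";
        // Same expression as in the magic line: a quote followed by the KML namespace URL
        Pattern pattern = Pattern.compile("['\"]http://earth.google.com/kml");
        int offset = 5;
        boolean linesMatch = pattern.matcher(skipLines(content, offset)).find();
        boolean bytesMatch = pattern.matcher(skipBytes(content, offset)).find();
        System.out.println("skip 5 lines: " + linesMatch);
        System.out.println("skip 5 bytes: " + bytesMatch);
    }
}
```

Running it prints "skip 5 lines: false" and "skip 5 bytes: true": skipping five bytes lands just after "<kml\n" and the URL is still ahead, while skipping five lines runs past the end of the three-line input.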

I was able to get KML files detected correctly by changing these lines in RegexType.java from this:

if (i < mutableOffset.offset) {
    bytesOffset += line.length() + 1;
}

to this:

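if (i < mutableOffset.offset) {
    bytesOffset += line.length() + 1;
    // Proposed fix: also advance i by the characters just consumed, so i tracks bytes rather than lines
    i += line.length();
}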
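The arithmetic behind the proposed change can be checked with a small stand-in loop. This is a hypothetical Java sketch, not the actual RegexType code: the loop shape (for (int i = 0; i < offset; i++) with one readLine per iteration), the linesConsumed name and the sample text are all assumptions. Without the patch the loop consumes one line per unit of offset; with the extra i += line.length(), each iteration advances i by the line's length plus one, so the loop stops once it has read past offset characters.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class PatchedLoopDemo {

    // Hypothetical stand-in for the loop around the patched lines; the real
    // RegexType loop may differ. Returns how many lines the loop reads.
    static int linesConsumed(String content, int offset, boolean patched) {
        try (BufferedReader reader = new BufferedReader(new StringReader(content))) {
            int consumed = 0;
            for (int i = 0; i < offset; i++) {
                String line = reader.readLine();
                if (line == null) {
                    break; // input exhausted before the offset was reached
                }
                consumed++;
                if (patched) {
                    // Together with the loop's i++, this advances i by line.length() + 1:
                    // the line's characters plus the newline that readLine strips.
                    i += line.length();
                }
            }
            return consumed;
        } catch (IOException e) {
            // StringReader does not fail in practice; rethrow unchecked to keep callers simple
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String content = "line one\nline two\nline three\nline four\n";
        System.out.println("unpatched: " + linesConsumed(content, 10, false));
        System.out.println("patched:   " + linesConsumed(content, 10, true));
    }
}
```

With an offset of 10, the unpatched loop reads all 4 sample lines (it would keep going up to 10 lines if they existed), while the patched loop stops after 2, because "line one" plus its newline is only 9 characters and offset 10 falls on the second line.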

craigpell • May 09 '18 15:05

According to my reading of the docs, the regex type is supposed to match on lines, not bytes. Maybe the pattern in the magic file is wrong?

j256 • Jul 11 '18 16:07