
Support for custom extensions

Open skin27 opened this issue 8 years ago • 2 comments

I would like to scan files with custom extensions as well. I work a lot with structured documents such as .csv, .xml, and .json files. These could be scanned like normal text files.

skin27 Mar 07 '17 15:03

Ah, a good requirement! Yet, what about document metadata? I don't think authors can be extracted from these files; the only viable information would be the last modified date and the detected content language. Maybe the new NLP features could find some named entities, but I don't think there are more options here. What do you think?

mirkosertic May 11 '19 19:05

One can extend Tika to extract metadata if those XML, JSON, etc. files have a known structure and contain the necessary information. Since there is always going to be someone who says "I miss extension X", I wonder whether it would make sense to let users specify patterns for what to scan?

mlt May 29 '19 23:05
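
To make the pattern idea from the last comment concrete, here is a minimal sketch of how user-supplied glob patterns could decide which files get scanned. It uses only the standard `java.nio.file.PathMatcher` API. The class name `ScanPatterns` and the method `shouldScan` are hypothetical and not part of FXDesktopSearch; how such a filter would be wired into the crawler or handed to Tika is left open.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of FXDesktopSearch: decides whether a file
// should be scanned based on user-configured glob patterns.
final class ScanPatterns {

    private final List<PathMatcher> matchers = new ArrayList<>();

    // Each entry is a glob such as "*.csv" or "*.{xml,json}".
    ScanPatterns(List<String> globs) {
        for (String glob : globs) {
            matchers.add(FileSystems.getDefault().getPathMatcher("glob:" + glob));
        }
    }

    // Matches against the file name only, so patterns stay independent
    // of the directory the file lives in.
    boolean shouldScan(Path file) {
        Path name = file.getFileName();
        if (name == null) {
            return false;
        }
        for (PathMatcher matcher : matchers) {
            if (matcher.matches(name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ScanPatterns patterns = new ScanPatterns(Arrays.asList("*.{csv,xml,json}"));
        System.out.println(patterns.shouldScan(Paths.get("reports/data.csv")));
        System.out.println(patterns.shouldScan(Paths.get("reports/notes.txt")));
    }
}
```

With patterns instead of a fixed extension list, a request for "extension X" becomes a configuration change rather than a code change. Files matched this way could then be treated as plain text, with only the last modified date and detected language as metadata, as discussed above.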