FXDesktopSearch
Support for custom extensions
I would like to scan custom extensions as well. I work a lot with structured documents like .csv, .xml, .json etc. These could be scanned like normal text files.
Ah, a good requirement! Yet, what about document metadata? I don't think authors can be extracted from these files; the only viable information would be the last modified date and the extracted content language. Maybe the new NLP features might find some named entities, but I don't think there are more options here. What do you think?
One can extend Tika to extract metadata if those XML, JSON, etc. files have a certain structure and contain the necessary information. Since there will always be someone who says "I miss extension X", I wonder whether it would make sense to use patterns to define what gets scanned?
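To make the pattern idea a bit more concrete, here is a rough sketch of how user-configurable glob patterns could decide which files get scanned. It only uses the JDK's `java.nio.file.PathMatcher`; the class name `ScanPatterns` and the method `shouldScan` are made up for illustration and are not part of FXDesktopSearch.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not FXDesktopSearch code: decides whether a file
// should be scanned based on a list of user-supplied glob patterns.
public class ScanPatterns {

    private final List<PathMatcher> matchers;

    public ScanPatterns(List<String> globs) {
        // Compile each glob once using the default file system's matcher.
        this.matchers = globs.stream()
                .map(g -> FileSystems.getDefault().getPathMatcher("glob:" + g))
                .collect(Collectors.toList());
    }

    public boolean shouldScan(Path file) {
        // Match against the file name only, so "*.csv" applies in any directory.
        Path name = file.getFileName();
        if (name == null) {
            return false;
        }
        return matchers.stream().anyMatch(m -> m.matches(name));
    }

    public static void main(String[] args) {
        ScanPatterns patterns =
                new ScanPatterns(Arrays.asList("*.{csv,xml,json}", "*.log"));
        System.out.println(patterns.shouldScan(Paths.get("data", "report.csv")));
        System.out.println(patterns.shouldScan(Paths.get("photo.png")));
    }
}
```

With something like this, adding a new extension would be a configuration change rather than a code change. Whether matched files are then passed to Tika as plain text or to a dedicated parser is a separate decision.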