
Provide a better long-term solution for detection of text files.

Open pyjarrett opened this issue 3 years ago • 4 comments

Septum currently checks a very limited set of file extensions to determine whether a file is text. This speeds up loading of large source trees and keeps junk files out of memory, which minimizes its memory footprint.

pyjarrett avatar May 25 '21 03:05 pyjarrett

Now that septum supports configuration files, SP.Cache.Is_Text could accept a list of extensions from the Search, allowing this to be configurable on a per-user or a per-project basis.
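
To illustrate the idea, here is a minimal sketch in Python (not Septum's Ada code) of an extension check that takes its list from the caller. The names `is_text` and `DEFAULT_TEXT_EXTENSIONS`, and the default extensions themselves, are hypothetical; the actual list used by `SP.Cache.Is_Text` is not shown in this issue.

```python
from pathlib import Path

# Hypothetical defaults for illustration only; not Septum's real list.
DEFAULT_TEXT_EXTENSIONS = frozenset({".ads", ".adb", ".c", ".h", ".md", ".txt"})


def is_text(path, extensions=DEFAULT_TEXT_EXTENSIONS):
    """Return True if the file's extension is in the allowed set.

    The comparison is case-insensitive. Files without an extension
    (for example "Makefile") are treated as non-text by this check.
    """
    allowed = {ext.lower() for ext in extensions}
    return Path(path).suffix.lower() in allowed
```

A per-user or per-project configuration could then pass its own set, for example `is_text("notes.org", extensions={".org"})`.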

pyjarrett avatar May 27 '21 02:05 pyjarrett

Have a look at kalkin/file-expert. I wrote a program for detecting the language type based on the data gathered by github/linguist. At some point I will refactor the code to provide C bindings for non-Rust library users, if someone is interested. The other option is to reuse the data and rewrite file-expert as an Ada library. It should be pretty easy, a weekend or two of work.

kalkin avatar Oct 05 '21 06:10 kalkin

@kalkin, your project looks exciting! I'm not sure it helps solve this issue right now, since Septum's search is language-agnostic and the goal is just to determine whether a file is readable text or binary data. If Septum gains this need at some future point, I'd reconsider.
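
For reference, one common content-based way to tell readable text from binary data is to sample the start of the file, reject it if it contains a NUL byte, and otherwise check that the sample decodes as UTF-8. The Python sketch below shows that heuristic only; it is not what Septum does today, and the function names and the 8192-byte sample size are assumptions.

```python
import codecs


def looks_like_text(data: bytes) -> bool:
    """Heuristic: the sample has no NUL bytes and is valid UTF-8 so far."""
    if b"\x00" in data:
        return False
    # An incremental decoder with final=False tolerates a multi-byte
    # character cut off at the end of the sample, but still raises on
    # bytes that can never be valid UTF-8.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(data, final=False)
    except UnicodeDecodeError:
        return False
    return True


def file_looks_like_text(path, sample_size=8192):
    """Apply the heuristic to the first sample_size bytes of a file."""
    with open(path, "rb") as handle:
        return looks_like_text(handle.read(sample_size))
```

This treats UTF-16 files as binary because they contain NUL bytes, and it costs a read per file, so it trades some of the speed of the extension check for fewer false negatives.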

pyjarrett avatar Oct 07 '21 02:10 pyjarrett

@pyjarrett Thanks! It seems I misunderstood how Septum works. I thought it did some basic language-specific parsing.

If you ever want to parse different languages, I strongly suggest looking at tree-sitter, if you do not know it yet. It's a way to specify how to parse a language (in JS :(), from which it generates a library that returns a universal AST containing all the line/character-range coordinates. Quite a few popular programming languages have tree-sitter support already. https://github.com/tree-sitter/tree-sitter

kalkin avatar Oct 07 '21 14:10 kalkin