Overdr0ne/sfs: Search File System for Emacs

#+TITLE: SFS (Search File System)
#+OPTIONS: toc:2

(Just in case anyone is watching this for activity, I have paused direct development here to familiarize myself with guix/nix in an attempt to create a simpler, reproducible installation process for this and other packages with non-trivial external dependencies. However, do feel free to submit any issues you may be having; I continue to use sfs myself, and will try to support anyone wanting to try things out.)

- [[#introduction][Introduction]]
- [[#installation][Installation]]
- [[#indexing][Indexing]]
- [[#searching][Researching]]
- [[#recollections][Recollections]]
- [[#dired][Dired]]
- [[#sfs-tags][sfs-tags]]

* Introduction
SFS is a collection of inter-operating emacs tools to make accessing and organizing your data faster and more ergonomic. It is largely powered by the Recoll file-indexing tool.

* Installation
Install the python dependencies:
#+begin_src sh
sudo pip3 install dbus-python PyGObject
#+end_src
You will also need to install Recoll; see the [[https://www.lesbonscomptes.com/recoll/download.html][download page]].

Then install the package itself. I use straight+use-package:
#+begin_src elisp
(use-package sfs
  :straight (sfs :type git
                 :host github
                 :repo "Overdr0ne/sfs"
                 :branch "master"
                 :files ("sfs.el"
                         "sfs-recoll.el"
                         "sfs-tui.el"
                         "sfs-tag.el"
                         "sfs-reindex.el"
                         "service.py"
                         "evil-collection-sfs.el"))
  :config
  ;; This starts the server responsible for dispatching queries among other things
  (global-sfs-mode 1))
#+end_src

* Indexing
Before you can search, you must build a Recoll index of your filesystem. If you’ve already done so, skip this step.

Start by configuring what and how you want to index in your recoll.conf, or for convenience:
#+begin_src
sfs-reindex-config
#+end_src
Then run the indexer, either with the recollindex command or from within emacs:
#+begin_src
sfs-reindex-build
#+end_src
There are a number of options here, but basically you can run the indexer [[https://www.lesbonscomptes.com/recoll/usermanual/webhelp/docs/RCL.INDEXING.PERIODIC.html][periodically]], or have recoll index in [[https://www.lesbonscomptes.com/recoll/usermanual/webhelp/docs/RCL.INDEXING.MONITOR.html][real time]] with the command recollindex -m, which updates the index every time a file is modified. I just put that command in my .xinitrc, but you could also set it up as a systemd service or something.
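
As a rough alternative to the .xinitrc approach, the real-time indexer could be launched from your emacs init file. This is only a sketch: my/sfs-start-recoll-monitor is a hypothetical helper, not part of sfs, and it assumes recollindex is on your exec-path. It does not check whether a monitor is already running.
#+begin_src elisp
(defun my/sfs-start-recoll-monitor ()
  "Start `recollindex -m' as a background process.
Hypothetical helper, not part of sfs.  Return the process object,
or nil when no recollindex executable is found."
  (when (executable-find "recollindex")
    ;; Any output from the indexer is collected in the *recollindex* buffer.
    (start-process "recollindex" "*recollindex*" "recollindex" "-m")))
#+end_src
Calling (my/sfs-start-recoll-monitor) from your init file would then start the monitor once per emacs session.
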
* Researching
Research queries are forwarded to the python search service that's started when the package loads, and the results are used to populate a dired buffer. Queries may be entered in the minibuffer with:
#+begin_src
sfs-recoll
#+end_src
Or in the full query editor with:
#+begin_src
sfs-research
#+end_src

[[./demos/sfs-res-demo.gif]]

The editor provides a number of keybindings to quickly insert any of Recoll's metadata tags.

Recoll search queries are fully documented [[https://www.lesbonscomptes.com/recoll/usermanual/webhelp/docs/RCL.SEARCH.LANG.html][here]]. The syntax is very similar to that of an internet search engine.

You can search for keywords individually:
#+begin_src
Query: keywords to look for in the index
#+end_src
Or clustered phrases:
#+begin_src
Query: "keywords to look" for in "the index"
#+end_src
Recoll also recognizes certain field names:
#+begin_src
Query: filename:test OR filename:bla ext:c dir:home -dir:bin size<10000 date:2001-03-01/2021-05-01
#+end_src
This matches files that have test or bla in the filename, have the .c extension, have home somewhere in the path and bin nowhere in the path, are smaller than 10000 bytes, and are dated between 2001-03-01 and 2021-05-01.
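
Since a query is just a string, it is easy to assemble one from elisp. The following is a minimal sketch; my/recoll-any and my/recoll-query are hypothetical helpers, not part of sfs or Recoll, and they do no quoting or validation of the terms they are given.
#+begin_src elisp
(defun my/recoll-any (field &rest values)
  "Return \"FIELD:V1 OR FIELD:V2 ...\" for each of VALUES."
  (mapconcat (lambda (value) (format "%s:%s" field value))
             values " OR "))

(defun my/recoll-query (&rest clauses)
  "Join CLAUSES with spaces into a single query string."
  (mapconcat #'identity clauses " "))

;; Rebuild the field-name example above as a string.
(my/recoll-query (my/recoll-any "filename" "test" "bla")
                 "ext:c" "dir:home" "-dir:bin"
                 "size<10000" "date:2001-03-01/2021-05-01")
;; => "filename:test OR filename:bla ext:c dir:home -dir:bin size<10000 date:2001-03-01/2021-05-01"
#+end_src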

The full list of field names for convenience:

- title, subject or caption are synonyms which specify data to be searched for in the document title or subject.
- author or from for searching the document's originators.
- recipient or to for searching the document's recipients.
- keyword for searching the document-specified keywords (few documents actually have any).
- filename for the document's file name.
- containerfilename. This is set for all documents, both top-level and contained sub-documents, and is always the name of the filesystem directory entry which contains the data.
- ext specifies the file name extension (Ex: ext:html).
- dir for filtering the results on file location.
- size for filtering the results on file size.
- date for searching or filtering on dates.
- mime or format for specifying the MIME type.
- type or rclcat for specifying the category (as in text/media/presentation/etc.).

** Recording
You can also record your queries. When you record a query, it can be inserted into the sfs-recs hierarchy using '.' as a separator, like you would a file.at.some.path.

[[./demos/sfs-res-record-demo.gif]]

Queries are saved using the customize interface. To make them persistent, you’ll need to call customize-save-customized to permanently add the query to sfs-recs. Personally, I’ve added customize-save-customized to the kill-emacs-hook:
#+begin_src elisp
(add-hook 'kill-emacs-hook 'customize-save-customized)
#+end_src
This ensures sfs-recs are saved at the end of each emacs session.

An sfs-recs entry can be either a nested alist, or a function that generates such an alist! This is how I generate the recent.day, recent.week and recent.month entries. Those functions are evaluated when sfs-recollect is called.
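
A generator function for something like a 'recent' entry mostly needs to compute a date filter at call time. The sketch below only builds that filter string, in the same date:START/END form used in the query example earlier. my/sfs-recent-query is a hypothetical name, and the alist shape that sfs-recs expects is not shown here, so wrapping the string into an actual entry is left out.
#+begin_src elisp
(require 'time-date)

(defun my/sfs-recent-query (days &optional now)
  "Return a date filter covering the last DAYS days.
NOW defaults to the current time; pass a time value to make the
result reproducible.  Hypothetical helper, not part of sfs."
  (let* ((end (or now (current-time)))
         (start (time-subtract end (days-to-time days))))
    (format "date:%s/%s"
            (format-time-string "%Y-%m-%d" start)
            (format-time-string "%Y-%m-%d" end))))

;; For example, the week ending on 2021-05-01:
(my/sfs-recent-query 7 (encode-time 0 0 12 1 5 2021))
;; => "date:2021-04-24/2021-05-01"
#+end_src
Because the range is computed when the function runs, the same entry would always describe the most recent days rather than a fixed interval.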

* Recollections
Recollections are basically just saved queries. They can also be thought of as file directories in that they can be composed into hierarchies, but unlike directories they are defined by the semantics of the query language. See what I mean with:
#+begin_src
sfs-recollect
#+end_src

[[./demos/sfs-recollect-demo.gif]]

You can then execute the query at point, much like you would enter a directory in dired or something. So when I make a recollection called 'recent images', I can actually make sure it contains all and only image files that I saved in the past week or something, which is usually what people want when they define their file hierarchies. The hierarchy here is actually just an alternative representation of the queries themselves; in other words, it is the AST representation of recoll search queries. So if you like, you can just write a query in its hierarchical form and execute it, like so:
#+begin_src
,* OR
,** AND
,*** ext:c
,*** filename:test
,*** size:<1M
,** AND
,*** ext:js
,*** filename:bla
,*** date:2019/
,** inline test query type:text
#+end_src

[[./demos/sfs-rec-custom-demo.gif]]
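
To make the AST idea concrete, here is a small sketch that folds a nested list back into an infix string. It is an illustration only, not how sfs does it internally: my/sfs-tree-to-query is a hypothetical name, the tree shape (OPERATOR CHILD...) is an assumption, and it assumes Recoll accepts parentheses and an explicit AND in queries.
#+begin_src elisp
(defun my/sfs-tree-to-query (node)
  "Render NODE as an infix query string.
NODE is either a string (a leaf term) or a list of the form
\(OPERATOR CHILD...), where OPERATOR is a string such as \"AND\"."
  (if (stringp node)
      node
    (let ((operator (car node))
          (children (mapcar #'my/sfs-tree-to-query (cdr node))))
      ;; Parenthesize every group so nesting survives in the flat string.
      (concat "("
              (mapconcat #'identity children (concat " " operator " "))
              ")"))))

(my/sfs-tree-to-query
 '("OR"
   ("AND" "ext:c" "filename:test")
   ("AND" "ext:js" "filename:bla")))
;; => "((ext:c AND filename:test) OR (ext:js AND filename:bla))"
#+end_src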

I haven't quite covered the entirety of the query language here, specifically the ',' and '/' operators are not covered. Queries entered here are also not yet saved between sessions.

* Redir
Because Recoll stores so much metadata to make files searchable, we get these extra file 'properties' for free. In dired, you can access these properties for the file at point, provided it is indexed, using:
#+begin_src
sfs-represent
#+end_src

[[./demos/sfs-redir-demo.gif]]

In my example, because I’ve indexed my entire filesystem, sfs-represent will actually work anywhere.

This command is bound to <C-return> in sfs-redir-mode, which is enabled automatically for sfs results. The command should also work in any other dired buffer, as long as the file is indexed. (It basically just computes the file md5 hash and uses that to look up the properties for that file, falling back to a lookup based on filename in case the hash is not found. And because recoll does not store the contiguous filename, the fallback may find some other file with the same name but with two parent directories reversed, like /usr/bin/test and /bin/usr/test.)
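
The hashing step can be reproduced with built-in functions. This is a sketch, assuming the hash is taken over the raw file contents (the description above does not say exactly what is hashed); my/file-md5 is a hypothetical helper, and the lookup against the index is not shown.
#+begin_src elisp
(defun my/file-md5 (file)
  "Return the MD5 hex digest of FILE's raw bytes.
Hypothetical helper, not part of sfs."
  (with-temp-buffer
    ;; Read the bytes literally so that no decoding or format
    ;; conversion changes what gets hashed.
    (set-buffer-multibyte nil)
    (insert-file-contents-literally file)
    (secure-hash 'md5 (current-buffer))))
#+end_src
Note that this reads the whole file into memory, so it is only reasonable for files of modest size.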

* sfs-tags
(unfinished)

sfs-tags is a set of utilities for tagging your data, primarily unix files using extended attributes. In a dired directory, tag the file at point with
#+begin_src
sfs-tag-set
#+end_src
and dump the tag info with
#+begin_src
sfs-tag-get
#+end_src

* TODO [0/7]

- [-] Add a fancy start page with lots of suggested tag buttons, search history, help commands, etc. that can be conveniently added and removed and composed, sorta like a root directory, model it a bit like magit with really short keybindings.
  - [-] the researcher still needs search history, suggestions and tags.
  - [-] the recollector works like a root directory
- [ ] parse infixed queries from the recollector into their AST representations such that they can be inserted into the recollections.
- [-] Iterative filtering of results so it feels more like you are navigating your file-system, but non-hierarchically!
  - [-] I think the query system is fast enough that simply not deleting the query after search is good enough, though it would be nice to eventually make the updating dynamic...
- [ ] completion integration for each tag
- [ ] maybe store queries in their AST form as files such that they can themselves use a derivative of the dired interface. This also provides a flexible, clear solution to persistence, and creates a single source of truth for building multiple perspectives on queries, inside or outside of emacs. I also like how this links query components together as independent nodes, rather than one monolithic structure, so you can more conveniently manipulate subtrees without needing the whole tree.
- [ ] Improve help to make the query language more obvious and intuitive
- [-] Cross-platform...
  - [-] Linux
  - [ ] Mac
  - [ ] Windows
- Bugs...
  - From Recoll: "mime, rclcat, size and date criteria always affect the whole query (they are applied as a final filter), even if set with other terms inside a parenthese. mime (or the equivalent rclcat) is the only field with an OR default. You do need to use OR with ext terms for example." One consequence is you can't OR dates :(
  - results are not chunked, and dired has a hard time managing massive results, so things slow down pretty massively if the number of results is large.
  - The query editor is basically a major mode inside another major mode. Emacs does not natively support multiple major modes in a single buffer, so what I have is a big, ugly hack. You can break out of editability by backspacing the first character, then you're stuck in read-only. ugh...
  - Syntax highlighting is very basic, not context aware, and can conflate a date '/' with the or operator
