
Persistent Index?

Open · dsully opened this issue 3 years ago · 3 comments

Any plans to support creating and updating a persistent index (perhaps SQLite-backed)?

I'd love to be able to query over a large amount of data, where real-time queries for things like video width/height are extremely slow.

Thanks

dsully, Apr 10 '21 15:04

To use indexes, an RDBMS has to control every modification of the indexed data, so that each change can be reflected in the indexes as well.

Since this application doesn't control who changes the files on your disk, or how, I'm not sure how that would work.

Perhaps a filesystem watcher (e.g. based on inotify) could run fselect on changed files and update some CSV file, but that could just as well be done externally.
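A watcher along these lines can be approximated portably by polling. The sketch below is a stand-in for inotify (which Python's standard library does not wrap): it records `(mtime, size)` for every file under a directory and reports which paths were added, changed or removed between two snapshots. The names `snapshot`, `diff` and `watch` are hypothetical, and re-running fselect on the reported paths and rewriting the matching CSV rows is left to the `on_change` callback.

```python
import os
import time


def snapshot(root):
    """Map every file under root to a cheap change signature: (mtime_ns, size)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # vanished or unreadable between listing and stat
            state[path] = (st.st_mtime_ns, st.st_size)
    return state


def diff(old, new):
    """Return (added_or_changed, removed) paths, each sorted."""
    changed = sorted(p for p, sig in new.items() if old.get(p) != sig)
    removed = sorted(p for p in old if p not in new)
    return changed, removed


def watch(root, interval=5.0, on_change=print, rounds=None):
    """Poll root every `interval` seconds; call on_change(changed, removed).

    `rounds=None` polls forever. on_change is where a real setup would
    re-run fselect on the changed paths and update its CSV (not shown).
    """
    previous = snapshot(root)
    done = 0
    while rounds is None or done < rounds:
        time.sleep(interval)
        current = snapshot(root)
        changed, removed = diff(previous, current)
        if changed or removed:
            on_change(changed, removed)
        previous = current
        done += 1
```

Polling rescans the whole tree on every round, so on large trees an event-based watcher such as inotify would be far cheaper; the snapshot comparison stays the same either way.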

Or populate a CSV file with fselect and query it with some other tool? Importing that CSV into an RDBMS as a table would work too.
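For the CSV route, Python's built-in `csv` and `sqlite3` modules are enough to turn fselect output into an indexed table. This is a minimal sketch assuming the CSV was produced by something like `fselect path, width, height from /videos into csv > videos.csv`, contains exactly those three columns, and has no header row (pass `has_header=True` if yours does). The table name `media` and the function `load_csv` are illustrative, not part of fselect.

```python
import csv
import sqlite3


def load_csv(conn, lines, has_header=False):
    """Load `path,width,height` rows into an indexed SQLite table; return row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS media ("
        "path TEXT PRIMARY KEY, width INTEGER, height INTEGER)"
    )
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)
    rows = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        path, width, height = row
        # Non-media files have no dimensions; fselect leaves those fields empty.
        rows.append((path, int(width) if width else None, int(height) if height else None))
    conn.executemany("INSERT OR REPLACE INTO media VALUES (?, ?, ?)", rows)
    # Index the dimension columns so width/height filters avoid a full scan.
    conn.execute("CREATE INDEX IF NOT EXISTS media_dims ON media (width, height)")
    conn.commit()
    return len(rows)
```

Once loaded (e.g. `load_csv(sqlite3.connect("media.db"), open("videos.csv", newline=""))`), a query such as `SELECT path FROM media WHERE width >= 1920` runs against the stored values instead of re-reading every video.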

pavlus, Jul 09 '21 22:07

Maybe using third-party indices is a more viable solution for fselect. I definitely plan to support Everything (the voidtools file search tool) on Windows some day.

jhspetersson, Jul 10 '21 06:07

> I'd love to be able to query over a large amount of data, where real-time queries for things like video width/height are extremely slow.

Sounds like you're really looking for a cache? (I suppose this is one of those places where SQL and filesystem terms kinda clash, because index != cache in the SQL world.)

It would seem a natural fit to store cached results in some kind of database (SQLite, perhaps, or something else?).

Maybe even two modes:

  1. Cache results the first time they're searched (no pre-calculation)
  2. Pre-Calculate and cache all results in a batch job
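The two modes could share one SQLite table. In this hypothetical `MetadataCache`, mode 1 is `get()`, which reads metadata only on a cache miss, and mode 2 is `warm()`, a batch job that walks a directory and fills the cache ahead of time. Entries are keyed by path and treated as stale when the file's mtime or size changes. `extract` is a placeholder for the slow step (e.g. reading video width/height) and is not an fselect API.

```python
import os
import sqlite3


class MetadataCache:
    """SQLite-backed per-file metadata cache (illustrative design, not fselect's)."""

    def __init__(self, db_path, extract):
        self.conn = sqlite3.connect(db_path)
        self.extract = extract  # slow callable: path -> (width, height)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "path TEXT PRIMARY KEY, mtime_ns INTEGER, size INTEGER,"
            " width INTEGER, height INTEGER)"
        )

    def get(self, path):
        """Mode 1: compute on first request, then serve from the cache."""
        st = os.stat(path)
        row = self.conn.execute(
            "SELECT mtime_ns, size, width, height FROM cache WHERE path = ?", (path,)
        ).fetchone()
        if row is not None and row[:2] == (st.st_mtime_ns, st.st_size):
            return row[2], row[3]  # hit: file unchanged since it was cached
        width, height = self.extract(path)  # miss or stale entry
        self.conn.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?, ?, ?, ?)",
            (path, st.st_mtime_ns, st.st_size, width, height),
        )
        self.conn.commit()
        return width, height

    def warm(self, root):
        """Mode 2: batch job that pre-computes entries for every file under root."""
        count = 0
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                self.get(os.path.join(dirpath, name))
                count += 1
        return count
```

Keying by path plus mtime/size is cheap, but the entry is lost when a file is moved; a content-hash key avoids that at the cost of reading the whole file.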

I think it would be important to actually benchmark how long it takes to scan videos, etc. Using the SHA-256 hash (or some other hash) as the primary key would let the file move anywhere and still keep its media metadata... But whether this is actually faster comes down to [Time to Scan Video File Metadata] vs. [Time to Hash File]. On systems with SHA processor extensions, hashing the file would probably be faster, but that's just a guess.
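Measuring that trade-off is straightforward. The sketch below hashes a file in 1 MiB chunks with `hashlib.sha256` (so large videos are never loaded into memory whole) and takes the best of several runs with `time.perf_counter`. Keep in mind that hashing has to read every byte, whereas reading width/height usually touches only the container headers, so the result will likely depend heavily on disk speed; repeated runs also hit a warm page cache, which flatters both sides. `sha256_file` and `time_call` are illustrative names.

```python
import hashlib
import time


def sha256_file(path, chunk_size=1 << 20):
    """Hex SHA-256 of a file, read in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def time_call(fn, path, repeat=3):
    """Best-of-`repeat` wall-clock time of fn(path), in seconds."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn(path)
        best = min(best, time.perf_counter() - start)
    return best
```

Comparing `time_call(sha256_file, path)` with `time_call(read_dimensions, path)`, where `read_dimensions` is whatever metadata reader you use (hypothetical here), over a representative set of files would answer the question directly.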

(Storing the file path alongside it in the DB would have the advantage that you could easily add file verification at zero performance penalty, if you're already hashing the file, that is.)
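Keying on the content hash and storing the path alongside could look like the hypothetical schema below: a single hash computation per lookup both finds the cached metadata and tells you whether the file at a known path still has the content that was recorded. `lookup` returns a status of `ok`, `moved` (same content at a new path; the stored path is updated), `changed` (the path is known but its hash no longer matches, i.e. the file was modified or corrupted) or `unknown`. All names here are illustrative.

```python
import hashlib
import sqlite3


def sha256_file(path, chunk_size=1 << 20):
    """Hex SHA-256 of a file, read in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def open_store(db_path):
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS media ("
        "sha256 TEXT PRIMARY KEY, path TEXT, width INTEGER, height INTEGER)"
    )
    return conn


def record(conn, path, width, height):
    """Store metadata under the file's content hash, remembering its path."""
    digest = sha256_file(path)
    conn.execute(
        "INSERT OR REPLACE INTO media VALUES (?, ?, ?, ?)", (digest, path, width, height)
    )
    conn.commit()
    return digest


def lookup(conn, path):
    """Return (width, height, status); the one hash is both cache key and check."""
    digest = sha256_file(path)
    row = conn.execute(
        "SELECT path, width, height FROM media WHERE sha256 = ?", (digest,)
    ).fetchone()
    if row is None:
        # Content not seen before: was this path recorded with other content?
        known = conn.execute("SELECT 1 FROM media WHERE path = ?", (path,)).fetchone()
        return None, None, "changed" if known else "unknown"
    stored_path, width, height = row
    if stored_path != path:
        conn.execute("UPDATE media SET path = ? WHERE sha256 = ?", (path, digest))
        conn.commit()
        return width, height, "moved"
    return width, height, "ok"
```

Identical files share one row under this schema, so only one of their paths is remembered; tracking every copy would need a separate `(sha256, path)` table.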

On the file name front, maybe it's possible to use an existing mlocate/plocate database to speed up name searches? (Pretty sure those two don't store file metadata though.)

danieldjewell, Sep 15 '21 18:09