
Provide fast fuzzy search

Open frabert opened this issue 2 years ago • 1 comments

Either use ripgrep, or:

  • generate an index of trigrams of each symbol at index time
  • compute trigrams of the query
  • use a MinHash scheme to find best candidates

This has the potential to be faster than ripgrep, but it is limited to symbol names.
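A rough sketch of the trigram + MinHash idea above, in Python for brevity. Everything here is an assumption made for illustration: the function names, the 64-hash signature size, and the use of salted `blake2b` as the hash family. It compares the query against every stored signature linearly; a real index would likely bucket signatures (e.g. with LSH bands) to avoid the full scan.

```python
import hashlib


def trigrams(name):
    """Return the set of lowercase character trigrams of a symbol name."""
    text = name.lower()
    if len(text) < 3:
        return {text}  # short names become a single pseudo-trigram
    return {text[i:i + 3] for i in range(len(text) - 2)}


def _hash(seed, gram):
    """One member of the hash family: blake2b salted with the seed."""
    digest = hashlib.blake2b(
        gram.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def signature(name, num_hashes=64):
    """MinHash signature: the minimum hash of the trigram set per seed."""
    grams = trigrams(name)
    return tuple(min(_hash(seed, g) for g in grams) for seed in range(num_hashes))


def similarity(sig_a, sig_b):
    """Fraction of matching slots, an estimate of trigram Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def build_index(symbols, num_hashes=64):
    """Index time: compute and store one signature per symbol name."""
    return {symbol: signature(symbol, num_hashes) for symbol in symbols}


def best_candidates(index, query, limit=3, num_hashes=64):
    """Query time: rank indexed symbols by estimated similarity to the query."""
    query_sig = signature(query, num_hashes)
    ranked = sorted(index, key=lambda s: similarity(index[s], query_sig), reverse=True)
    return ranked[:limit]


if __name__ == "__main__":
    idx = build_index(["ParseDeclaration", "EmitInstruction", "TokenKind"])
    print(best_candidates(idx, "ParseDeclaraton", limit=1))
```

The signatures are fixed-size regardless of name length, which is what makes the index cheap to store and compare.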

frabert avatar Dec 19 '22 15:12 frabert

I think one low/medium effort way to achieve this could be the following:

  • [ ] Extract the file data from inside `rpc::File` and put it into a column of the file table.
  • [ ] Add an FTS5 virtual table that uses the file table as an external content table, where the rowid maps to `file.file_id`.
  • [ ] Use snippet and this as an example of how to correlate matches with specific file offsets.
  • [ ] Provide an API in the entity provider, implemented in the `SQLiteEntityProvider`, to get the file IDs and byte ranges matching the search.
  • [ ] Work it into an actual API that yields the matching tokens.
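To make the checklist above concrete, here is a minimal sketch using Python's built-in `sqlite3` module, assuming the linked SQLite build has FTS5 enabled. The schema is a guess: a `file` table with `file_id` and `data` columns, and a virtual table I've called `file_fts`. Instead of `snippet`, it uses the FTS5 `highlight` auxiliary function with two marker characters (assumed never to occur in file contents) and derives UTF-8 byte ranges from the marker positions. This is one possible way to recover offsets, not necessarily the one intended above.

```python
import sqlite3

# Marker characters wrapped around each match; assumed absent from file data.
START, END = "\x01", "\x02"


def build_index(files):
    """Create a file table plus an external-content FTS5 table over it."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE file (file_id INTEGER PRIMARY KEY, data TEXT)")
    db.execute(
        "CREATE VIRTUAL TABLE file_fts USING fts5("
        "data, content='file', content_rowid='file_id')"
    )
    db.executemany("INSERT INTO file (file_id, data) VALUES (?, ?)", files.items())
    # External content tables are not populated automatically; rebuild the index.
    db.execute("INSERT INTO file_fts(file_fts) VALUES ('rebuild')")
    return db


def search(db, query):
    """Return (file_id, begin_byte, end_byte) for each match of the query."""
    rows = db.execute(
        "SELECT rowid, highlight(file_fts, 0, ?, ?) FROM file_fts "
        "WHERE file_fts MATCH ? ORDER BY rowid",
        (START, END, query),
    )
    results = []
    for file_id, marked in rows:
        offset = 0  # byte offset into the original, unmarked text
        begin = None
        for ch in marked:
            if ch == START:
                begin = offset
            elif ch == END:
                results.append((file_id, begin, offset))
            else:
                offset += len(ch.encode("utf-8"))
    return results


if __name__ == "__main__":
    db = build_index({
        1: "int main(void) { return 0; }",
        2: "static int helper(int x) { return x; }",
    })
    print(search(db, "helper"))
```

Note that the default FTS5 tokenizer matches whole tokens, so this gives full-text search over file contents rather than fuzzy matching; a prefix query such as `help*` is the closest built-in approximation.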

pgoodman avatar Jan 04 '23 23:01 pgoodman