fzf-for-js
fzf-for-js copied to clipboard
[Self] Ideas
Hey, this issue is to collate all the ideas that I have in mind. If you have an idea other than the ones mentioned, please create a separate issue for it. If you have suggestions on improving any of the following ideas, drop down a comment.
Verify outputted types
More (and clearer) examples
This library can make a search on 22000 English words without breaking a sweat. However the demo that shows it is not evident enough. Furthermore, there are many improvements in examples done under experiments/*
branches on which we haven't finalized yet. There is also an example of a basic selector but not a more involved one like:
(v) => `${v.firstName} ${v.lastName}`
// ...
// map result items like as selector and highlight matched chars
Add TypeScript based examples too.
Improve perf
See:
- https://x.com/warianoguerra/status/1579174724730425344
- https://github.com/farzher/fuzzysort
- https://github.com/bevacqua/fuzzysearch
Clear code separation and contributing guidelines
- [x] Currently this project is structured like generic webapp rather than a library. This is intentional, but might create confusion for some. We can split docs and lib under
src/
to make it much clearer. - [ ]
npx patch-package
is required to run for tests to work correctly. However this isn't documented yet. A contributing doc needs to be created. - [ ] Adding an issue template would be useful too.
Multiple Selectors and Key Weights
This is something junegunn/fzf doesn't has and this library might too favor covering features other than this.
An idea is to use an array of slightly modified version of existing Fzf
class which instead of returning and sorting data appends each item's result to already existing array, something like
list[i].matches.push(_match);
list[i].overallScore += _match.result.score;
and these modified Fzf
instances share query's runes. The reason of opting for a list is to have each instance having a different selector.
So the proposed API would look like:
// lib/main.ts
interface Options<U> {
// ...
weight: number // defaults to 1, 0 < weight <= 1
}
// app.ts
import { FzfGroup } from 'fzf'
const fzfGroup = new FzfGroup(list, [/* options for each FZF instance */])
const entries = fzfGroup.find("stuff")
The weight will be multiplied to the score.
Command palette
The motivation to port FZF to JS was to create a command palette, which I still didn't had a chance to work on.
- https://github.com/timc1/kbar
- https://twitter.com/devongovett/status/1437558553502367751
- https://cmdk.paco.me/
Offloading to web worker/using WASM for performance
There is a WASM based finder and few others which can use web workers like this and this one.
I tried using FZF for JS on web workers (see this branch) and found degradation in performance (perceived performance, not measured performance). I suspect that it could be due to web workers being mainly designed to offload tasks from main thread and these workers might not get CPU and memory resources to a level that a main thread can. I need to investigate how the web worker based libs I mentioned in previous paragraph is able to achieve better perf.
Note to self: Merely converting to code to WASM might even degrade performance. See https://surma.dev/things/js-to-asc/index.html. Threading on WASM is certainly needed.
Relevant names and links:
- AssemblyScript
- https://github.com/thi-ng/tinyalloc
- https://dzone.com/articles/webassembly-threads-in-firefox
- https://github.com/WebAssembly/threads
- https://v8.dev/features/simd
I'm also preferring to create a version based on (async) generators over promises as an alternative to the current sync version.
Once web worker is done, maybe we could try WebGPU (or WebGL) https://www.youtube.com/watch?v=K2JzIUIHIhc?
See also: Intent to Ship WebGPU in Chromium, WebGPU Explainer
Create and expose reindex
or keep FZF for JS work on static list?
Efficiently doing re-indexing could be incredibly difficult to do and there is no easy API to expose this to the user. Few libraries like the full text searching library Minisearch allows something along these lines. junegunn/fzf, due to its nature of usage, too doesn't work on dynamic list.
Introduce caching for query patterns and for results
(Related: defer positions
creation and create it after limit
is processed on the list)
Description to be filled.
Typo tolerance
This is something that was declined by original author (see https://github.com/junegunn/fzf/issues/2272#issuecomment-736175232).
My initial thought is while it could be implemented, it will be restricted to fuzzy-only, non-extended options. That means:
{
extended: false,
algo: 'v1' | 'v2'
}
I'll prefer Sublime-like, only 1 character typo tolerance. Here are few resources:
- Fuzzysort - does everything but this
- A blog post and a reply by jskinner (author of Sublime Text)
There can be multiple typo tolerance mechanisms — allow mistyping same letter two times (caat instead of cat), neighbor letters being transposed (cta instead of cat), a letter being mistyped (cet instead of cat), etc. Even one if like, can impose many rules around it. Though the Sublime one is much simpler:
- Only neighboring transpose is allowed (abuot can match with about, submduleo won't match submodule)
- First char transpose in query is not allowed (baout won't match about, obut won't match about)
- Not more than one transpose is allowed (sbumoduel won't match submodule)
More resources:
- https://stackoverflow.com/questions/487003/how-to-detect-a-typo-in-a-product-search-and-suggest-possible-corrections
- https://www.reddit.com/r/ReverseEngineering/comments/5mn2g9/reverse_engineering_sublime_text/
- https://elixirforum.com/t/seqfuzz-sublime-text-like-fuzzy-search-my-first-hex-package/32348
- https://www.objc.io/blog/2020/08/18/fuzzy-search/
- https://github.com/android-password-store/sublime-fuzzy
- Google search - dice coefficient vs levenshtein - dice (npm): string-similarity - levenshtein (npm): fastest-levenshtein, leven
Stemming and stop words
These terms are taken from https://github.com/bvaughn/js-search
- Stop words: User can use a selector to sanitize the items, i.e. remove words that aren't semantically meaningful.
- Stemming: FzfGroup API discussed under Multiple Selectors and Key Weights can help here.
However both of these features have same priority as Multiple Selectors and Key Weights.
Old fuse.js interactive docs
People liked interactive doc on its homepage. Maybe add a Try link in docs providing access to a CodeSandbox with options bounded to React state for interactive experience?
Resources:
- https://github.com/codesandbox/sandpack
- https://github.com/FormidableLabs/use-editable
- https://github.com/satya164/react-simple-code-editor
Other
- junegunn/fzf CLI
--help
This lists the items that are done ✅:
Right indexing in the presence of diacritics
Will appear in a release done after v0.5.1.
While we normalize[1] query and strings in the list internally, we return indices assuming the user has already done the normalization before they highlight the matched characters.
We need to mention this in the docs. Whether the list items have diacritics or not, I think it is a good idea to add .normalize()
anyway.
1: Normalization here does not mean removing diacritics (accents) from the string, it means to merge latin character with a diacritic
Normalization
normalize
is now on by default. (https://github.com/ajitid/fzf-for-js/pull/8)
Normalizing characters by default makes more sense. If the decision weighs in towards keeping normalize disabled we can thinking of bundling without normalize map and rather importing it instead when needed like day.js or date-fns do for locale. On importing, will patch and fill the empty normalize map and thus reducing bundle size in many cases.
Partially match words in any order
Covered as part of extended search. (https://github.com/ajitid/fzf-for-js/pull/15)
As of version 0.3.1, if you have a list containing:-
- krsty silver
- krsty-silver
- krsty/silver
and you query for "kr si", you get all items back. However if you search for "si kr" you get none. This behavior does not matches with the one we see in junegunn/fzf (Golang) which returns all items back. This issue is already addressed in feat/reverse-order-words-match branch. However the implementation isn't cross-verified with junegunn/fzf. We need to check:-
- If there is an even more performant way to do it
- Whether the score still matches with its Golang counterpart or not
nvim-telescope/telescope-fzf-native.nvim at first glance seems to show correct behaviour too which is unexpected for me as they only port main algorithm. Check if their code can be referred as well.
Update: This feature is actually a part of Extended Search which is not implemented yet.
Extended Search
It now can be used by passing extended: true
in options. It is intentionally made opt-in, as for people who are unfamiliar with junegunn/fzf patterns for this search might initially find the results confusing. (https://github.com/ajitid/fzf-for-js/pull/15)
fuse.js, telescope-fzf-native and junegunn/fzf have support for pattern matching. FZF for JS lacks this currently.
Tiebreaker
It is now available as an option (https://github.com/ajitid/fzf-for-js/pull/17). To respect order of the items that are given in a list by the user, we don't apply any tiebreaker by default.
Currently this library expects the user to send a list in an order that is already sorted by their own descending scoring model, meaning if they want to match 'aa'
with ['aa', 'aab']
and prefer to have aab
appear first, they should send ['aab', 'aa']
.
Here is a snippet from junegunn/fzf CLI's fzf --help
:
--tiebreak=CRI[,..] Comma-separated list of sort criteria to apply
when the scores are tied [length|begin|end|index]
(default: length)
Naming convention
We've decided to keep both terms as is.
Other name for selector
Candidates: accessor, access, getBy, getFn, getStr, use, on, queryOn, ~~pick~~
Reason for rejection:
- pick - can be misinterpreted as user is limited to choosing only one value from item object, whereas selector can return a combination of strings
v => v.firstName + " " + v.lastName
Other name for entries
Candidates: matches, ~~results~~
Reason for rejection:
- results -
result
is already part of returned object from FZF algos
Partially match words in any order (again)
We've decided to not to include as it will complicate APIs. (https://github.com/ajitid/fzf-for-js/pull/54)
While extended search is useful, there seems to be a good case of one not needing all the special patterns but only wanting the ability to fuzzy match by typing query words in any order. (Personally, this is what I would prefer over a basic match.)
This is useful, for example, when the user is searching by people name and types last name before first name. This is also useful if the names have apostrophe in them.
We would also like to put this information in the doc of when the user of the library should not opt for extended search and for disabling diacritics.
Sorter
Sorting now can be turned off (https://github.com/ajitid/fzf-for-js/pull/31). Allowing a custom sorter will complicate its relationship with tiebreaker, thus this part of feature is rejected.
Either allow user to pass a custom sorter or disable the sorter entirely (using null
). ~~This is somewhat related to Tiebreaker.~~
Update: Contrary to what I thought, Tiebreakers aren't plain sorter implementations in junegunn/fzf.
Requirement for various transformations on the results will always be there (sort, threshold, etc.). To solve this, instead of adding options to Fzf
itself, a callbag like interface will be given (see example) or common utility functions will simply be exposed so that users can create a wrapper around find
. Lunr too uses concept of a pipeline.
Option to disable forward
Added (https://github.com/ajitid/fzf-for-js/pull/29)
When searching in file paths or URIs, users will usually prefer end of the path to match first.
Add tests for API
Both statement and branch coverage are above 90%.
The algorithm has tests carried over from junegunn/fzf but the exposed API doesn't.
Regarding your web worker notes above... one thing to consider is that offloading the search to a worker can free up the main thread for the user. I actually do that when using fusejs because in some cases the user is searching across multiple properties on a large array of items. While the overall search performance might be slower, the UI freezing up on the user is more perceptible than search being slower.
@caschbre to get a perceptible lag-free UI, an async finder would be introduced in the next version of FZF.