fuzz-aldrin-plus
Document API / Usage
My apologies if I missed it while scanning over your README, but I didn't notice any sections that document usage of this library or its API. I'm aware the API is similar to https://github.com/atom/fuzzaldrin, but it would be great to have it listed here as well.
Thank you for asking this question; this is indeed long overdue. I'll discuss it a bit here, then once it's clear I'll post it to the README.
Basic usage
fz = require("fuzzaldrin-plus")
Filtering
Filtering is the process of finding valid entries among a list of candidates and sorting them by score, given a query.
- A candidate is valid if the query is a subsequence of it.
  - That is, every character of the query is present in the candidate in the proper order. (Alternatively, it's possible to produce the query by only deleting characters from the candidate.)
- The score approximates the meaningfulness of the subsequence.
  - Do the matched characters occur together or scattered?
  - Do they occur at interesting places? (e.g. acronym positions)
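To make the validity rule concrete, here is a minimal sketch of a subsequence test. It is not the library's implementation; the function name `isSubsequence` and the case-insensitive comparison are assumptions made for illustration.

```javascript
// Hypothetical helper, not part of fuzzaldrin-plus.
// Returns true when every character of `query` appears in `candidate`
// in the same order, though not necessarily adjacent.
// Assumption: comparison is case-insensitive.
function isSubsequence(candidate, query) {
  const c = candidate.toLowerCase();
  const q = query.toLowerCase();
  let qi = 0; // index of the next query character still to be found
  for (let ci = 0; ci < c.length && qi < q.length; ci++) {
    if (c[ci] === q[qi]) qi++;
  }
  return qi === q.length;
}

console.log(isSubsequence("Settings View: Uninstall Packages", "install"));
console.log(isSubsequence("abc", "ca"));
```

Note that a scattered match such as 'install' inside 'Find And Replace: Select All' also passes this test, which is why the score is needed to rank results.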
Filtering Array of strings
- Input: array of strings
- Output: sorted & filtered array of strings
fz.filter(candidates, query)
Example:
candidates = [
'Find And Replace: Select All',
'Settings View: Uninstall Packages',
'Settings View: View Installed Themes',
'Application: Install Update',
'Install'
]
results = fz.filter(candidates, 'install')
Filtering Array of objects
- Input: array of objects,
- Output: sorted & filtered array of objects; the score is computed by comparing the specified key to the query
fz.filter(candidates, query, {key:"mykey"})
//filter & sort list of objects by obj.mykey
Scoring
Filtering is there for out-of-the-box usefulness, but most of this library is about finding the proper score between a candidate and a query. (A score of 0 means the entry should be filtered out.)
Outside of debugging, generating a score is mostly useful for building your own filtering algorithm. For example:
- control iteration on a special data structure
- control extraction / computation of the candidate string from the candidate object
- modify score based on external information (boost to recent files, boost to autocomplete entry near the insertion point)
If you have such a need, you can use scoring with the following guideline:
- Prepare the query
- Iterate over each element
  - Compute the candidate string from the object
  - Compute the match score
  - Adjust the score with external information as needed
  - If the score indicates a match, include <candidate, score> in an intermediate list
- Sort the intermediate list by score
- Build the output list from the intermediate list
  - keep the best items
  - extract the candidate from <candidate, score>
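The guideline above can be sketched as follows. This is only an illustration under stated assumptions: `customFilter`, its option names, and `toyScore` are hypothetical names invented here, and `toyScore` is a trivial stand-in so the snippet runs without the library. In real use you would pass `fz.score` as `scoreFn`.

```javascript
// Hypothetical custom filter built around an arbitrary scoring function.
function customFilter(items, query, opts) {
  const { scoreFn, toString, adjust = (s) => s, maxResults = Infinity } = opts;
  const scored = []; // intermediate list of <candidate, score>
  for (const item of items) {
    const str = toString(item);       // compute the candidate string from the object
    let score = scoreFn(str, query);  // compute the match score
    // Adjust only real matches, so a boost never revives a non-match.
    if (score > 0) score = adjust(score, item);
    if (score > 0) scored.push({ candidate: item, score });
  }
  scored.sort((a, b) => b.score - a.score); // best score first
  return scored.slice(0, maxResults).map((entry) => entry.candidate);
}

// Toy stand-in scorer for demonstration only (substring test, shorter is better).
const toyScore = (str, query) =>
  str.toLowerCase().includes(query.toLowerCase()) ? 1 / str.length : 0;

const files = [
  { path: "src/install.js", recent: false },
  { path: "docs/installation-guide.md", recent: true },
  { path: "README.md", recent: false },
];

const results = customFilter(files, "install", {
  scoreFn: toyScore,
  toString: (file) => file.path,
  adjust: (score, file) => (file.recent ? score * 2 : score), // boost recent files
});
console.log(results.map((file) => file.path));
```

With the toy scorer, the boost for recent files is enough to move the longer path ahead of the shorter one; the non-matching entry is dropped.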
Basic scoring
- Input: string to be scored, query
- Output: score (double), 0 if there is no match, positive otherwise.
score = fz.score(candidate_string, query)
There's no variant that takes an object and asks which key to use, because at this point you can probably do it better yourself.
It is not recommended to display the result of scoring to the user. Even if the ordering tries to be intuitive, the score by itself is very hard to interpret since it mixes together a lot of quality signals, is non-linear, and is sometimes jumpy.
Loop scoring (prepQuery)
The basic idea is to precompute some quantity upfront about the query so we do less work on a candidate by candidate basis.
prepared = fz.prepQuery(query)
for(...){
score = fz.score(candidate_string, query, prepared)
}
Note: a cache was recently added on the fz object that stores the last query and the corresponding prepared query. So, in a simple for loop with a constant query, explicit preparation should not be needed anymore.
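The caching idea in the note can be illustrated with a single-entry memoizer. This is a sketch of the general technique, not the library's actual cache; `makeCachedPrep` and `expensivePrep` are hypothetical names, and `expensivePrep` merely stands in for a real preparation step such as `fz.prepQuery`.

```javascript
// Hypothetical single-entry cache: remembers only the last query
// and the value prepared for it.
function makeCachedPrep(prepFn) {
  let hasEntry = false;
  let lastQuery;
  let lastPrepared;
  return function cachedPrep(query) {
    if (!hasEntry || query !== lastQuery) {
      lastPrepared = prepFn(query); // recompute only when the query changes
      lastQuery = query;
      hasEntry = true;
    }
    return lastPrepared;
  };
}

// Stand-in for an expensive preparation step; counts how often it runs.
let prepCalls = 0;
const expensivePrep = (query) => {
  prepCalls++;
  return { query, lower: query.toLowerCase() };
};

const prep = makeCachedPrep(expensivePrep);
for (const candidate of ["Install", "Uninstall Packages", "Select All"]) {
  prep("Inst"); // same query on every iteration, so it is prepared once
}
console.log(prepCalls);
```

A cache of this shape helps in a loop with a constant query, but gives no benefit when queries alternate.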
Matching
To communicate why the algorithm thinks a result is good or bad, it's often helpful to highlight the matched characters. The function match returns an array of positions where candidate_string matches query. (If multiple position sets are possible, it returns one of those that produce the best score.)
fz.match(candidate_string, query)
Note: fz.match(candidate_string, query, prepQuery) is also available.
See also the demo for how to wrap that output with HTML tags: https://github.com/jeancroy/fuzzaldrin-plus/blob/master/demo/demo.html#L85-L137
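As a rough idea of what such wrapping can look like, here is a minimal sketch that is independent of the linked demo. The function name `highlight` and the hard-coded positions are assumptions made for illustration; in real use the positions would come from `fz.match`, and this sketch assumes they are character indexes into the candidate string.

```javascript
// Hypothetical helper: wraps each run of consecutive matched
// positions in an opening and closing tag.
// Note: it does not HTML-escape the candidate text.
function highlight(candidate, positions, open = "<b>", close = "</b>") {
  const matched = new Set(positions);
  let out = "";
  let inside = false; // are we currently inside a highlighted run?
  for (let i = 0; i < candidate.length; i++) {
    const hit = matched.has(i);
    if (hit && !inside) out += open;
    if (!hit && inside) out += close;
    inside = hit;
    out += candidate[i];
  }
  if (inside) out += close; // close a run that reaches the end of the string
  return out;
}

// Positions are hard-coded here rather than taken from the library.
console.log(highlight("Install", [0, 1, 2])); // <b>Ins</b>tall
```

Grouping consecutive positions into one tag keeps the generated markup small when the query matches a contiguous run.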
Advanced
All of the methods (filter, score, match, prepQuery) take an options hash.
Some of those settings are common to all (for example, tweaks to how scoring works).
Some settings are specific to one method, for example key in filter.
I may come back to document those when I have more time, but most users don't need them.
Thanks! This was very helpful. I'm working on a set of TypeScript typings for your library, does the following look correct to you?
// Type definitions for fuzzaldrin-plus
// Project: https://github.com/jeancroy/fuzzaldrin-plus/
// Definitions by: Jason Killian <https://github.com/jkillian>
// Definitions: https://github.com/DefinitelyTyped/DefinitelyTyped
export as namespace fuzzaldrin;
export interface IQueryOptions {
pathSeparator?: string;
optCharRegEx?: RegExp;
}
export interface IScoringOptions extends IQueryOptions {
allowErrors?: boolean;
isPath?: boolean;
useExtensionBonus?: boolean;
}
export interface IFilterOptions extends IScoringOptions {
key?: string;
maxResults?: number;
}
export type PreparedQuery = { __internalAPIBrand: string; };
export function filter<T>(data: T[], query: string, options?: IFilterOptions): T[];
export function score(str: string, query: string, preparedQuery?: PreparedQuery, options?: IScoringOptions): number;
export function match(str: string, query: string, preparedQuery?: PreparedQuery, options?: IScoringOptions): number[];
export function prepQuery(query: string, options?: IQueryOptions): PreparedQuery;
Note that the export as namespace fuzzaldrin; line denotes that the library is published in a UMD format.
Currently prepQuery looks more like this: https://github.com/jeancroy/fuzzaldrin-plus/blob/master/src/scorer.coffee#L66-L73
The other signatures seem OK.
My thinking was that those are private fields that are only meant for use by your library and not by an external user. The way I wrote things above basically only lets users pass a PreparedQuery to score and match but not access its internal data.
Does that seem like the right decision?
Yes, thank you, that looks good. I have some plans to add options; once those settle, we'll see how to extend the option hash definition.
Great! See PR here if you're interested
@JKillian, @jeancroy I have updated the TS typings for the latest changes. Check PR 11865. @jeancroy, can we also update the npm package with a newly released version?
Thanks for that, I'll try to keep more stability in the interface for the future.
The reason I've demoted the prepared query from being its own argument is that the internal cache was giving performance just as good as explicitly passing a prepared query. Not having to care about the prepared query allows simpler usage.
Yes, that is one thing that should be taken care of with every new change. I was thinking of getting a NuGet package published for this lib to make it available for .NET projects or other projects that don't depend on Node.js. Currently we are using a copy of this lib (converted to TS). With typings and a NuGet package, we can take a package dependency instead of a converted source. @jeancroy, thoughts?
I'm open to maintaining a NuGet package, and/or outputting TypeScript as a distribution format on each build. (I may actually be due to cut a real release soon.)
I'm also not that invested in the current CoffeeScript form. The package was written for the Atom text editor, and CoffeeScript was what they used. But now they are moving to ES6 and have Babel in their toolchain, I believe, so there might be a natural compromise between ES6 and TypeScript that is closer to actual usage.
Awesome, moving to ES6 will definitely bring more cohesion with other projects as most of them are evolving in that direction to get more out of box functionality and performance. So, here is how I think we can maintain a NuGet release:
Maintain a separate release branch:
- Appveyor config to automate package publishing for releases.
- Travis CI to ensure the latest release is compatible with DefinitelyTyped typings.
The master branch works as the dev branch for regular improvements/updates and is merged to release.
What do you think? I can contribute in that direction as I get time.
Hi @jeancroy, are you ok with the approach? I already have some work in my local repository for this.
Yes, I think this is the right path forward. I've added you as a collaborator, as I guess we'll need to set up some things. If you need me to create branches or something, please tell me.
Hi Jean,
May I know your Email ID? I will add you as owner for nuget package.
Thanks
Hi, I'm registered with NuGet as jeancroy, with email [email protected]
Jean Christophe Roy