fuzz-aldrin-plus Document API / Usage

My apologies if I missed it while scanning over your README, but I didn't notice any sections that document usage of this library or its API. I'm aware the API is similar to https://github.com/atom/fuzzaldrin, but it would be great to have it listed here as well.

Aug 24 '16 17:08 jkillian

Thank you for asking; this is long overdue. I'll discuss it here first, then post it to the README once it's settled.

Basic usage

const fz = require("fuzzaldrin-plus");

Filtering

Given a query, filtering is the process of finding the valid entries in a list of candidates and sorting them by score.

A candidate is valid if the query is a subsequence of it.
- That is, every character of the query is present in the candidate, in the proper order. (Equivalently, the query can be produced by only deleting characters from the candidate.)
The score approximates how meaningful the subsequence is.
- Do the matched characters occur together or scattered?
- Do they occur at interesting places? (e.g. acronym positions)
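The validity rule can be checked with a simple left-to-right scan. This is a minimal sketch, not the fuzzaldrin-plus algorithm: it only answers "is the query a subsequence?" and returns the leftmost positions it used, without computing any score. Case-insensitive comparison via `toLowerCase()` is an assumption of the sketch, and `subsequencePositions` / `isValidCandidate` are hypothetical names. Running it on the strings in the filtering example below shows that all five of them, even 'Find And Replace: Select All', contain 'install' as a subsequence, which is why the score, not validity alone, decides the ranking.

```javascript
// Simplified stand-in, NOT the fuzzaldrin-plus algorithm: it only decides
// validity and returns the leftmost positions used; it computes no score.
// Case-insensitivity via toLowerCase() is an assumption of this sketch.
function subsequencePositions(candidate, query) {
  const text = candidate.toLowerCase();
  const positions = [];
  let from = 0;
  for (const ch of query.toLowerCase()) {
    const idx = text.indexOf(ch, from);
    if (idx === -1) return null; // a query character is missing: not valid
    positions.push(idx);
    from = idx + 1; // the next character must appear strictly later
  }
  return positions;
}

function isValidCandidate(candidate, query) {
  return subsequencePositions(candidate, query) !== null;
}

// Valid but scattered: "i" and "n" come from "Find", the rest from "Select All".
const scattered = subsequencePositions("Find And Replace: Select All", "install");
// scattered is [1, 2, 18, 23, 25, 26, 27]
```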

Filtering Array of strings

Input: array of strings
Output: sorted & filtered array of strings

fz.filter(candidates, query)

Example:


const candidates = [
  'Find And Replace: Select All',
  'Settings View: Uninstall Packages',
  'Settings View: View Installed Themes',
  'Application: Install Update',
  'Install'
];

const results = fz.filter(candidates, 'install');

Filtering Array of objects

Input: array of objects
Output: sorted & filtered array of objects; the score is computed by comparing the specified key to the query

fz.filter(candidates, query, { key: "mykey" })
// filter & sort a list of objects by obj.mykey

Scoring

Filtering is provided for out-of-the-box usefulness, but most of this library is about computing the proper score between a candidate and a query. (A score of 0 means the entry should be filtered out.)

Outside of debugging, generating a score is mostly useful for building your own filtering algorithm, for example to:

control iteration over a special data structure
control extraction / computation of the candidate string from the candidate object
modify the score based on external information (boost recent files, boost autocomplete entries near the insertion point)

If you have such a need, you can use scoring with the following guideline:

Prepare the query
Iterate over each element
- Compute the candidate string from the object
- Compute the match score
- Adjust the score with external information as needed
- If the score indicates a match, add <candidate, score> to an intermediate list
Sort the intermediate list by score
Build the output list from the intermediate list
- keep the best items
- extract the candidate from each <candidate, score>
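The guideline above might look like this in plain JavaScript. This is a minimal sketch, not the library's own filter: the scorer is passed in as `scoreFn` so the block runs without fuzzaldrin-plus installed (with the library you could pass `fz.score`, which takes the candidate string and the query), and query preparation is left to that scorer. `customFilter`, `getText`, `boost` and `maxResults` are hypothetical names, and the toy scorer in the usage part is made up for the example.

```javascript
// Sketch of the scoring guideline; not the library's own filter.
// `scoreFn(text, query)` is injected so this runs standalone.
// `getText`, `boost` and `maxResults` are hypothetical option names.
function customFilter(items, query, { scoreFn, getText, boost = () => 1, maxResults = Infinity }) {
  const scored = [];
  for (const item of items) {
    const text = getText(item);          // compute the candidate string from the object
    let score = scoreFn(text, query);    // compute the match score
    if (score > 0) {                     // 0 means "no match": skip it
      score *= boost(item);              // adjust with external information
      scored.push({ item, score });      // keep <candidate, score>
    }
  }
  scored.sort((a, b) => b.score - a.score);              // sort by score, best first
  return scored.slice(0, maxResults).map((e) => e.item); // keep best, extract candidate
}

// Usage with a toy scorer (shorter matching paths score higher) and a
// "recent files" boost; both are made up for this example.
const toyScore = (text, q) => (text.toLowerCase().includes(q.toLowerCase()) ? 1 / text.length : 0);
const files = [
  { path: "src/install.js", recent: false },
  { path: "lib/installer.js", recent: true },
  { path: "README.md", recent: false },
];
const ranked = customFilter(files, "install", {
  scoreFn: toyScore,
  getText: (f) => f.path,
  boost: (f) => (f.recent ? 2 : 1),
});
// ranked paths: ["lib/installer.js", "src/install.js"]
```

One design choice worth noting: the boost is applied only after a match is confirmed, so external information can reorder matches but never turn a non-match into a result.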

Basic scoring

Input: string to be scored, query
Output: score (double), 0 if no match, positive otherwise.

const score = fz.score(candidate_string, query);

There's no variant that takes an object and a key, because at this point you can probably do it better yourself.

It is not recommended to display scores to the user. Even if the ordering tries to be intuitive, the score by itself is very hard to interpret: it mixes together a lot of quality signals, is non-linear, and is sometimes jumpy.

Loop scoring (prepQuery)

The basic idea is to precompute some quantities about the query upfront, so that less work is done on a candidate-by-candidate basis.

const prepared = fz.prepQuery(query);
for (const candidate_string of candidates) {
    const score = fz.score(candidate_string, query, prepared);
}

Note: a cache was recently added to the fz object that stores the last query and its corresponding prepared query. So in a simple for loop with a constant query, this should no longer be needed.
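The single-entry cache described in the note can be pictured as follows. This is a hypothetical sketch of the idea, not fuzzaldrin-plus's code: `cacheLastQuery` wraps any preprocessing function, and the `lower` / `chars` fields are stand-ins for whatever the real prepared query contains.

```javascript
// Hypothetical sketch of a "remember the last query" cache; not the library's code.
function cacheLastQuery(prepare) {
  let lastQuery = null;
  let lastPrepared = null;
  return function cachedPrepare(query) {
    if (query !== lastQuery) {   // only recompute when the query changes
      lastPrepared = prepare(query);
      lastQuery = query;
    }
    return lastPrepared;
  };
}

let prepareCalls = 0;
const cachedPrepare = cacheLastQuery((query) => {
  prepareCalls += 1;
  const lower = query.toLowerCase();
  return { lower, chars: new Set(lower) }; // hypothetical prepared-query fields
});

const candidates = ["Install", "Uninstall Packages", "Install Update"];
for (const candidate of candidates) {
  const prepared = cachedPrepare("install"); // same query on every iteration
  // ... score `candidate` against `prepared` here ...
}
// prepareCalls === 1: the query was prepared once, then served from the cache
```

Because only the last query is kept, a loop with a constant query pays the preparation cost once, which matches the reasoning that an explicit prepared query is then unnecessary.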

Matching

To communicate why the algorithm thinks a result is good or bad, it's often useful to highlight the matched characters. The match function returns an array of positions where candidate_string matches query. (If multiple position sets are possible, it returns the one that produces the best score.)

fz.match(candidate_string, query)

Note: fz.match(candidate_string, query, prepared), taking a prepared query, is also available.

See also the demo for how to wrap that output in HTML tags: https://github.com/jeancroy/fuzzaldrin-plus/blob/master/demo/demo.html#L85-L137
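In the same spirit as the linked demo (this is not its code), here is a hypothetical `highlight` helper that takes a string and an array of positions, such as the one returned by fz.match, and wraps the matched characters in `<b>` tags. It merges consecutive positions into one tag and escapes HTML special characters; the tag choice and the escaping are assumptions of this sketch.

```javascript
// Hypothetical helper, not part of fuzzaldrin-plus: wraps the characters at
// `positions` in <b> tags, merging runs of consecutive positions.
function escapeHtml(ch) {
  return ch === "&" ? "&amp;" : ch === "<" ? "&lt;" : ch === ">" ? "&gt;" : ch;
}

function highlight(str, positions) {
  const matched = new Set(positions);
  let out = "";
  let open = false;
  for (let i = 0; i < str.length; i++) {
    const hit = matched.has(i);
    if (hit && !open) { out += "<b>"; open = true; }   // start of a matched run
    if (!hit && open) { out += "</b>"; open = false; } // end of a matched run
    out += escapeHtml(str[i]);
  }
  if (open) out += "</b>"; // close a run that reaches the end of the string
  return out;
}

// highlight("Install", [0, 1, 2]) returns "<b>Ins</b>tall"
```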

Advanced

All of the methods (filter, score, match, prepQuery) take an options hash. Some of those settings are common to all (for example, tweaks to how scoring works). Others are specific to one method, for example key in filter.

I may return to document those when I have more time, but most users don't need them.

Aug 25 '16 00:08 jeancroy

Thanks! This was very helpful. I'm working on a set of TypeScript typings for your library, does the following look correct to you?

// Type definitions for fuzzaldrin-plus
// Project: https://github.com/jeancroy/fuzzaldrin-plus/
// Definitions by: Jason Killian <https://github.com/jkillian>
// Definitions: https://github.com/DefinitelyTyped/DefinitelyTyped

export as namespace fuzzaldrin;

export interface IQueryOptions {
    pathSeparator?: string;
    optCharRegEx?: RegExp;
}

export interface IScoringOptions extends IQueryOptions {
    allowErrors?: boolean;
    isPath?: boolean;
    useExtensionBonus?: boolean;
}

export interface IFilterOptions extends IScoringOptions {
    key?: string;
    maxResults?: number;
}

export type PreparedQuery = { __internalAPIBrand: string; };

export function filter<T>(data: T[], query: string, options?: IFilterOptions): T[];
export function score(str: string, query: string, preparedQuery?: PreparedQuery, options?: IScoringOptions): number;
export function match(str: string, query: string, preparedQuery?: PreparedQuery, options?: IScoringOptions): number[];
export function prepQuery(query: string, options?: IQueryOptions): PreparedQuery;

Note that the export as namespace fuzzaldrin; line denotes that the library is published in a UMD format.

Aug 25 '16 15:08 jkillian

Currently, prepQuery looks more like this: https://github.com/jeancroy/fuzzaldrin-plus/blob/master/src/scorer.coffee#L66-L73

The other signatures seem OK.

Aug 25 '16 18:08 jeancroy

My thinking was that those are private fields that are only meant for use by your library and not by an external user. The way I wrote things above basically only lets users pass a PreparedQuery to score and match but not access its internal data.

Does that seem like the right decision?

Aug 25 '16 18:08 jkillian

Yes, thank you, that looks good. I have some plans to add options; when that settles, we'll see how to extend the options hash definition.

Aug 25 '16 19:08 jeancroy

Great! See PR here if you're interested

Aug 25 '16 19:08 jkillian

@JKillian, @jeancroy I have updated the TS typings for the latest changes; see PR 11865. @jeancroy, can we also update the npm package with a new release?

Oct 09 '16 09:10 mdahamiwal

Thanks for that, I'll try to keep more stability in the interface for the future.

The reason I've demoted the prepared query from its own argument is that the internal cache gave just as good performance as explicitly passing a prepared query. Not having to care about the prepared query allows simpler usage.

Oct 09 '16 12:10 jeancroy

Yes, that is something to take care of with every new change. I was thinking of getting a NuGet package published for this library, to make it available for .NET projects or other projects that don't depend on Node.js. Currently we are using a copy of this library (converted to TS). With typings and a NuGet package, we can take a package dependency instead of a converted source. @jeancroy, thoughts?

Oct 09 '16 14:10 mdahamiwal

I'm open to maintaining a NuGet package, and/or outputting TypeScript as a distribution format on each build. (I may actually be due to cut a real release soon.)

I'm also not that invested in the current CoffeeScript form. The package was written for the Atom text editor, and CoffeeScript was what they used. But I believe they are now moving to ES6 and have Babel in their toolchain, so there might be a natural compromise between ES6 and TypeScript that is closer to actual usage.

Oct 09 '16 14:10 jeancroy

Awesome, moving to ES6 will definitely bring more cohesion with other projects, as most of them are evolving in that direction to get more out-of-the-box functionality and performance. So, here is how I think we can maintain a NuGet release:

Maintain a separate release branch:

Appveyor config to automate package publishing for releases.
Travis CI to ensure the latest release is compatible with DefinitelyTyped typings.

The master branch works as the dev branch for regular improvements/updates and is merged into release. What do you think? I can contribute in that direction as I get time.

Oct 09 '16 16:10 mdahamiwal

Hi @jeancroy, are you ok with the approach? I already have some work in my local repository for this.

Oct 20 '16 07:10 mdahamiwal

Yes, I think this is the right path forward. I've added you as a collaborator, since I guess we'll need to set up some things. If you need me to create branches or anything else, please tell me.

Oct 20 '16 13:10 jeancroy

Hi Jean,

May I have your email address? I will add you as an owner of the NuGet package.

Thanks


Oct 25 '16 18:10 mdahamiwal

Hi, I'm registered on NuGet as jeancroy, with email [email protected]

Jean Christophe Roy


Oct 25 '16 18:10 jeancroy
