Add "normalize" option to Search input?

Open Fil opened this issue 4 years ago • 6 comments

This would make it possible to search text containing accents (more generally, diacritics) by normalizing both the input and the values with a common function (in addition to string.toLowerCase()).

Here's a proof of concept: https://observablehq.com/@fil/search-normalize. It takes a lot of code, since I basically had to copy all of https://github.com/observablehq/inputs/blob/main/src/search.js just to add a call to string.normalize here and there.

It's in use in https://observablehq.com/@visionscarto/aires-d-accueil-les-donnees

Fil avatar Apr 15 '21 12:04 Fil

BTW, normalize alone might not be enough if we want to support all kinds of languages.

For example, the current spec is not suitable for CJK languages, since words in those languages are not separated by spaces.

The default filter splits the current query into space-separated tokens and checks that each token matches the beginning of at least one string in the data’s columns, case-insensitive.

At the moment, writing a custom filter is probably the only way to support other languages.

easz avatar Apr 18 '21 16:04 easz
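
The default behavior quoted above can be sketched as follows. This is a minimal illustration that reads "matches the beginning" literally; it is not the library's source, and the name `defaultLikeFilter` and the sample rows are assumptions made up for the example.

```javascript
// Sketch of the filter described in the quoted documentation.
// defaultLikeFilter is a hypothetical name, not part of observablehq/inputs.
function defaultLikeFilter(query) {
  // Split the query into space-separated, lowercased tokens.
  const tokens = String(query).toLowerCase().split(/\s+/).filter(Boolean);
  return (row) => {
    // Compare against every column value of the row, lowercased.
    const values = Object.values(row).map((v) => String(v).toLowerCase());
    // Every token must match the beginning of at least one value.
    return tokens.every((t) => values.some((v) => v.startsWith(t)));
  };
}

// Made-up sample data.
const rows = [
  {name: "Ile de France", kind: "region"},
  {name: "île de Ré", kind: "island"},
  {name: "東京都", kind: "prefecture"}
];

const twoTokens = rows.filter(defaultLikeFilter("ile reg")); // first row only
const accentMiss = rows.filter(defaultLikeFilter("ile"));    // misses "île de Ré"
const cjkMiss = rows.filter(defaultLikeFilter("京都"));       // no space to split on, so no match
```

With this reading, a query without accents does not find accented values, and a CJK substring that is not at the start of a value is never matched, which are the two limitations raised in this thread.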

Right, my expectation was that you would provide your own filter option to control this behavior in the general case. But I’m happy to hear ideas on how we can expose hooks in the existing searchFilter implementation, and a normalize hook as a preprocessing step that defaults to x => x.toLowerCase() sounds reasonable.

mbostock avatar Apr 18 '21 18:04 mbostock
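
A normalize hook of the kind suggested above might look like the sketch below. The names `searchFilterWith` and `stripDiacritics` are hypothetical, and the prefix matching is a simplification rather than the actual searchFilter implementation. The returned factory takes a query and returns a row predicate, which is the shape the Search `filter` option is documented to accept (an assumption worth checking against the current docs).

```javascript
// Hypothetical helper: decompose characters (NFD), drop combining
// diacritical marks (U+0300 to U+036F), then lowercase.
function stripDiacritics(s) {
  return s.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}

// Hypothetical filter factory with a normalize hook. The hook is applied
// to both the query tokens and the data values before comparison.
function searchFilterWith(normalize = (x) => x.toLowerCase()) {
  return (query) => {
    const tokens = String(query).split(/\s+/).filter(Boolean).map(normalize);
    return (row) => {
      const values = Object.values(row).map((v) => normalize(String(v)));
      return tokens.every((t) => values.some((v) => v.startsWith(t)));
    };
  };
}

const words = [{name: "Ile"}, {name: "île"}];

// Default hook (lowercase only): only the accented word matches.
const byDefault = words.filter(searchFilterWith()("île"));

// Diacritic-stripping hook: both words match.
const normalized = words.filter(searchFilterWith(stripDiacritics)("île"));
```

Because the same function is applied to the input and to the values, the hook stays a pure preprocessing step and the matching logic itself does not change.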

One quick idea that avoids providing the whole filter option: Search could expose (or accept) a user-defined termFilter and an optional function to transform the data before it is compared.

Something like:

Search(data, {
  query: query,
  termFilter: customTermFilter, // a user-defined termFilter
  transformFilter: customTransformFilter // function to transform source data
});

easz avatar Apr 18 '21 19:04 easz

~~this may be solved by https://github.com/observablehq/inputs/pull/216~~

Fil avatar Mar 10 '22 08:03 Fil

@Fil #216 has been merged. Can this issue be closed?

mootari avatar Oct 30 '22 09:10 mootari

No. #216 supports non-ASCII characters (so that searching for "île" in ["Ile", "île"] will find the second word). This issue is about supporting normalization (searching for "île" would find both words, which is what I expect as a user).

Fil avatar Nov 10 '22 14:11 Fil