Add "normalize" option to Search input?
This would allow searching text with accents (more generally, diacritics) by normalizing everything (the input and the values) with a common function (in addition to string.toLowerCase()).
Here's a proof of concept: https://observablehq.com/@fil/search-normalize. It takes a lot of code because I had to copy essentially all of https://github.com/observablehq/inputs/blob/main/src/search.js just to add a call to string.normalize here and there.
It's in use in https://observablehq.com/@visionscarto/aires-d-accueil-les-donnees
BTW, normalize alone might not be enough if we want to support all kinds of languages.
For example, the current spec is not suitable for CJK languages, since there are no spaces between words.
The default filter splits the current query into space-separated tokens and checks that each token matches the beginning of at least one string in the data’s columns, case-insensitive.
At the moment, crafting a custom filter is probably the only way to support various languages.
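To make the limitation concrete, here is a minimal sketch of the default behavior as the documentation quoted above describes it, next to a substring-based alternative. Both `prefixFilter` and `substringFilter` are hypothetical names written for illustration; this is a simplified reading of the description, not the library's source.

```javascript
// Simplified reading of the documented default: every space-separated token
// must match the beginning of at least one value in the row, case-insensitive.
function prefixFilter(query) {
  const tokens = query.toLowerCase().split(/\s+/).filter(Boolean);
  return (row) => {
    const values = Object.values(row).map((v) => String(v).toLowerCase());
    return tokens.every((t) => values.some((v) => v.startsWith(t)));
  };
}

// Hypothetical alternative for text written without spaces:
// match the query anywhere inside a value.
function substringFilter(query) {
  const q = query.trim().toLowerCase();
  return (row) =>
    Object.values(row).some((v) => String(v).toLowerCase().includes(q));
}

const rows = [{ city: "東京都" }, { city: "京都府" }];
const byPrefix = rows.filter(prefixFilter("京都"));       // only 京都府
const bySubstring = rows.filter(substringFilter("京都")); // both rows
```

With prefix matching, 東京都 is missed because 京都 does not start the string, which is the kind of case a custom filter would have to handle.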
Right, my expectation was that you would provide your own filter option to control this behavior in general. But I’m happy to hear ideas on how we can expose hooks in the existing searchFilter implementation, and a normalize hook as a preprocessing step that defaults to x => x.toLowerCase() sounds reasonable.
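A rough sketch of what such a hook could look like, assuming a filter factory that applies `normalize` to both the query and the values. The name `searchFilterWith` and its option shape are hypothetical; nothing here is an existing Inputs API.

```javascript
// Hypothetical factory: `normalize` is the proposed preprocessing hook,
// defaulting to lowercasing as suggested above.
function searchFilterWith({ normalize = (x) => x.toLowerCase() } = {}) {
  return (query) => {
    const tokens = normalize(query).split(/\s+/).filter(Boolean);
    return (row) => {
      const values = Object.values(row).map((v) => normalize(String(v)));
      return tokens.every((t) => values.some((v) => v.startsWith(t)));
    };
  };
}

const data = [{ name: "Paris" }, { name: "paris" }];

// Default hook: case-insensitive, so both rows match "par".
const insensitive = data.filter(searchFilterWith()("par"));

// Identity hook: the same filter becomes case-sensitive, so only "paris" matches.
const sensitive = data.filter(searchFilterWith({ normalize: (x) => x })("par"));
```

Because the hook runs on both sides of the comparison, swapping it changes the matching rule without touching the tokenizing or prefix logic.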
One quick idea that avoids providing the whole filter option: Search could expose (or accept) a user-defined termFilter and an optional function to transform the data before it is compared.
Something like:
Search(data, {
  query: query,
  termFilter: customTermFilter, // a user-defined termFilter
  transformFilter: customTransformFilter // function to transform the source data
});
~~this may be solved by https://github.com/observablehq/inputs/pull/216~~
@Fil #216 has been merged. Can this issue be closed?
No. #216 supports non-ASCII characters (so that searching for "île" in ["Ile", "île"] finds the second word). This issue is about supporting normalization: searching for "île" would find both words, which is what I expect as a user.
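For illustration, the difference can be reproduced with plain arrays. The `fold` helper below is an assumption about what "normalization" would mean here: decompose with string.normalize("NFD"), drop the combining marks in the range U+0300 to U+036F, then lowercase. Other definitions are possible.

```javascript
// Assumed normalization: NFD-decompose, strip combining diacritical marks, lowercase.
const fold = (s) =>
  s.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();

const words = ["Ile", "île"];
const query = "île";

// Lowercasing only: the accented query matches just the accented word.
const lowerOnly = words.filter((w) =>
  w.toLowerCase().startsWith(query.toLowerCase())
);

// Folding both sides: the query matches both words.
const folded = words.filter((w) => fold(w).startsWith(fold(query)));
```

Note that calling string.normalize on its own only unifies composed and decomposed forms of the same character; it is the removal of the combining marks that makes "île" and "Ile" compare equal.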