Alias accented characters

Open velldes opened this issue 1 year ago • 3 comments

Please add the possibility to treat different characters as the same. For example, 'Míša' should be searchable as 'Misa': ě = e, ů = u, ř = r, ґ = г ('Ґанок' should be searchable as 'Ганок'), and so on.

velldes • Jan 01 '24 14:01

👋 @velldes — thanks for the issue, this is definitely an overdue feature.

Due to the way the index is constructed, this will likely have to be a setting at indexing time — i.e. a CLI flag for the pagefind binary that changes the way words are indexed sitewide.
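To illustrate what folding at indexing time could look like, here is a minimal Python sketch. It is not Pagefind's implementation; the names `fold_diacritics`, `build_index`, `search`, and `EXTRA_ALIASES` are hypothetical and exist only for this example. It assumes Unicode NFD decomposition is used to strip combining marks, plus an explicit alias table for letters such as ґ that Unicode does not decompose into a base letter and a mark.

```python
import unicodedata

# Hypothetical alias table for letters with no Unicode decomposition.
# "ґ" is a distinct letter rather than "г" plus a combining mark, so
# NFD normalization alone leaves it unchanged.
EXTRA_ALIASES = str.maketrans({"ґ": "г", "Ґ": "Г"})


def fold_diacritics(text: str) -> str:
    """Return text with combining marks removed and extra aliases applied."""
    # NFD splits "í" into "i" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every combining mark, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose what is left, then apply the manual aliases.
    return unicodedata.normalize("NFC", stripped).translate(EXTRA_ALIASES)


def build_index(words):
    """Map each folded, lowercased key to the set of original words."""
    index = {}
    for word in words:
        index.setdefault(fold_diacritics(word).lower(), set()).add(word)
    return index


def search(index, query):
    """Fold the query the same way as the index keys before looking it up."""
    return index.get(fold_diacritics(query).lower(), set())


index = build_index(["Míša", "Ґанок"])
print(search(index, "Misa"))
print(search(index, "Ганок"))
```

Because the folded form is what gets stored as the key, the same folding has to be applied to the query, which is one reason the behavior would be fixed when the index is built. Blanket stripping also merges letters that some languages treat as distinct (for example, й decomposes to и plus a breve), which fits the point that this should be opt-in.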

Since it's a new setting, it will need to default off to not be a breaking change — and I can imagine use-cases where the current behavior is preferred.

Currently my leading contender for opting into this would be something like pagefind --site my_site --merge-diacritics. Perhaps with a catchier flag name.

How does that sound to you?

bglw • Jan 06 '24 09:01

Sounds good. Thank you

velldes • Jan 06 '24 16:01

Any progress on this one?

ColeDCrawford • Apr 30 '24 20:04