Support for Soft Hyphens, a first step to better indexing for languages like German
The following thread discusses the problem that German words are often compounds of simpler words. For example, „Hochspannungsnetzgerät“ is composed of „Hochspannung“ (high voltage) and „Netzgerät“ (power supply). If I search for „Netzgerät“ with Pagefind, it currently does not find „Hochspannungsnetzgerät“, even though the term is contained in it and a „Hochspannungsnetzgerät“ is semantically a kind of „Netzgerät“, so the user would expect to find it.
There seems to be no easy solution to that problem. But a first step would be to add optional support for soft hyphen characters (U+00AD): Pagefind should treat the soft hyphen as a word boundary. This would enable the generators of the static HTML to include these hints for Pagefind in the page.
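To illustrate what such a hint could look like on the generator side, here is a minimal Python sketch. The `COMPOUNDS` table and the `add_soft_hyphens` helper are hypothetical names made up for this example; a real generator would need its own way of knowing where compound boundaries are.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD, written as &shy; in HTML source

# Hypothetical, hand-maintained table: compound word -> its parts.
COMPOUNDS = {
    "Hochspannungsnetzgerät": ["Hochspannungs", "netzgerät"],
}


def add_soft_hyphens(text: str) -> str:
    """Insert a soft hyphen at each known compound boundary."""
    for word, parts in COMPOUNDS.items():
        text = text.replace(word, SOFT_HYPHEN.join(parts))
    return text


hinted = add_soft_hyphens("Ein Hochspannungsnetzgerät")
```

The soft hyphen is invisible unless the browser breaks the line at that position, so the rendered page stays the same for readers.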
👋
This can be added, and there is no need for it to be optional. Pagefind already indexes multiple words for a given location when required. For example, when it encounters the word `source_text`, it will index `source_text`, `source`, and `text` at the given location. I can add the soft hyphen to this list, which will roll it into the same handling.
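The behaviour described above can be sketched roughly as follows. This is not Pagefind's actual code (Pagefind is written in Rust). The function name `index_terms`, the separator set, and the lowercasing are assumptions for illustration, as is the choice to drop the invisible soft hyphen from the whole-word term.

```python
import re

SOFT_HYPHEN = "\u00ad"

# Assumed separator set: underscore (as in source_text) plus the soft hyphen.
SEPARATORS = re.compile(f"[_{SOFT_HYPHEN}]")


def index_terms(word: str) -> list[str]:
    """Return every term to index for one word at one location."""
    parts = [p for p in SEPARATORS.split(word) if p]
    # Assumption: the soft hyphen is removed from the whole-word term,
    # so a search for the full compound still matches.
    whole = word.replace(SOFT_HYPHEN, "")
    terms = [whole]
    if len(parts) > 1:
        terms.extend(parts)
    return [t.lower() for t in terms]


print(index_terms("source_text"))
print(index_terms("Hochspannungs\u00adnetzgerät"))
```

Under these assumptions, a page containing the soft-hyphenated compound would be indexed under the full word and under each part, so a search for „Netzgerät“ would reach it.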