10ten-ja-reader

Try all variants when expanding ー

Open: birtles opened this issue 5 years ago • 1 comment

Currently, when we encounter something like オーサカ, we generate the alternatives オウサカ and オオサカ and try them in turn. If any of them produces a longer match than オーサカ, we use that and stop trying further alternatives.
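As a rough illustration of the expansion step, here is a minimal sketch. It is not the project's actual code: `expandChoon`, `VOWEL_ROWS`, and the choice of replacement vowels for each row (in particular エ/イ after an エ-row kana) are assumptions made for this example.

```typescript
// Hypothetical table: which vowels a ー may stand for after a kana of each row.
// The replacement choices are illustrative assumptions, not the real rules.
const VOWEL_ROWS: Array<{ kana: string; vowels: string[] }> = [
  { kana: 'アカサタナハマヤラワガザダバパァャ', vowels: ['ア'] },
  { kana: 'イキシチニヒミリギジヂビピィ', vowels: ['イ'] },
  { kana: 'ウクスツヌフムユルグズヅブプゥュヴ', vowels: ['ウ'] },
  { kana: 'エケセテネヘメレゲゼデベペェ', vowels: ['エ', 'イ'] },
  { kana: 'オコソトノホモヨロヲゴゾドボポォョ', vowels: ['ウ', 'オ'] },
];

// Returns every variant of `input` with each ー replaced by a candidate vowel,
// or an empty array if nothing could be expanded.
function expandChoon(input: string): string[] {
  let results: string[] = [''];
  let expanded = false;

  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (ch !== 'ー') {
      results = results.map((prefix) => prefix + ch);
      continue;
    }

    const next: string[] = [];
    for (const prefix of results) {
      const prev = prefix.slice(-1);
      let vowels: string[] | null = null;
      for (const row of VOWEL_ROWS) {
        if (prev !== '' && row.kana.indexOf(prev) !== -1) {
          vowels = row.vowels;
          break;
        }
      }
      if (vowels) {
        expanded = true;
        for (const vowel of vowels) {
          next.push(prefix + vowel);
        }
      } else {
        // No known preceding kana: leave the ー untouched.
        next.push(prefix + ch);
      }
    }
    results = next;
  }

  return expanded ? results : [];
}

const variants = expandChoon('オーサカ'); // → ['オウサカ', 'オオサカ']
```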

Now, as it turns out, in the word dictionary we only have an entry for おおさか and not おうさか, so we correctly match 大阪 (and 大坂).

However, in the names dictionary we have entries for both おうさか and おおさか. Since we find a match for おうさか that is longer than オーサカ, we never try おおさか and hence never match 大阪.

We should really try all of the alternatives, use the longest match any of them produces, and merge the results of all alternatives whose matches have that same length.
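The proposed behavior could look roughly like the sketch below. Everything in it is hypothetical: `LookupResult`, `Lookup`, `lookupAllVariants`, and the toy dictionary are stand-ins invented for this example, not the extension's real types or data. The おうさか entry is a placeholder because its actual contents are not given here.

```typescript
// Hypothetical result shape: how many characters matched, and which entries.
interface LookupResult {
  matchLen: number;
  entries: string[];
}
type Lookup = (text: string) => LookupResult | null;

// Try the original text and every variant. Keep the longest match, and merge
// the entries of all candidates whose match length ties with the longest.
function lookupAllVariants(
  original: string,
  variants: string[],
  lookup: Lookup
): LookupResult | null {
  const first = lookup(original);
  let best: LookupResult | null = first
    ? { matchLen: first.matchLen, entries: first.entries.slice() }
    : null;

  for (const variant of variants) {
    const result = lookup(variant);
    if (!result) {
      continue;
    }
    if (!best || result.matchLen > best.matchLen) {
      // Strictly longer: replace what we had (copy so we never mutate the dictionary).
      best = { matchLen: result.matchLen, entries: result.entries.slice() };
    } else if (result.matchLen === best.matchLen) {
      // Equal length: merge, skipping duplicates.
      for (const entry of result.entries) {
        if (best.entries.indexOf(entry) === -1) {
          best.entries.push(entry);
        }
      }
    }
  }

  return best;
}

// Toy names dictionary keyed by katakana reading (placeholder data).
const toyNames: { [reading: string]: string[] } = {
  'オウサカ': ['<name entry for おうさか>'],
  'オオサカ': ['大阪'],
};

// Toy longest-prefix lookup over the dictionary above.
const toyLookup: Lookup = (text) => {
  for (let len = text.length; len > 0; len--) {
    const entries = toyNames[text.slice(0, len)];
    if (entries) {
      return { matchLen: len, entries };
    }
  }
  return null;
};

const merged = lookupAllVariants('オーサカ', ['オウサカ', 'オオサカ'], toyLookup);
```

With this toy data, `merged` contains both the おうさか placeholder and 大阪, whereas stopping at the first longer match would return only the former.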

birtles, Sep 08 '20 06:09

817db30 only fixed this for name dictionary entries. There are likely still cases where we should do this for word entries.

birtles, Jul 02 '21 01:07