Spellchecking should be able to return multiple suggestions
If several candidates share the same best Levenshtein (or similar) distance.
Since Xapian cannot do that out of the box, this issue depends on #1014
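To make the request concrete, here is a minimal sketch of what "multiple suggestions at the same distance" could mean. It is not libkiwix or Xapian code: the function names (`levenshtein`, `best_suggestions`), the `max_distance` parameter and the sample vocabulary are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def best_suggestions(word, vocabulary, max_distance=2):
    """Return every vocabulary word tied for the smallest edit distance.

    Hypothetical helper: max_distance is an illustrative cut-off,
    not a value taken from libkiwix or Xapian.
    """
    scored = [(levenshtein(word, v), v) for v in vocabulary if v != word]
    scored = [(d, v) for d, v in scored if d <= max_distance]
    if not scored:
        return []
    best = min(d for d, _ in scored)
    return sorted(v for d, v in scored if d == best)
```

With the vocabulary `["cat", "cut", "cot", "dog", "cart"]` and the query `"cet"`, three words are one edit away, so a single-suggestion API would have to drop two equally good candidates.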
@veloman-yunkan I believe this is the highest-priority feature to implement in the spellchecker now. You told me you have a way to do it, though it is a bit hacky. The following points are still unclear:
- What the nature of the problem is
- What the solution would look like
I think it would be better to clarify these two points before implementing a PR.
@kelson42
The current implementation of spelling correction (which, by the way, is fully contained in libkiwix) relies on Xapian's spelling correction functionality. The latter only supports a single spelling correction per query. That limitation can be worked around by temporarily removing the returned correction from the spelling database and repeating the request, whereupon the next best correction will be returned. That procedure can be performed as many times as needed to obtain the desired number of corrections. The removed entries must then be re-inserted into the spellings DB. Such a hack has the following drawbacks:
- Spelling correction is not supported by the in-memory backend of Xapian, i.e. the glass (on-disk) backend has to be used, and the spelling correction operation thus becomes a non-readonly operation with respect to the on-disk data leading to:
  a. extra wear of SSD storage
  b. risk of data corruption (loss of spelling entries) if the application crashes during the spelling correction function call (this can be worked around with additional measures)
  c. spelling correction cannot be called concurrently in kiwix-serve
  d. slow
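The remove-and-retry procedure described above can be sketched roughly as follows. This is only an illustration, not the libkiwix implementation: `SingleSuggestionSpeller` is a hypothetical in-memory stand-in for an engine that returns one correction per query (in Xapian the corresponding calls would be `get_spelling_suggestion`, `remove_spelling` and `add_spelling`), and its ranking rule (smallest edit distance, then highest frequency) is an assumption.

```python
def _distance(a, b):
    """Levenshtein distance, used only by the toy engine below."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


class SingleSuggestionSpeller:
    """Hypothetical engine that can return only ONE correction per query."""

    def __init__(self, frequencies, max_distance=2):
        self._freq = dict(frequencies)  # word -> frequency
        self._max_distance = max_distance

    def suggest(self, word):
        """Return the best correction, or "" if there is none."""
        ranked = sorted((_distance(word, w), -f, w)
                        for w, f in self._freq.items() if w != word)
        if ranked and ranked[0][0] <= self._max_distance:
            return ranked[0][2]
        return ""

    def remove(self, word):
        """Drop a spelling entry and return its frequency for later restore."""
        return self._freq.pop(word)

    def add(self, word, freq):
        self._freq[word] = freq


def multiple_suggestions(speller, word, count):
    """Collect up to `count` corrections by removing each returned one."""
    removed = []
    try:
        while len(removed) < count:
            suggestion = speller.suggest(word)
            if not suggestion:
                break
            removed.append((suggestion, speller.remove(suggestion)))
    finally:
        # Re-insert everything that was removed. This does not help if the
        # process crashes in the middle, which is drawback (b) above.
        for suggestion, freq in removed:
            speller.add(suggestion, freq)
    return [suggestion for suggestion, _ in removed]
```

Even in this toy form the query mutates the spelling data and must restore it afterwards, so two concurrent callers would see each other's temporary removals. That is the concurrency drawback listed in (c).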
Besides, Xapian's algorithm for spelling correction is based on edit distance, rather than on phonetic similarity. If we intend to eventually provide real spellchecking instead of a surrogate one, we should use a real spellchecking engine. Switching to one will automatically enable multiple corrections.
@veloman-yunkan Thx for the explanation. My conclusion is that this issue is blocked by #1014 and we should implement #1014 first.