elasticlunr-rs
Index created by elasticlunr-rs doesn't work with elasticlunr.js for characters that can't be represented by a single UTF-16 Code Unit
https://github.com/mattico/elasticlunr-rs/blob/29d97e4c8e91bb0d1813716fb2d1575066344d76/src/inverted_index.rs#L40-L42
During index building, elasticlunr-rs iterates over the content of the token `&str` in Unicode scalar values.
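To make the Rust side concrete, here is a minimal sketch of the trie path you get when a token is walked one Unicode scalar value at a time. `scalar_value_path` is a hypothetical helper written for illustration and is not taken from elasticlunr-rs. It assumes the linked loop behaves like `str::chars`, which yields Unicode scalar values.

```rust
// Hypothetical helper (not part of elasticlunr-rs): models the trie path
// produced by walking a token one Unicode scalar value at a time.
fn scalar_value_path(token: &str) -> Vec<char> {
    // `str::chars` yields Unicode scalar values, regardless of how many
    // UTF-8 bytes or UTF-16 code units each one needs.
    token.chars().collect()
}

fn main() {
    // "hi" followed by U+1F600 (an emoji outside the BMP).
    let path = scalar_value_path("hi\u{1F600}");
    // One trie level per scalar value: 'h', 'i', and the emoji.
    println!("{:?} ({} levels)", path, path.len());
}
```

Each emoji or other non-BMP character becomes exactly one level in this path.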
The JS library, by contrast, does it this way:
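```js
elasticlunr.InvertedIndex.prototype.addToken = function (token, tokenInfo, root) {
  var root = root || this.root,
      idx = 0;
  while (idx <= token.length - 1) {
    // token[idx] is a single UTF-16 code unit, not necessarily a full character
    var key = token[idx];
    // ... (rest of the loop and function omitted)
  }
};
```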
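JS strings are indexed by UTF-16 code units, so this loop walks the token one code unit at a time. A single code unit is a whole character for English, most alphabetic scripts, and common Chinese characters (everything in the Basic Multilingual Plane). Emoji and rare Chinese characters lie outside that plane and take a surrogate pair of two code units. For those characters the JS trie therefore has two levels where the index built by elasticlunr-rs has one, so lookups in elasticlunr.js do not find them.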
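The mismatch can be reproduced from Rust alone. The sketch below compares the trie depth that results from scalar-value iteration with the sequence of UTF-16 code units that elasticlunr.js would walk. It relies on `str::encode_utf16`, which yields the same code units a JS string stores. Both `rust_path_len` and `js_path_units` are hypothetical helpers for illustration. They are not APIs of either library.

```rust
// Hypothetical helpers: compare the trie path a Rust-built index uses
// (one level per Unicode scalar value) with the path elasticlunr.js walks
// (one level per UTF-16 code unit, because `token[idx]` indexes code units).
fn rust_path_len(token: &str) -> usize {
    token.chars().count()
}

fn js_path_units(token: &str) -> Vec<u16> {
    // `encode_utf16` yields the same code units a JS string stores.
    token.encode_utf16().collect()
}

fn main() {
    // ASCII, common CJK, an emoji (U+1F600), and a rare CJK ideograph (U+20000).
    for token in ["search", "中文", "\u{1F600}", "\u{20000}"] {
        let units = js_path_units(token);
        let hex: Vec<String> = units.iter().map(|u| format!("{:04X}", u)).collect();
        println!(
            "{:?}: Rust levels = {}, JS levels = {} [{}]",
            token,
            rust_path_len(token),
            units.len(),
            hex.join(" ")
        );
    }
}
```

For `"search"` and `"中文"` both counts agree. For U+1F600 the Rust path has one level, but the JS path has two: the surrogate pair `D83D DE00`. U+20000 behaves the same way, with the pair `D840 DC00`.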
Related issue with mdBook.