sqlite-fts-python

FTS5_TOKEN_COLOCATED

Open • andersjo opened this issue 3 years ago • 1 comment

The FTS5 tokenizer API has the concept of "colocated" tokens, where multiple tokens occupy the same position in a document. The main use of this is implementing synonyms (see Sec. 7.1.1, "Synonym Support", in the SQLite FTS5 documentation).
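A rough illustration of the idea in plain Python. This is a sketch and does not use sqlite-fts-python's actual API. A hypothetical tokenizer emits each synonym with the colocated flag and the same byte offsets as the word it stands for, and only non-colocated tokens advance the position counter. `SYNONYMS`, `tokenize` and `positions` are made-up names for illustration. The flag value 0x0001 matches `FTS5_TOKEN_COLOCATED` in SQLite's `fts5.h`.

```python
# Hypothetical sketch: a toy tokenizer that emits synonyms as colocated tokens.
# FTS5_TOKEN_COLOCATED mirrors the 0x0001 flag value defined in SQLite's fts5.h.
import re

FTS5_TOKEN_COLOCATED = 0x0001

# Hypothetical synonym table, for illustration only.
SYNONYMS = {"first": ["1st"]}


def tokenize(text):
    """Yield (flags, token, start, end) tuples with byte offsets into the UTF-8 text."""
    data = text.encode("utf-8")
    for m in re.finditer(rb"\w+", data):
        token = m.group().lower().decode("utf-8")
        start, end = m.span()
        # The original word starts a new position.
        yield 0, token, start, end
        # Each synonym reuses that position, so it carries the colocated flag.
        for syn in SYNONYMS.get(token, []):
            yield FTS5_TOKEN_COLOCATED, syn, start, end


def positions(tokens):
    """Assign FTS5-style positions: only non-colocated tokens advance the counter."""
    pos = -1
    result = []
    for flags, token, _start, _end in tokens:
        if not flags & FTS5_TOKEN_COLOCATED:
            pos += 1
        result.append((pos, token))
    return result


print(positions(tokenize("I won first place")))
# [(0, 'i'), (1, 'won'), (2, 'first'), (2, '1st'), (3, 'place')]
```

The byte offsets produced for "I won first place" are the same ones used in the SQLite docs' `xToken` example. "1st" gets offsets 6 to 11, exactly like "first".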

Is there any way to mark a token as colocated through the Python API?

andersjo, Jun 05 '21 17:06

I believe the author thought of this, though I haven't tested it.

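```c
/* SQLite FTS5 docs example (Sec. 7.1.1) for "I won first place"; arguments are
   (pCtx, tflags, pToken, nToken, iStart, iEnd). "1st" shares the position of "first". */
xToken(pCtx, 0, "i",                      1,  0,  1);
xToken(pCtx, 0, "won",                    3,  2,  5);
xToken(pCtx, 0, "first",                  5,  6, 11);
xToken(pCtx, FTS5_TOKEN_COLOCATED, "1st", 3,  6, 11);
xToken(pCtx, 0, "place",                  5, 12, 17);
```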
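The reason sharing a position matters is phrase matching. Below is a toy sketch of that. It is a simplified model, not FTS5's real index format. It replays the `(tflags, token)` pairs from the `xToken` calls above into a position → tokens map and checks whether a phrase appears at consecutive positions. `build_index` and `phrase_matches` are hypothetical helper names.

```python
# Toy model (not FTS5's real index): map each position to the set of tokens there,
# using the (tflags, token) pairs passed to xToken in the SQLite docs example.
from collections import defaultdict

FTS5_TOKEN_COLOCATED = 0x0001  # same value as in SQLite's fts5.h

STREAM = [
    (0, "i"),
    (0, "won"),
    (0, "first"),
    (FTS5_TOKEN_COLOCATED, "1st"),
    (0, "place"),
]


def build_index(stream):
    """Return {position: set_of_tokens}; colocated tokens reuse the previous position."""
    index = defaultdict(set)
    pos = -1
    for tflags, token in stream:
        if not tflags & FTS5_TOKEN_COLOCATED:
            pos += 1
        index[pos].add(token)
    return index


def phrase_matches(index, phrase):
    """True if the phrase's tokens occur at consecutive positions."""
    n = len(phrase)
    if n == 0 or not index:
        return False
    last = max(index)
    for start in range(last - n + 2):
        if all(phrase[i] in index.get(start + i, ()) for i in range(n)):
            return True
    return False


idx = build_index(STREAM)
print(phrase_matches(idx, ["1st", "place"]))    # True
print(phrase_matches(idx, ["first", "place"]))  # True
print(phrase_matches(idx, ["won", "place"]))    # False
```

Because "first" and "1st" sit at the same position, either spelling matches a phrase query against the same document.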

https://github.com/hideaki-t/sqlite-fts-python/blob/2808e9165d26e56e869fd633641fd29c2adce6f1/sqlitefts/fts5.py#L244

It should be possible; even if it isn't supported yet, it shouldn't be hard to implement. I'll test whether it works and open a PR if it doesn't.

EDIT: Removed the docs link. Sorry 😅, you had already linked the relevant section of the SQLite docs.

bernard-crnkovic, Nov 26 '22 12:11