groonga
Add support for stop words with a separate stop word table
What is your problem?
TokenFilterStopWord relies on a column in the lexicon to indicate whether each term is a stop word. We can't use TokenFilterStopWord with PGroonga because PGroonga users can't add a custom column to a lexicon.
If we add support for a separate stop word table, PGroonga users will be able to use TokenFilterStopWord. For example:
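For contrast, the current usage looks roughly like the sketch below. The table names Memos and Terms are illustrative assumptions, not taken from this issue. The step that matters is the `column_create Terms is_stop_word` line: the column is created directly on the lexicon, which is what PGroonga users cannot do.

```
plugin_register token_filters/stop_word

# Data table (illustrative name)
table_create Memos TABLE_NO_KEY
column_create Memos content COLUMN_SCALAR ShortText

# Lexicon with the stop word token filter enabled
table_create Terms TABLE_PAT_KEY ShortText \
  --default_tokenizer TokenBigram \
  --normalizer NormalizerAuto \
  --token_filters TokenFilterStopWord
column_create Terms memos_content COLUMN_INDEX|WITH_POSITION Memos content

# The stop word flag lives on the lexicon itself
column_create Terms is_stop_word COLUMN_SCALAR Bool
load --table Terms
[
{"_key": "and", "is_stop_word": true}
]
```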
plugin_register token_filters/stop_word
table_create StopWords TABLE_PAT_KEY ShortText \
--normalizers NormalizerNFKC150
column_create StopWords is_stop_word COLUMN_SCALAR Bool
load --table StopWords
[
{"_key": "and", "is_stop_word": true}
]
table_create Lexicon TABLE_PAT_KEY ShortText \
--default_tokenizer TokenMecab \
--normalizers NormalizerNFKC150 \
--token_filters 'TokenFilterStopWord("column", "StopWords.is_stop_word")'
- StopWords._key's type must equal Lexicon._key's type
- StopWords.is_stop_word must be of Bool type
- It may be better not to reuse the column option of TokenFilterStopWord for this
How to reproduce it
No response