lindera
"japanese_compound_word" doesn't work properly without built-in dict
It's possible to use lindera without any dictionary feature and load an external dictionary instead. In that case, however, "japanese_compound_word" produces compound words with empty details, because the code that fills them in is (needlessly) guarded behind those features: https://github.com/lindera/lindera/blob/main/lindera/src/token_filter/japanese_compound_word.rs#L96-L199
Further down the pipeline, "japanese_stop_tags" assumes that any details list shorter than 4 has exactly 1 element: https://github.com/lindera/lindera/blob/main/lindera/src/token_filter/japanese_stop_tags.rs#L102-L105 With the empty details produced above, this triggers an index out of range panic.
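For illustration, here is a minimal sketch of how the lookup could tolerate empty details. This is not lindera's actual code: `tag_key` and `is_stopped` are hypothetical helpers, and the comma-joined tag format is an assumption on my part.

```rust
/// Hypothetical helper (not part of lindera): builds the tag key from a
/// token's details without ever indexing into an empty slice.
/// Assumption: tags are the first four details joined by commas.
fn tag_key(details: &[&str]) -> Option<String> {
    match details.len() {
        // Empty details (e.g. from a compound word built without the
        // dictionary features): nothing to match, and no panic.
        0 => None,
        // Four or more details: use the first four.
        n if n >= 4 => Some(details[..4].join(",")),
        // One to three details: fall back to the first element only.
        _ => Some(details[0].to_string()),
    }
}

/// Hypothetical helper: returns true if the token's tag key is in the
/// stop list. Tokens with empty details are never stopped.
fn is_stopped(details: &[&str], stop_tags: &[&str]) -> bool {
    match tag_key(details) {
        Some(key) => stop_tags.contains(&key.as_str()),
        None => false,
    }
}
```

With this shape, a token with empty details is simply kept instead of crashing the filter. Whether such tokens should be kept or dropped is a design choice for the maintainers.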
@Jimmy-Z Thanks for the report. I will check it later.